lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.5k stars 157 forks source link

ASR: `SubroutineCall.original_name` should be an `identifer?` not a `symbol?` ? #1480

Open rebcabin opened 1 year ago

rebcabin commented 1 year ago

The ASR spec for SubroutineCall is (modulo Issue #1479)

;; | SubroutineCall(symtab-id id, symbol name, symbol? original_name,
;;                  call_arg* args, expr? dt)

Should original_name be an identifer? (optional identifier) and not a symbol? (optional symbol)? Generally speaking, symbols are recursively evaluated, as in

symbol
    = Program(symbol_table symtab, identifier name, identifier* dependencies,
        stmt* body)
    | Module(symbol_table symtab, identifier name, identifier* dependencies,
        bool loaded_from_mod, bool intrinsic)
    | Function(symbol_table symtab, identifier name, identifier* dependencies, expr* args, stmt* body,
        expr? return_var, abi abi, access access, deftype deftype,
        string? bindc_name, bool elemental, bool pure, bool module, bool inline,
        bool static, ttype* type_params, symbol* restrictions, bool is_restriction,
        bool deterministic, bool side_effect_free)
    | GenericProcedure(symbol_table parent_symtab, identifier name,

etc., whereas identifiers (which are not spec'ced in ASDL) evaluate to themselves.

certik commented 1 year ago

It is, but for performance you want to have a direct link to the symbol, so that in the backend you don't need to do any kind of dictionary string lookups. So we use "symbol" for this purpose. You can think of it as a string and symbol table ID, but internally it is a direct non-owning pointer to the symbol.

rebcabin commented 1 year ago

I found this:

-- SubroutineCall/FunctionCall store the actual final resolved subroutine or
-- function (`name` member). They also store the original symbol
-- (`original_name`), which can be one of: null, GenericProcedure or
-- ExternalSymbol.

which implies that original_name is a bona fide symbol, because of

symbol
    = ...
    | GenericProcedure(symbol_table parent_symtab, identifier name,
    | ExternalSymbol(symbol_table parent_symtab, identifier name,
        symbol external, identifier module_name, identifier* scope_names,
        identifier original_name, access access)
    | ...

original_name can't be "any-old-symbol," but must be one of these two (or "null," another string not defined in the ASDL ?). The restriction of original_name is implicit "meta-semantics" that I must capture in processing code -- it's not expressed in the grammar. It might be better -- then again it might not be better -- to gather up these symbols (GenericProcedure and ExternalSymbol) into another composite term like

callable_origin = GenericProcedure(...)
                | ExternalSymbol

so that the restriction on the use of these symbols is explicit in the grammar. The downside is that it clutters up the grammar with an eleventh composite speclet. Explicit versus implicit usually brings trade-offs to the surface.

Generally speaking, I think it's questionable design for performance concerns to leak into this level of abstraction. However, it seems that there are semantical concerns that drive the decision, so bringing performance into this discussion is perhaps a red herring.

certik commented 1 year ago

If there is a way to improve the design without losing any current performance, then we should do it.