lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.51k stars 164 forks source link

ASR: change how a symbol is printed #1510

Open certik opened 1 year ago

certik commented 1 year ago

Currently a symbol (e.g. in Var(symbol v)) is printed as 2 x, meaning it's the symbol table ID 2, variable "x" from that table. For example, (Var 2 x).

This has issues with parsing where you have to parse two items for symbol instead of just one, and also things look inconsistent, is it a symbol (owning) or a pointer to a symbol (non-owning)?

The best proposed idea how to fix this is to introduce:

symref = SymbolRef(int symtab_id, identifier symnym)

expr =
    ...
    Var(symref v)

And then the current (Var 2 x) becomes (Var (SymbolRef 2 x)).

See https://github.com/lcompilers/lpython/issues/1492#issuecomment-1414417966 for more details.

This cleans up the abstract representation of ASR, now symbol is always owning, and SymbolRef is always non-owning.

Important: we need to preserve the current speed, so in C++ the SymbolRef would actually always be replaced with just a pointer to the symbol, so the generated C++ code would be equivalent to what it is today, so no slow down. But conceptually, the pointer to symbol x is the same as (SymbolRef 2 x). In languages that do not have pointers, such while printing or serializing (or Clojure), one can use (SymbolRef 2 x) directly.

Other ideas, just for printing:

But it seems our experience shows that it's better to use longer, descriptive names in ASR printouts and not to use shortcuts, so that it's more obvious what the printout represents to newcomers. The ASR Clojure like printout is NOT used for performance parts of the code. For that we use binary representation where we eventually implement all tricks possible to keep the binary representation as compact as possible, or alternatively to deserialize it as quickly as possible.

Related issues:

rebcabin commented 1 year ago

I am not sure we need both Var and SymbolRef. Perhaps one of the two will do in all cases, e.g., FunctionCall, which has naked pairs like 2 pow 3 foo, and ExplicitDeallocate, which has an explicit vector of symrefs, e.g., [2 x 3 y] >~~should-be~~> [(SymbolRef 2 x) (SymbolRef 3 y)].

If we need both, I prefer (Var 2 x) and (SymbolRef 3 y) to (Var (SymbolRef 2 x)). The extra level of nesting seems superfluous and wasteful.

If we can get away with one of the two, why not do that?

rebcabin commented 1 year ago

Clojure will interpret (Var {2 x}) as a hash map with 2 as key and x as identifier. It's OK, but I don't see why for such a small data item.

certik commented 1 year ago

symbol is currently used in quite a few nodes. If we want to use Var, we need to replace symbol with expr, and enforce in verify() that the expr is an actual Var and no other expr. This is a limitation of ASDL.

We can rename Var to SymbolRef, but that I think does not change anything in the structure and by itself does not solve anything.

rebcabin commented 1 year ago

let's discuss this in person. My sense of clarity is decreasing, not increasing :)

On Mon, Feb 6, 2023 at 9:15 PM Ondřej Čertík @.***> wrote:

symbol is currently used in quite a few nodes. If we want to use Var, we need to replace symbol with expr, and enforce in verify() that the expr is an actual Var and no other expr. This is a limitation of ASDL.

We can rename Var to SymbolRef, but that I think does not change anything in the structure and by itself does not solve anything.

— Reply to this email directly, view it on GitHub https://github.com/lcompilers/lpython/issues/1510#issuecomment-1420214448, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABSRR4GRB3T2SPM2IWJJJDWWHLALANCNFSM6AAAAAAUTKKK34 . You are receiving this because you commented.Message ID: @.***>

rebcabin commented 1 year ago

Let me try for a little more clarity with some point-like questions:

  1. naked streams of pairs like 2 x 3 y () -- with () representing a non-existent pair -- and it's not a pair to boot! -- are problematic for humans to read, programs to process, to parse, ASDL or EDN to specify, and so on. They're not even tupled! Shall we agree not to use them?
  2. streams of pairs in vector brackets, like [2 x 3 y nil nil] are better, but still not great because we don't, prima-facie, know what they represent, and we sill have to count values inside the square brackets to check for even-ness and other possible constraints like "at-least-one" and "no more than three."
  3. (Var 2 x) and [(Var 2 x) (Var 3 y) NullVar] seem like good solutions to me. Now, the pairs are tagged and typed (with head Var) and tupled (in round brackets, so we don't have to count and partition a stream of naked values) and we have an explicit bottom value for the Var type.
  4. Ditto (SymbolRef 2 x) and [(SymbolRef 2 x) (SymbolRef 3 y) NullSymbolRef].
  5. I don't see any good coming out of (Var [2 x]), (Var (SymbolRef 2 x)), (Var (2 x)), (Var {2 x}), and the like. What's the point?
  6. I don't understand the difference between (Var 2 x) and (SymbolRef 2 x). Except for the type-tag, Var and SymbolRef respectively, they have the same structure. What are they used for? Can Var be used any particular places where SymbolRef is used, and vice versa? How about all places, i.e., are Var and SymbolRef equivalent? If so, are they redundant? If not, what is the difference?
rebcabin commented 1 year ago

Here is an example illustrating ambiguity with naked streams. These are two FunctionCalls from tests/expr7.py:

;; (FunctionCall
;;  1            \____ should be (SymbolRef 1 test_pow_1)
;;  test_pow_1   /
;;  ()           >---- should be NullSymbolRef (new symconst)
;;  [((IntegerConstant 1 (Integer 4 [])))   ; call_args
;;   ((IntegerConstant 2 (Integer 4 [])))]
;;  (Integer 4 [])
;;  ()
;;  ())

;; (FunctionCall _____________________
;;  2                                 \____ should be SymbolRef(...)
;;  pow/__lpython_overloaded_0__pow __/
;;  2     \____ should be (SymbolRef 2 pow)
;;  pow __/
;;  [((IntegerConstant 2 (Integer 4 [])))
;;   ((IntegerConstant 2 (Integer 4 [])))]
;;  (Real 8 [])
;;  (RealConstant 4.0 (Real 8 []))
;;  ())

The (tricky, non-robust) code for processing (working-around) these examples is around line 476 in https://github.com/rebcabin/ClojureProjects002/commit/2b92b43454181933867694edfbf6daff3daa52d4#diff-1a0a9fc9fb7f9291d27c713a3b60f54e75050e723a463da5dd9df82a20a68080R456

aadityasinha-dotcom commented 1 year ago

Hey, is this issue available? Can I work on this issue?

aadityasinha-dotcom commented 1 year ago

??

certik commented 1 year ago

@aadityasinha-dotcom you can work on any issue you want, simply submit a PR.