Open certik opened 1 year ago
I am not sure we need both Var
and SymbolRef
. Perhaps one of the two will do in all cases, e.g., FunctionCall
, which has naked pairs like 2 pow 3 foo
, and ExplicitDeallocate
, which has an explicit vector of symrefs
, e.g., [2 x 3 y] >~~should-be~~> [(SymbolRef 2 x) (SymbolRef 3 y)]
.
If we need both, I prefer (Var 2 x)
and (SymbolRef 3 y)
to (Var (SymbolRef 2 x))
. The extra level of nesting seems superfluous and wasteful.
If we can get away with one of the two, why not do that?
Clojure will interpret (Var {2 x})
as a hash map with 2
as key and x
as identifier. It's OK, but I don't see why for such a small data item.
symbol
is currently used in quite a few nodes. If we want to use Var
, we need to replace symbol
with expr
, and enforce in verify()
that the expr is an actual Var
and no other expr. This is a limitation of ASDL.
We can rename Var
to SymbolRef
, but that I think does not change anything in the structure and by itself does not solve anything.
let's discuss this in person. My sense of clarity is decreasing, not increasing :)
On Mon, Feb 6, 2023 at 9:15 PM Ondřej Čertík @.***> wrote:
symbol is currently used in quite a few nodes. If we want to use Var, we need to replace symbol with expr, and enforce in verify() that the expr is an actual Var and no other expr. This is a limitation of ASDL.
We can rename Var to SymbolRef, but that I think does not change anything in the structure and by itself does not solve anything.
— Reply to this email directly, view it on GitHub https://github.com/lcompilers/lpython/issues/1510#issuecomment-1420214448, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABSRR4GRB3T2SPM2IWJJJDWWHLALANCNFSM6AAAAAAUTKKK34 . You are receiving this because you commented.Message ID: @.***>
Let me try for a little more clarity with some point-like questions:
2 x 3 y ()
-- with ()
representing a non-existent pair -- and it's not a pair to boot! -- are problematic for humans to read, programs to process, to parse, ASDL or EDN to specify, and so on. They're not even tupled! Shall we agree not to use them?[2 x 3 y nil nil]
are better, but still not great because we don't, prima-facie, know what they represent, and we sill have to count values inside the square brackets to check for even-ness and other possible constraints like "at-least-one" and "no more than three."(Var 2 x)
and [(Var 2 x) (Var 3 y) NullVar]
seem like good solutions to me. Now, the pairs are tagged and typed (with head Var
) and tupled (in round brackets, so we don't have to count and partition a stream of naked values) and we have an explicit bottom value for the Var
type.(SymbolRef 2 x)
and [(SymbolRef 2 x) (SymbolRef 3 y) NullSymbolRef]
.(Var [2 x])
, (Var (SymbolRef 2 x))
, (Var (2 x))
, (Var {2 x})
, and the like. What's the point?(Var 2 x)
and (SymbolRef 2 x)
. Except for the type-tag, Var
and SymbolRef
respectively, they have the same structure. What are they used for? Can Var
be used any particular places where SymbolRef
is used, and vice versa? How about all places, i.e., are Var
and SymbolRef
equivalent? If so, are they redundant? If not, what is the difference? Here is an example illustrating ambiguity with naked streams. These are two FunctionCalls from tests/expr7.py:
;; (FunctionCall
;; 1 \____ should be (SymbolRef 1 test_pow_1)
;; test_pow_1 /
;; () >---- should be NullSymbolRef (new symconst)
;; [((IntegerConstant 1 (Integer 4 []))) ; call_args
;; ((IntegerConstant 2 (Integer 4 [])))]
;; (Integer 4 [])
;; ()
;; ())
;; (FunctionCall _____________________
;; 2 \____ should be SymbolRef(...)
;; pow/__lpython_overloaded_0__pow __/
;; 2 \____ should be (SymbolRef 2 pow)
;; pow __/
;; [((IntegerConstant 2 (Integer 4 [])))
;; ((IntegerConstant 2 (Integer 4 [])))]
;; (Real 8 [])
;; (RealConstant 4.0 (Real 8 []))
;; ())
The (tricky, non-robust) code for processing (working-around) these examples is around line 476 in https://github.com/rebcabin/ClojureProjects002/commit/2b92b43454181933867694edfbf6daff3daa52d4#diff-1a0a9fc9fb7f9291d27c713a3b60f54e75050e723a463da5dd9df82a20a68080R456
Hey, is this issue available? Can I work on this issue?
??
@aadityasinha-dotcom you can work on any issue you want, simply submit a PR.
Currently a
symbol
(e.g. inVar(symbol v)
) is printed as2 x
, meaning it's the symbol table ID 2, variable "x" from that table. For example,(Var 2 x)
.This has issues with parsing where you have to parse two items for
symbol
instead of just one, and also things look inconsistent, is it a symbol (owning) or a pointer to a symbol (non-owning)?The best proposed idea how to fix this is to introduce:
And then the current
(Var 2 x)
becomes(Var (SymbolRef 2 x))
.See https://github.com/lcompilers/lpython/issues/1492#issuecomment-1414417966 for more details.
This cleans up the abstract representation of ASR, now
symbol
is always owning, andSymbolRef
is always non-owning.Important: we need to preserve the current speed, so in C++ the
SymbolRef
would actually always be replaced with just a pointer to thesymbol
, so the generated C++ code would be equivalent to what it is today, so no slow down. But conceptually, the pointer to symbolx
is the same as(SymbolRef 2 x)
. In languages that do not have pointers, such while printing or serializing (or Clojure), one can use(SymbolRef 2 x)
directly.Other ideas, just for printing:
(Var (2 x))
(Var {2 x})
But it seems our experience shows that it's better to use longer, descriptive names in ASR printouts and not to use shortcuts, so that it's more obvious what the printout represents to newcomers. The ASR Clojure like printout is NOT used for performance parts of the code. For that we use binary representation where we eventually implement all tricks possible to keep the binary representation as compact as possible, or alternatively to deserialize it as quickly as possible.
Related issues: