lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.5k stars 158 forks source link

Pointers / links in ASDL #1167

Open certik opened 2 years ago

certik commented 2 years ago

@czgdp1807 had a great idea here how to improve GoTo / GoToTarget: https://github.com/lcompilers/lpython/pull/1163#issuecomment-1263052400

Change GoTo(int target_id, identifier name) to GoTo(stmt_t goto_target). Then one can do ASR::down_cast<ASR::GoToTarget>(goto_object->goto_target)->m_name. So it would look like this:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 2f120d29e..2e71a694e 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -169,10 +169,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id)
+    | GoTo(stmt goto_target)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id)
+    | GoToTarget(int id, identifier name)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)

Except that this would not work, as GoTo would contain a new (copy) of GoToTarget, not related to the actual target.

The reason is that we can't represent links/pointers in ASDL. For the symbol table we implemented pointers implicitly: the ASDL->C++ translation script recognizes heuristically when the value is needed, and when a pointer is needed.

The way out of this could be to create an addition to ASDL, that when _ptr is appended after a type, it will be a pointer to this type, not the value.

So the final change would look something like this:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 2f120d29e..e10350585 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -89,20 +89,20 @@ symbol
         access access, deftype deftype, string? bindc_name, bool elemental,
         bool pure, bool module, bool inline)
     | GenericProcedure(symbol_table parent_symtab, identifier name,
-        symbol* procs, access access)
+        symbol_ptr* procs, access access)
     | CustomOperator(symbol_table parent_symtab, identifier name,
-        symbol* procs, access access)
+        symbol_ptr* procs, access access)
     | ExternalSymbol(symbol_table parent_symtab, identifier name,
-        symbol external, identifier module_name, identifier* scope_names,
+        symbol_ptr external, identifier module_name, identifier* scope_names,
         identifier original_name, access access)
     | DerivedType(symbol_table symtab, identifier name, identifier* members,
-        abi abi, access access, symbol? parent)
+        abi abi, access access, symbol_ptr? parent)
     | Variable(symbol_table parent_symtab, identifier name, intent intent,
         expr? symbolic_value, expr? value, storage_type storage, ttype type,
         abi abi, access access, presence presence, bool value_attr)
     | ClassType(symbol_table symtab, identifier name, abi abi, access access)
     | ClassProcedure(symbol_table parent_symtab, identifier name, identifier
-        proc_name, symbol proc, abi abi)
+        proc_name, symbol_ptr proc, abi abi)
     | AssociateBlock(symbol_table symtab, identifier name, stmt* body)
     | Block(symbol_table symtab, identifier name, stmt* body)

@@ -158,9 +158,9 @@ stmt
     | Associate(expr target, expr value)
     | Cycle()
     -- deallocates if allocated otherwise throws a runtime error
-    | ExplicitDeallocate(symbol* vars)
+    | ExplicitDeallocate(symbol_ptr* vars)
     -- deallocates if allocated otherwise does nothing
-    | ImplicitDeallocate(symbol* vars)
+    | ImplicitDeallocate(symbol_ptr* vars)
     | DoConcurrentLoop(do_loop_head head, stmt* body)
     | DoLoop(do_loop_head head, stmt* body)
     | ErrorStop(expr? code)
@@ -169,10 +169,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id)
+    | GoTo(stmt_ptr goto_target)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id)
+    | GoToTarget(int id, identifier name)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)
@@ -193,15 +193,15 @@ stmt
     | Select(expr test, case_stmt* body, stmt* default)
     | Stop(expr? code)
     | Assert(expr test, expr? msg)
-    | SubroutineCall(symbol name, symbol? original_name, call_arg* args, expr? dt)
+    | SubroutineCall(symbol_ptr name, symbol_ptr? original_name, call_arg* args, expr? dt)
     | Where(expr test, stmt* body, stmt* orelse)
     | WhileLoop(expr test, stmt* body)
-    | Nullify(symbol* vars)
+    | Nullify(symbol_ptr* vars)
     | Flush(int label, expr unit, expr? err, expr? iomsg, expr? iostat)
     | ListAppend(expr a, expr ele)
-    | AssociateBlockCall(symbol m)
+    | AssociateBlockCall(symbol_ptr m)
     | CPtrToPointer(expr cptr, expr ptr, expr? shape)
-    | BlockCall(int label, symbol m)
+    | BlockCall(int label, symbol_ptr m)
     | SetInsert(expr a, expr ele)
     | SetRemove(expr a, expr ele)
     | ListInsert(expr a, expr pos, expr ele)
@@ -215,9 +215,9 @@ expr
         -- Such as: (x, y+z), (3.0, 2.0) generally not known at compile time
     | ComplexConstructor(expr re, expr im, ttype type, expr? value)
     | NamedExpr(expr target, expr value, ttype type)
-    | FunctionCall(symbol name, symbol? original_name,
+    | FunctionCall(symbol_ptr name, symbol_ptr? original_name,
             call_arg* args, ttype type, expr? value, expr? dt)
-    | DerivedTypeConstructor(symbol dt_sym, expr* args, ttype type, expr? value)
+    | DerivedTypeConstructor(symbol_ptr dt_sym, expr* args, ttype type, expr? value)
     | ImpliedDoLoop(expr* values, expr var, expr start, expr end,
                     expr? increment, ttype type, expr? value)
     | IntegerConstant(int n, ttype type)
@@ -263,7 +263,7 @@ expr
     | DictConstant(expr* keys, expr* values, ttype type)
     | DictLen(expr arg, ttype type, expr? value)

-    | Var(symbol v)
+    | Var(symbol_ptr v)

     | ArrayConstant(expr* args, ttype type)
     | ArrayItem(expr v, array_index* args, ttype type, expr? value)
@@ -277,7 +277,7 @@ expr
     | ArrayReshape(expr array, expr shape, ttype type, expr? value)

     | BitCast(expr source, expr mold, expr? size, ttype type, expr? value)
-    | DerivedRef(expr v, symbol m, ttype type, expr? value)
+    | DerivedRef(expr v, symbol_ptr m, ttype type, expr? value)
     | OverloadedCompare(expr left, cmpop op, expr right, ttype type, expr? value, expr overloaded)
     | OverloadedBinOp(expr left, binop op, expr right, ttype type, expr? value, expr overloaded)
     | Cast(expr arg, cast_kind kind, ttype type, expr? value)
@@ -325,8 +325,8 @@ ttype
     | Set(ttype type)
     | List(ttype type)
     | Tuple(ttype* type)
-    | Derived(symbol derived_type, dimension* dims)
-    | Class(symbol class_type, dimension* dims)
+    | Derived(symbol_ptr derived_type, dimension* dims)
+    | Class(symbol_ptr class_type, dimension* dims)
     | Dict(ttype key_type, ttype value_type)
     | Pointer(ttype type)
     | CPtr()
@@ -370,7 +370,7 @@ cast_kind

 dimension = (expr? start, expr? length)

-alloc_arg = (symbol a, dimension* dims)
+alloc_arg = (symbol_ptr a, dimension* dims)

 attribute = Attribute(identifier name, attribute_arg *args)
certik commented 2 years ago

There is one issue with pointers: they require special handling in serialization and in printing. Currently every symbol is actually a pointer, and it is printed with the symbol table ID and a name; and serialized with the same approach. Deserialization requires special handling of this too.

So the stmt_ptr would require special handling as well and does not seem worth it.

There are two ways out:

The second approach would look like this:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 2f120d29e..9817e9ad5 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -105,6 +105,7 @@ symbol
         proc_name, symbol proc, abi abi)
     | AssociateBlock(symbol_table symtab, identifier name, stmt* body)
     | Block(symbol_table symtab, identifier name, stmt* body)
+    | GoToLabel(symbol_table parent_symtab, identifier name)

 storage_type = Default | Save | Parameter | Allocatable
 access = Public | Private
@@ -169,10 +170,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id)
+    | GoTo(symbol label)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id)
+    | GoToTarget(symbol label)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)

I think that should work, but it seems it might be more complicated than the first approach.

certik commented 2 years ago

Here is how Wolfram does it: https://reference.wolfram.com/language/ref/Goto.html

certik commented 2 years ago

It seems that all things considered, the simplest way seems to be this diff, applied to the current main:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index a7721a7a7..eec1e8bac 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -173,10 +173,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id, identifier name)
+    | GoTo(identifier target_id)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id, identifier name)
+    | GoToTarget(identifier id)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)

And just use the string id. Fortran would just convert the integer to a string. Python would use the string directly.

rebcabin commented 2 years ago

I'm rereading this old classic for ideas.

https://dspace.mit.edu/bitstream/handle/1721.1/5753/AIM-443.pdf?sequence=2&isAllowed=y

czgdp1807 commented 2 years ago

https://github.com/lcompilers/lpython/issues/1167#issuecomment-1264090265 makes sense to me and should be simple enough. Does LLVM allow using string IDs for labels? If so then using string IDs makes more sense.