lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.5k stars 163 forks source link

ASR Bug: `ExplicitDeallocate.vars` is not a `symbol *` #1492

Open rebcabin opened 1 year ago

rebcabin commented 1 year ago

This tripped up my abstract-evaluator:

The ASR spec says:

;; | ExplicitDeallocate(symbol* vars)

but the reality is that vars is a vector of even length: [<stid> <symbol>, ...], where stid is an integer global symbol-table id:

image
certik commented 1 year ago

That is how symbol is printed in the Clojure-like output.

Similar issue to https://github.com/lcompilers/lpython/issues/1478

certik commented 1 year ago

A symbol (reference) could be printed as (2 a) instead of just 2 a.

How should Var be printed? It could be Var((2 a)), instead of the current 2 a. There is some inconsistency currently.

rebcabin commented 1 year ago

I strongly disagree with symbol's meaning different things depending on context, say a Function or Module, here, and an implicitly paired integer id and an identifier elsewhere. This complicates interpretation and test generation. My software must dispatch not only the head of symbol (like Function, Module, etc.) but on non-robust structural or implicit contextual cues.

I recommend a new top-level composite,

symref = SymbolRef(symtab_id id, identifier symnym)

(using nym to avoid the already too ambiguous name)

printing and reading as ( SymbolRef 2 x )

and evaluated by a new, singleton defmulti: eval_symref.

Statements like ExplicitDeallocate that take arguments like [2 x 2 y] should be specified to take one or more (a vector of) symref+ or symref* (instead of symbol*) and take and print arguments like [(SymbolRef 2 x) (SymbolRef 2 y)].

Emphatically, again, reading and writing symbols differently depending on context is a slippery slope: we should not give ourselves permission to sin with such ambiguity and looseness.

Details

I have a partially automated recursive-descent evaluator. To it, symbol is, as written in ASR.asdl, one of Program, Module, Function, etc. It recursively evaluates parameters based on their terms (symbol, stmt, expr, etc.) These methods can be automatically generated from the ASDL, but only if symbol means symbol and nothing else! :

(defmulti eval-symbol first)

Something similar for expressions:

(defmulti eval-expr first) ... (defmethod eval-expr 'IntegerConstant [[ ... ]] ...)
;; symbol  -- QUOTED from ASR.asdl
;;     = Program(symbol_table symtab, identifier name,
;;         identifier* dependencies, stmt* body)

(defmethod eval-symbol 'Program
  [[head    symtab    nym    dependencies    body
    :as program]]
  (fn [penv]
    {... 
     :symtab       ((eval-symbol symtab)       penv)   ; 'SymbolTable
     :nym          ((eval-node   nym)          penv)   ; identifier
     :dependencies ((eval-node   dependencies) penv)   ; identifier *
     :body         ((eval-stmts  body) penv)           ; stmt *
     :penv         penv
     }))

(defmethod eval-symbol 'SymbolTable
  [[head    integer-id    bindings
    :as symbol-table]]
  (fn [penv]
etc. ...
  ))

;; | Variable(symbol_table parent_symtab, identifier name,
;;   identifier* dependencies, intent intent,
;;   expr? symbolic_value, expr? value,
;;   storage_type storage, ttype type,
;;   abi abi, access access, presence presence, bool value_attr)

(defmethod eval-symbol 'Variable
  [[head     parent-symtab-id  nym     dependencies
    intent   symbolic-value    value   storage
    type-    abi               access  presence  value-attr 
    :as variable]]
  (fn [penv]
etc. ...
  ))

(defmethod eval-symbol 'Function
  [[head                       ; 'Function
  (fn [penv]
etc. ...
  ))

When another term says symbol, symbol? or symbol*, I expect to call (eval_symbol <foo>) appropriately. To work around this bug, (eval-symbol ...) cannot simply dispatch on the head, as above, but must non-robustly figure out from the structure whether the thing being passed in is actually a symbol or a symbol-ref.

rebcabin commented 1 year ago

I'm still debating the possibilities with myself ...

if SymbolRef is an expr, then its appearance in other ASR terms will be of type expr or expr* etc. That's less than ideal, because there is a hidden semantical restriction that the expr must be a SymbolRef.

If symbol_ref is a headless tuple, as in (symtab_id id, identifier symnym), then that begs unification with Var, leading to a clumsy print like ( Var (2 x) ).

Perhaps the best solution is for a new, top-level, singleton composite:

symref = SymbolRef(symtab_id id, identifier symnym)

and modifying

expr ... | Var(symtab_id id, identifier varnym) | ...

Later, if not overengineering, symref could grow alternatives:

symref = VariableRef(symtab_id id, identifier varnym)
       | FunctionRef(symtab_id, identifier funnym)
       | ModuleRef(symtab_id, identifier modnym)
... -- one for every symbol

but that feels like overengineering because it duplicates alternatives for symbol -- it's not DRY.

czgdp1807 commented 1 year ago

Note: I changed ExplicitDeallocate.vars to expr (https://github.com/lfortran/lfortran/pull/1188/commits/03e02742f73d0db7deb8bb0006de8c6a685f0a0a#diff-65d4f5aefdce4101f9c116614e2f04697eeb987fa16565c21070c0daf785c4e5) because it is required to compile FPM with LFortran. It would be applied to LPython soon. Thanks.