lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.5k stars 158 forks source link

Proposed Nomenclature for the ASR Grammar #1447

Open rebcabin opened 1 year ago

rebcabin commented 1 year ago

The following parser in Clojure has survived several rounds of ASR updates. It contains a docstring proposing formal (machine-checkable) nomenclature for ASR:

(def asdl-grammar

  "Parse the ASDL spec for ASR into Clojure vectors and hashmaps,
  then generate clojure.specs for terms and forms from the vectors
  and hashmaps. Also promote these collections into forms
  convenient for interpreters (see asr.clj).

  The only things less-than-obvious in the following grammar are
  angle brackets. They mean \"don't bother reporting this term.\"
  Cuts down on clutter in the output.

  The grammar says:

  1. An ASDL spec is a bunch of ___productions___ or ASDL-DEFs.

  2. An ASDL-DEF, aka ___speclet___, is a triple of

    (a) ___term___ or ASDL-TERM

    (b) an equals sign

    (c) one or more (a _bunch_ of) ASDL-FORMs, ___forms___, for
  short, or _alternatives_.

  For example,

          production         (grammar lingo)
            speclet          (ASR lingo)
     __________^___________
    |                      |
      term        forms      (one or more)
     __^__      _____^_____
    |     |    |           |
     symbol  =  Program(...)
             |  Module(...)
             |  Functin(...)
                ...

  The left-hand side of (the equals sign in) a speclet is a term.
  The right-hand side of (the equals sign in) a speclet is an
  _alternation of forms_, just as one might say \"a murder of
  crows,\" \"a parliament of owls,\" or a \"lamentation of
  swans.\"

  There are 28 speclets in the constant ASR snapshot for testing,
  and 30 terms in a recent drop of live ASR. See all-terms-test in
  core_test.clj.

  3. There are three ___groups___ of speclets: ___composite
  speclets___, ___symconst speclets___, and ___tuple speclets___.
  These are not algebraic groups; they're just
  collections (see [this SO post for the difference between a
  _collection_ and a _sequence_ in
  Clojure](https://stackoverflow.com/questions/19850730/whats-the-difference-between-a-sequence-and-a-collection-in-clojure).

  In a recent drop of ASR, there are

    (a) 10 composite speclets -- expr, stmt, ttype, symbol, etc.

    (b) 6 tuple speclets -- call_arg, do_loop_head, dimension, etc.

    (c) 14 symconst speclets -- abi, cmpop, binop, presence, etc.

  4. A _composite speclet_ is an alternation of _composite forms_.
  Each composite form is an ASDL-HEAD or ___head___ followed by
  ASDL-ARGSs, really FORMAL PARAMETERS! [See ASR Issue 1446]
  (https://github.com/lcompilers/lpython/issues/1446).

  Example of one of the composite forms of the composite speclet
  \"symbol:\"

                          composite form
     ____________________________^____________________________
    |                                                         |
                     (same shape as a tuple form)
     head             parameters, parameter list
     __^__  ________________________^_________________________
    |     ||                                                  |
    Program(symbol_table s, ident nym, ident* deps, stmt* body)

  The \"symbol\" composite speclet has about 14 composite forms
  with heads Program, Module, Function, etc. The \"expr\"
  composite has about 86 heads: IfExp, ComplexConstructor,
  NamedExpr, etc.

  Parameters (ASDL-ARGS) of a composite form are identical in
  shape to a _tuple form_, e.g.,

    ...(symbol_table s, ident nym, ident* deps, stmt* body)

  5. The _symconst_ group contains symconst speclets like

                symconst speclet
     __________________^___________________
    |                                      |
     term           symconst forms
     _^_    _______________^_______________
    |   |  |                               |
    cmpop = Eq | NotEq | Lt | LtE | Gt | GtE

  where the _symconst forms_ are singleton identifiers, without
  parameter lists in round brackets. These forms are the _heads_
  of the symconst forms -- the symconst speclet above has _term_
  cmpop and its forms have _heads_ Eq, NotEq, etc.

  6. A tuple is a singleton form: a comma-separated list (in round
  brackets) of pairs of ___type___ and ___variable___, like this
  one:

                       tuple speclet
     ________________________^________________________
    |                                                 |
        term                   tuple form
     ____^____     _________________^_________________
    |         |   |                                   |
    array_index = (expr? left, expr? right, expr? step)

  identical in shape to the parameter list of a composite term.

  A tuple form is anonymous, but we gensym a _head_ for it,
  something like asr_234789, so that all of symconst, composite,
  and tuple forms all have heads.

  There are about 227 heads across all the groups, about six of
  which are heads of tuple forms with gensymmed heads. The tuple
  speclets are things like array_index, attribute_arg, alloc_arg,
  etc. See all-heads-test in core_test.clj.

  7. _parameter lists_ in composite forms and _tuple forms_ have
  the same shapes: pairs of ___types___ and identifiers. A _type_
  can have a _quantitative qualifier_, aka _multiplicity_: STAR or
  QUES, meaning that the variable denotes \"zero or more\" and
  \"at least one\" instance of the type, respectively. For
  instance, in the parameter list of the following composite form,

    Program(symbol_table s, ident nym, ident* deps, stmt* body)

  \"deps\" and \"body\" have STAR multiplicities.

  The default multiplicity, with no STAR or QUES, is \"exactly
  once.\"

  SUMMARY:

  The big picture to remember about _terms_ and _heads_ is that a
  _speclet_ looks like one of the following three:

    (a) `term` `=` `composite-`*`head`*`-1` `args`
               `|` `composite-`*`head`*`-2` `args`
               `|` ...

    (b) `term` `=` `symconst-`*`head`*
               `|` `symconst-`*`head`*
               `|`  ...

    (c) `term` `=` `tuple` (anonymous gensymmed *head*)

  A _term_ corresponds to a _bunch_ (one or more) _heads_, but
  each head corresponds to exactly one term.

  A term has exactly one _speclet_. The speclet is the term and
  the whole alternation of forms:, left-hand side (term) and
  right-hand side (alternation of forms), one form per head.

  All forms in a speclet are of the same group: composite,
  symconst, or tuple. There won't be a mixture of composites and
  symconsts, for instance, in the forms -- the right-hand side --
  of any speclet.

  In addition to the 3 groups, it turns out there are about 28
  terms and speclets, and 227 heads, forms, and clojure.specs when
  we're done. The number grows as we add features to ASR.
  "

  "MODULE         = SPC* <'module'> SPC* IDENT LBRACE SPEC RBRACE
   SPEC           = (ASDL-DEF SPC*)*
(* **************************************************************** *)
  <SPC>           = <#'\\s*(--[^\\n]*)?'> (* eat comments quickly * *)
  <BAR>           = <SPC* '|' SPC*>
  <COMMA>         = <SPC* ',' SPC*>
   STAR           = <SPC* '*' SPC*>
   QUES           = <SPC* '?' SPC*>
  <EQ>            = <SPC* '=' SPC*>
  <LBRACE>        = <SPC* '{' SPC*>
  <RBRACE>        = <SPC* '}' SPC*>
  <LPAR>          = <SPC* '(' SPC*>
  <RPAR>          = <SPC* ')' SPC*>
  <IDENT>         = #'[A-Za-z_][A-Za-z0-9_\\.\\-]*'
(* **************************************************************** *)
   ASDL-DEF       = ASDL-TERM EQ ASDL-FORMS  (* aka 'speclet' ***** *)
   ASDL-TERM      = IDENT
   ASDL-FORMS     = ASDL-FORM (BAR ASDL-FORM)*
  <ASDL-FORM>     = ASDL-COMPOSITE
                  | ASDL-SYMCONST
                  | ASDL-TUPLE

   ASDL-COMPOSITE = ASDL-HEAD ASDL-ARGS
   ASDL-SYMCONST  = IDENT
   ASDL-TUPLE     = ASDL-ARGS

   ASDL-HEAD      = IDENT
   ASDL-ARGS      = LPAR ASDL-DECL (COMMA ASDL-DECL)* RPAR
                  | LPAR RPAR
   ASDL-DECL      = ASDL-TYPE SPC* ASDL-NYM
   ASDL-TYPE      = IDENT | IDENT STAR | IDENT QUES
   ASDL-NYM       = IDENT
  ")
rebcabin commented 1 year ago

See https://github.com/rebcabin/ClojureProjects002/blob/master/asr/src/asr/grammar.clj