tact-lang / tact

Tact compiler main repository
https://tact-lang.org
MIT License

`asm`-blocks parser #837

Open anton-trunov opened 2 months ago

anton-trunov commented 2 months ago

The parser should produce ASTs for asm-blocks. This puts us in control of the error messages and lets us later implement a type-checker for asm-blocks.

novusnota commented 2 months ago

TL;DR: Most of the work had to be scrapped, as a reasonably reliable AST of TVM asm cannot be produced without a hand-written parser that also does type checking and partial evaluation of the Fift-asm. Also see UPD2.

I wrote a significant chunk of the Ohm parser's code and some later-stage hacks to support defining new words (instructions), including active words, which affect parsing in a context-sensitive way: an active word takes the word that follows it as its argument!

But then I realized that:

  1. Shadowing is allowed (and happens silently), meaning that even built-in words can be re-defined at ANY point during parsing, so my updates to grammar.ohm itself are wasted on that. Moreover, it's possible to define new words that affect the predictive parsing (the active words I described above).
  2. forget can remove the words. It can even forget forget!
  3. There are word, (word) and (word-prefix-find), which change the subsequent parsing depending on the previous entry on the stack: the first one (word) either uses that entry as the character to parse until, or consumes the word ahead. And mind you, they are used a lot throughout Fift's .fif libraries, including Fift.fif and Asm.fif.
  4. Adding to the 2nd point, there's a (create) that can create new words based on stack content, and (forget), which can remove words based on stack content.

A workaround for some of the issues above is a recursive-descent parser plus a dictionary to keep track of the currently defined words. It would also need to keep its own stack, with type checking of the items pushed onto it, and all that stuff. Otherwise, we're left with just a slight expansion of what the current parser can do and better recognition of built-in words in Fift-asm.
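
To make the dictionary idea concrete, here is a minimal sketch of a scanner that tracks word definitions while it parses. It is a heavily simplified model and not real Fift semantics: tokens are split on whitespace only, the stack is ignored entirely, and `:`, `::` and `forget` are all modeled as active words that just take the next token as a name. The names `WordKind`, `AsmNode` and `parseWords` are made up for illustration.

```typescript
// Hypothetical, simplified model of dictionary-driven parsing.
// Assumption: ":", "::" and "forget" only look at the NEXT token;
// real Fift also involves the stack, which this sketch ignores.
type WordKind = "ordinary" | "active";

type AsmNode =
  | { kind: "word"; name: string }
  | { kind: "active"; name: string; arg: string }
  | { kind: "unknown"; name: string };

function parseWords(
  source: string,
  initial: ReadonlyMap<string, WordKind>,
): AsmNode[] {
  // Work on a copy: definitions change while we parse.
  const dict = new Map<string, WordKind>(initial);
  const tokens = source.split(/\s+/).filter((t) => t.length > 0);
  const out: AsmNode[] = [];
  let i = 0;
  while (i < tokens.length) {
    const name = tokens[i];
    i += 1;
    if (name === undefined) break;
    const kind = dict.get(name);
    if (kind === undefined) {
      out.push({ kind: "unknown", name });
      continue;
    }
    if (kind === "ordinary") {
      out.push({ kind: "word", name });
      continue;
    }
    // Active word: it consumes the following token as its argument.
    const arg = tokens[i];
    if (arg === undefined) {
      throw new Error(`active word "${name}" expects a following word`);
    }
    i += 1;
    out.push({ kind: "active", name, arg });
    // The dictionary is updated mid-parse, so the meaning of later
    // tokens depends on everything seen so far.
    if (name === "forget") dict.delete(arg);
    else if (name === ":") dict.set(arg, "ordinary");
    else if (name === "::") dict.set(arg, "active");
  }
  return out;
}
```

With a starting dictionary containing `forget`, `:`, `::` and `DUP`, the input `DUP forget DUP DUP` yields a known word, an active word, and then an unknown word for the very same `DUP` token. That is exactly the kind of context sensitivity a static grammar cannot express.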

UPD: I'm thinking about prohibiting the forget word in the first place, and prohibiting shadowing of the {, }, ({), (}), [ and ] words, for the sanity of the parser. The 3rd and 4th issues are the real blockers here: we can't really parse if the stack is unknown.
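
A rough sketch of what such a restriction pass could look like, run before any real parsing. It makes several assumptions: tokens are split on whitespace only (string literals and comments are not handled), only the defining words `:`, `::`, `:_` and `::_` are checked, and the function name `findRestrictionViolations` is hypothetical.

```typescript
// Hypothetical pre-pass: reject "forget" and any attempt to shadow
// the bracket-like words. Whitespace tokenization is an assumption;
// string literals and comments are not handled here.
const PROTECTED_WORDS: ReadonlySet<string> = new Set([
  "{", "}", "({)", "(})", "[", "]",
]);
// Assumed set of defining words that take the next token as a name.
const DEFINING_WORDS: ReadonlySet<string> = new Set([":", "::", ":_", "::_"]);

function findRestrictionViolations(source: string): string[] {
  const tokens = source.split(/\s+/).filter((t) => t.length > 0);
  const violations: string[] = [];
  for (let i = 0; i < tokens.length; i += 1) {
    const tok = tokens[i];
    if (tok === undefined) continue;
    if (tok === "forget") {
      violations.push(`token ${i}: "forget" is prohibited`);
      continue;
    }
    const next = tokens[i + 1];
    if (DEFINING_WORDS.has(tok) && next !== undefined && PROTECTED_WORDS.has(next)) {
      violations.push(`token ${i}: shadowing "${next}" is prohibited`);
    }
  }
  return violations;
}
```

Note that this only catches the syntactic cases. Definitions made through (create) or removals through (forget) depend on stack contents and would slip past a check like this.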

UPD2: So, let's introduce just a minor update here and produce very primitive ASTs that don't account for types or anything else, since those can turn out really incorrect given the 3rd point above (even if we get rid of the first two by restricting the capabilities).
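
For illustration, a "very primitive" AST could be as small as the sketch below: words and nested `{ ... }` blocks, with no types and no dictionary. It assumes `{` and `}` are never shadowed (per the UPD above) and that tokens are whitespace-separated. `PrimitiveAst` and `parsePrimitive` are hypothetical names, not the shape the compiler actually uses.

```typescript
// Hypothetical primitive AST: only words and nested blocks, no typing.
// Assumes "{" and "}" keep their built-in meaning (no shadowing).
type PrimitiveAst =
  | { kind: "word"; text: string }
  | { kind: "block"; items: PrimitiveAst[] };

function parsePrimitive(source: string): PrimitiveAst[] {
  const tokens = source.split(/\s+/).filter((t) => t.length > 0);
  const root: PrimitiveAst[] = [];
  // Stack of item lists; the last entry is the innermost open block.
  const open: PrimitiveAst[][] = [root];
  for (const tok of tokens) {
    const current = open[open.length - 1];
    if (current === undefined) throw new Error("internal error: empty stack");
    if (tok === "{") {
      const items: PrimitiveAst[] = [];
      current.push({ kind: "block", items });
      open.push(items);
    } else if (tok === "}") {
      if (open.length === 1) throw new Error('unmatched "}"');
      open.pop();
    } else {
      current.push({ kind: "word", text: tok });
    }
  }
  if (open.length !== 1) throw new Error('unclosed "{"');
  return root;
}
```

Even this much is enough to report unbalanced braces with our own error messages, which was one of the original goals of the issue.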