lcompilers / lpython

Python compiler
https://lpython.org/

Move colons to the front of symbols in symbol table #1420

Open rebcabin opened 1 year ago

rebcabin commented 1 year ago

Consider

./src/bin/lpython -I/Users/brian/Documents/GitHub/lpython/src/runtime/ltypes/ltypes.py --show-asr --no-color ./examples/expr2.py   
(TranslationUnit (SymbolTable 1 {_lpython_main_program: (Function (SymbolTable 4 {}) _lpython_main_program [main0] [] [(SubroutineCall 1 main0 () [] ())] () Source Public Implementation () .false. .false. .false. .false. .false. [] [] .false.), main0: (Function (SymbolTable 2 {x: (Variable 2 x [] Local () () Default (Integer 4 []) Source Public Required .false.)}) main0 [] [] [(= (Var 2 x) (IntegerBinOp (IntegerBinOp (IntegerConstant 2 (Integer 4 [])) Add (IntegerConstant 3 (Integer 4 [])) (Integer 4 []) (IntegerConstant 5 (Integer 4 []))) Mul (IntegerConstant 5 (Integer 4 [])) (Integer 4 []) (IntegerConstant 25 (Integer 4 []))) ()) (Print () [(Var 2 x)] () ())] () Source Public Implementation () .false. .false. .false. .false. .false. [] [] .false.), main_program: (Program (SymbolTable 3 {}) main_program [] [(SubroutineCall 1 _lpython_main_program () [] ())])}) [])

I'm writing some back ends in Clojure. Something like _lpython_main_program: is a syntax error in Clojure. If you move the colon to the front, as in :_lpython_main_program, then, magically, ASR becomes valid Clojure, no post-processing needed! Otherwise I have to post-process the output of --show-asr with non-robust string hacking in Clojure.

Please consider permanently moving the colons to the front of symbols in --show-asr. You can change one line of code in PickleVisitorVisitor to do this forever.

rebcabin commented 1 year ago

Maybe I should do this on the Clojure side with regex search-replace. Is this the only use of colon in ASR? What about namespace syntax like "::" in C++? What about type syntax ":" in Haskell? I'm trying to avoid heavy parsing on the Clojure side, but if regex search-replace is going to be robust, I'll do it on the Clojure side.

rebcabin commented 1 year ago

I have to make an assumption about the lexemes for ASR "identifier." If they're like C, namely [_a-zA-Z][_a-zA-Z0-9]* then this isn't too bad. HOWEVER, it locks out Unicode in identifiers. I've been using Greek a lot in identifiers, and this postprocessing of mine is going to kill that.

image
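
For readers without the screenshot, here is a minimal sketch of the ASCII-only post-processing described above. It is written in Python rather than Clojure, and it assumes that symbol-table keys in --show-asr output are C-style identifiers immediately followed by a colon and whitespace. The name colons_to_front is hypothetical, not part of LPython.

```python
import re

# Assumption: a symbol-table key is a C-style identifier followed by ':' and whitespace.
ASCII_KEY = re.compile(r"\b([_a-zA-Z][_a-zA-Z0-9]*):(?=\s)")

def colons_to_front(asr_text: str) -> str:
    """Rewrite `name: ` as `:name ` so keys read as Clojure keywords."""
    return ASCII_KEY.sub(r":\1", asr_text)

sample = "{main0: (Function (SymbolTable 2 {x: (Variable 2 x [] Local)}) main0 [])}"
print(colons_to_front(sample))

# A Greek key is left untouched, which is the Unicode lock-out mentioned above.
print(colons_to_front("{α: (Variable 2 α [] Local)}"))
```

Only identifiers directly followed by a colon are rewritten; the bare main0 after the closing parenthesis is left alone.
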
rebcabin commented 1 year ago

Here is a better solution that admits Unicode:

image
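
One way such a Unicode-friendly rewrite might look, sketched in Python rather than Clojure: Python's \w is Unicode-aware for str patterns, so Greek identifiers match. The function name keywordize and the assumption that a key is always followed by a single colon and whitespace are mine, not taken from the ASR grammar.

```python
import re

# [^\W\d] is roughly "a letter or underscore" (any script); \w* allows digits after that.
# The lookbehind keeps the match from starting in the middle of an identifier.
UNICODE_KEY = re.compile(r"(?<!\w)([^\W\d]\w*):(?=\s)")

def keywordize(asr_text: str) -> str:
    """Move a trailing colon to the front of each key, Unicode included."""
    return UNICODE_KEY.sub(r":\1", asr_text)

print(keywordize("{αβγ: (Variable 2 αβγ [] Local)}"))
# A C++-style '::' is not followed by whitespace, so it is left alone.
print(keywordize("std::vector x"))
```

This is still string hacking: a string literal inside the ASR that happens to contain text like "name: " would be rewritten too, so it is a stopgap rather than a parser.
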
rebcabin commented 1 year ago

I now believe I want to handle all issues like this on the Clojure side. I'll close the issue once one of you responds. IOW, I don't want to change PickleVisitorVisitor!

certik commented 1 year ago

I want to change our output to be canonical Clojure. The vast majority of all identifiers are ASCII, so no problem there.

But it can also be Unicode, so we need some solution there. Does Clojure support Unicode symbols?

It can also be special characters that we insert in AST->ASR, such as @. What is the list of acceptable symbol names in Clojure? We could escape non-conforming names.
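
A rough sketch of what escaping non-conforming names could look like, in Python for illustration. The allowed set below is an assumption based on the symbol description in the Clojure reader reference (alphanumerics plus * + ! - _ ' ? < > =), and the _uXXXX_ escape scheme and the name escape_symbol are hypothetical, not anything LPython implements.

```python
import re

# Assumed allowed characters: Unicode letters/digits, '_', and * + ! - ' ? < > =
DISALLOWED = re.compile(r"[^\w*+!\-'?<>=]")

def escape_symbol(name: str) -> str:
    """Replace each disallowed character with a hypothetical _uXXXX_ escape."""
    escaped = DISALLOWED.sub(lambda m: f"_u{ord(m.group()):04X}_", name)
    # Symbols may not begin with a digit, so prefix an underscore if needed.
    if escaped[:1].isdigit():
        escaped = "_" + escaped
    return escaped

print(escape_symbol("f@g"))   # made-up name containing '@'
print(escape_symbol("αβγ"))   # Greek passes through unchanged
```

The scheme is not injective as written (a name that already contains _u0040_ would collide with an escaped @), so a real implementation would need to reserve or escape the escape prefix as well.
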

rebcabin commented 1 year ago

Clojure names can be in Unicode. I use Greek to create symbols that won't collide with user symbols by convention.

rebcabin commented 1 year ago

https://clojure.org/reference/reader
https://stackoverflow.com/questions/3902813/is-there-a-language-spec-for-clojure

There is no other written material. It's an observed fact that Unicode Greek is OK.

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LispReader.java