teal-language / tl

The compiler for Teal, a typed dialect of Lua
MIT License

Pretty-printing tl source code for the purpose of refactoring/formatting #201

Open Ruin0x11 opened 4 years ago

Ruin0x11 commented 4 years ago

I was wondering if it would be possible to have an option to change how tl outputs its AST, such that you could preserve the formatting and comments of the original file.

Here is my use case: I have a very large Lua project (~190K SLoC at last count) that I want to refactor in various ways, like renaming the fields of tables or moving the location of a module and updating all the calls to require project-wide to point to the new location. It's difficult to do these things without using editor macros or writing specialized programs to parse and modify the source code.

My idea is to use a separate program that leverages tl to parse the code into an AST with extra information such as types, apply a modification to the AST nodes that satisfy a predicate, and write the tl or Lua code back to a string.

The issue is that refactoring code like this requires a specific property of the parser and formatter: whitespace, comments, and all symbols and keywords must be preserved. When I first wanted this, none of the Lua parsers used by projects such as luacheck had this property, even though parsers that did, like lib2to3, existed for other languages. So I had to write my own, named kotaro, specialized for source code refactoring. It sort of worked, but the parser was buggy, it needed lots of tests and extra effort to be usable, and the performance wasn't that good given what I could accomplish by myself. Some of the refactorings I wanted to do, like changing function signatures, would still require a lot more effort. Consolidating effort on tl's parser would be more beneficial, and it also comes with type information, which would open up a lot more possibilities for refactoring. There are only a few things missing from tl to accomplish what I'm thinking of:

  1. Ability to pretty print the AST as the original tl source instead of just Lua.
  2. Option to preserve whitespace and comments when parsing/converting.
  3. Modification of the AST to include keywords like if, then and end and the whitespace/comments preceding them.
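To make the property behind these three items concrete, here is a minimal sketch of a "lossless" token stream, where every token carries the whitespace and comments that precede it. It is written in Python purely for illustration; the `Token`, `lex`, `unparse` and `rename` names are hypothetical, the token grammar covers only a small Lua-like subset (no long strings, long comments, or string escapes), and none of it reflects how tl's lexer or kotaro actually work.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar for a small Lua-like subset; not tl's real lexer.
# Trivia is any run of whitespace and `--` line comments.
TRIVIA = re.compile(r"(?:\s+|--[^\n]*)*")
# Identifiers/keywords, integers, simple double-quoted strings,
# a few multi-character operators, then any other single character.
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+|"[^"\n]*"|\.\.\.|\.\.|==|~=|<=|>=|.', re.DOTALL)


@dataclass
class Token:
    trivia: str  # whitespace and comments that precede the token
    text: str    # the token itself; "" for the end-of-file marker


def lex(source):
    """Split source into tokens without discarding a single character."""
    tokens, pos = [], 0
    while True:
        trivia = TRIVIA.match(source, pos).group()
        pos += len(trivia)
        if pos == len(source):
            tokens.append(Token(trivia, ""))  # EOF token keeps trailing trivia
            return tokens
        text = TOKEN.match(source, pos).group()
        tokens.append(Token(trivia, text))
        pos += len(text)


def unparse(tokens):
    """Write the tokens back out; for unmodified tokens this is the input."""
    return "".join(t.trivia + t.text for t in tokens)


def rename(tokens, old, new):
    """Naive rename: matches token text only, with no scoping or type checks."""
    return [Token(t.trivia, new if t.text == old else t.text) for t in tokens]


SOURCE = '''-- load the config
local cfg = require("old.config")  -- TODO: move
if cfg then
   print(cfg.name)
end
'''

assert unparse(lex(SOURCE)) == SOURCE  # round trip is byte-for-byte
print(unparse(rename(lex(SOURCE), "cfg", "config")))
```

Because comments live in `trivia` and string literals are single tokens, the rename touches neither the comment text nor `"old.config"`. A real refactoring tool would attach this trivia to tree nodes and use scope and type information to decide which occurrences to change, which is the part tl's checker could supply.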

Also, this would be nice to have:

  1. A way of generating the dependency graph of a project directory (from what can be determined statically), to know what files need to be reparsed if a change is made.
  2. Exposing the tl parser as a drop-in module for lua and tl parsing, instead of needing to reach for luacheck, ldoc or similar just for its parser and gutting it out to hack on it. (#187)

Does this sound like a good idea?

hishamhm commented 4 years ago

Hi there, thank you for all the feedback!

Here's my feedback on the items above:

> Ability to pretty print the AST as the original tl source instead of just Lua.

This should be achievable through a new pretty-printer function, preferably in a separate module once I split tl.tl into multiple modules with more stable APIs between them. I would prefer to keep the current code dump function more straightforward since it's used at require()-time.

> Option to preserve whitespace and comments when parsing/converting.

This should be ok to add to the lexer as an option. I don't expect that to have a significant performance impact if kept optional, but I'm actually considering switching to (or adding an optional) lpeg-based lexer for compilation performance reasons.

> Modification of the AST to include keywords like if, then and end and the whitespace/comments preceding them.

Then the ST is no longer A. ;) I've implemented a recursive descent parser for the compiler specifically for simplicity — we'd have to check how much code that would add, because this adds quite some maintenance burden and an extra guarantee of structure preservation that I don't currently need to care about when writing the compiler.

For such text-preserving tooling maybe a different parser would be a better fit, possibly something based on lpeg or tree-sitter.

> A way of generating the dependency graph of a project directory (from what can be determined statically), to know what files need to be reparsed if a change is made.

This is something that would be useful for the tl CLI in general, for incremental recompilation. :+1:
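As a rough illustration of what "determined statically" could mean here, the sketch below scans sources for `require` calls with a literal module name, builds a dependency graph, and walks it backwards to find what needs reparsing after a change. It is in Python for illustration only; `build_graph` and `needs_reparse` are hypothetical names, not part of tl, and a regex scan is a stand-in for a real parser (it will miss dynamic requires and can be fooled by `require` inside comments or strings).

```python
import re
from collections import defaultdict

# Matches require("mod"), require 'mod' and require "mod" with a literal name.
REQUIRE_RE = re.compile(r"""\brequire\s*\(?\s*(["'])([\w.]+)\1""")


def build_graph(sources):
    """Map each module name to the set of project modules it requires.

    `sources` maps module names to source text; requires that point
    outside the project (names not in `sources`) are ignored.
    """
    graph = {}
    for name, text in sources.items():
        found = {m.group(2) for m in REQUIRE_RE.finditer(text)}
        graph[name] = {dep for dep in found if dep in sources}
    return graph


def needs_reparse(graph, changed):
    """Return `changed` plus every module that depends on it, directly or not."""
    dependents = defaultdict(set)  # reverse edges: module -> modules requiring it
    for mod, deps in graph.items():
        for dep in deps:
            dependents[dep].add(mod)
    dirty, stack = {changed}, [changed]
    while stack:
        for mod in dependents[stack.pop()]:
            if mod not in dirty:
                dirty.add(mod)
                stack.append(mod)
    return dirty


# Hypothetical four-module project, given as in-memory strings.
SOURCES = {
    "app": 'local ui = require("ui")\nlocal util = require "util"\n',
    "ui": "local util = require('util')\n",
    "util": "return {}\n",
    "tools": "local json = require('json')\n",
}

graph = build_graph(SOURCES)
print(sorted(needs_reparse(graph, "util")))
```

The same reverse walk would serve both uses mentioned in this thread: deciding which files a refactoring tool must reparse, and deciding which files the CLI must recompile.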

> Exposing the tl parser as a drop-in module for lua and tl parsing, instead of needing to reach for luacheck, ldoc or similar just for its parser and gutting it out to hack on it.

That would be nice indeed, but as said above, I'm not sure if the current incarnation of the parser is adequate for such a one-size-fits-all-your-parsing-needs solution (though it is a good fit for a self-contained pure-Lua source-to-source compiler, which is a very specific thing). If a different parser is made with more specialized tools, perhaps the generated AST could be made compatible (perhaps with extra annotations), but it's still too early to freeze the Teal AST format as the language design itself is very much in flux.

hishamhm commented 4 years ago

Update: I just saw #203... it does add a lot of code, and that's even without the formatting preservation. I'm not sure if I want to take the printer function in that direction, I'm afraid :confused: , but as mentioned above, this could be useful as an external module.