melt-umn / silver

An attribute grammar-based programming language for composable language extensions
http://melt.cs.umn.edu/silver/
GNU Lesser General Public License v3.0

Unparse preserving layout #800

Closed krame505 closed 9 months ago

krame505 commented 12 months ago

This is an idea stemming from a chat I had with @remexre today: it would be nice to be able to automatically unparse a concrete syntax tree while preserving layout. This would be useful in implementing refactoring tools that should preserve comments and whitespace. One could also implement pretty-printing via a transformation back from abstract to concrete syntax (although I'm not quite sure how one would add appropriate whitespace to an AST that didn't originate from concrete syntax).

Implementing this would actually be fairly straightforward, using origin tracking. I think all we would need to add is to record, on concrete productions and terminals constructed by Copper, the start index in the list of parsed terminals. Then, given a concrete syntax tree and the list of parsed TerminalDescriptors, when unparsing a term one could walk the origin info back to the concrete production that was originally parsed (only following redexes that have newlyConstructed=false, I think?). One can get the original layout for this production from the indices of its children in the terminal descriptors.
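The origin walk described above could be sketched roughly as follows. This is a Python illustration rather than Silver, and it is not Silver's actual origin-tracking API: the `Term` class and its `terminal_start`, `origin`, and `newly_constructed` fields are hypothetical names invented for the example. It also assumes the tentative reading that a link marked as newly constructed should not be followed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    """Hypothetical stand-in for a tree node carrying origin info."""
    name: str
    # Start index into the list of parsed terminals; set only on nodes
    # that were built directly by the parser (assumption).
    terminal_start: Optional[int] = None
    # The term this one was derived from, if any (assumption).
    origin: Optional["Term"] = None
    # Whether this term was built fresh rather than derived from its origin.
    newly_constructed: bool = True


def find_parsed_origin(term: Optional[Term]) -> Optional[Term]:
    """Walk origin links back to the originally parsed concrete node.

    Returns None if the chain passes through a newly constructed term,
    since such a term has no original layout to reuse.
    """
    current = term
    while current is not None:
        if current.terminal_start is not None:
            return current  # reached a node that came from the parser
        if current.newly_constructed:
            return None  # do not follow origins of freshly built terms
        current = current.origin
    return None


# Example chain: parsed concrete node <- abstract node <- rewritten node.
parsed = Term("add_c", terminal_start=4)
abstract = Term("add", origin=parsed, newly_constructed=False)
rewritten = Term("add", origin=abstract, newly_constructed=False)
fresh = Term("lit", origin=abstract, newly_constructed=True)
```

Here `find_parsed_origin(rewritten)` reaches `parsed`, while `find_parsed_origin(fresh)` gives `None`, so the unparser would have to fall back to some default layout for `fresh`.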

Since Silver doesn't distinguish concrete and abstract syntax (ha!) it would be easy to use this to build a tool for refactoring Silver code, allowing refactorings to be specified as rewrite rules. This would be very useful for some of the major refactorings needed by forthcoming breaking changes, so I'm tempted to actually take a stab at this. Doing so would require doing #517, but that is probably a worthwhile near-term project anyway. For the moment, extra layout to include in the unparse for a newly constructed term could be specified with an origin note.

krame505 commented 12 months ago

@ericvanwyk pointed out that there is no need to actually get at the exact layout terminals; we just need the layout text between each production RHS item. We can get that by slicing the originally-parsed string according to the start/end char indices of the locations from origin tracking. So we don't need to change anything to do with Copper code generation.
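A minimal sketch of the slicing idea, in Python for illustration rather than Silver. The `Span` type and both function names are hypothetical, and it assumes each RHS item's location gives an inclusive start and exclusive end character index into the original source string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Character range of one RHS item in the original source (assumed form)."""
    start: int  # inclusive
    end: int    # exclusive


def layout_between(source: str, children: List[Span]) -> List[str]:
    """Return the layout text found between each adjacent pair of children."""
    return [source[a.end:b.start] for a, b in zip(children, children[1:])]


def unparse_with_layout(child_texts: List[str], layouts: List[str]) -> str:
    """Interleave (possibly rewritten) child texts with the original layout."""
    assert len(layouts) == len(child_texts) - 1
    pieces = [child_texts[0]]
    for layout, text in zip(layouts, child_texts[1:]):
        pieces.append(layout)
        pieces.append(text)
    return "".join(pieces)


# Example: an assignment whose RHS items are `x`, `=`, `1`, and `;`.
source = "x  =  /* init */ 1 ;"
children = [Span(0, 1), Span(3, 4), Span(17, 18), Span(19, 20)]

layouts = layout_between(source, children)
# Replace the literal `1` with `42`, keeping the comment and spacing.
result = unparse_with_layout(["x", "=", "42", ";"], layouts)
```

With these inputs `layouts` is `["  ", "  /* init */ ", " "]` and `result` is `"x  =  /* init */ 42 ;"`, so the comment and whitespace survive the rewrite without touching the parser.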