igordejanovic / rustemo

LR/GLR parser generator for Rust https://igordejanovic.github.io/rustemo/
Apache License 2.0

attach locations to parsed tokens #2

Closed jackdotink closed 1 week ago

jackdotink commented 11 months ago

After initial syntax parsing is complete, it's often desirable to run lints, type checking, checks that variables exist, etc. on the AST. To report errors from these passes, tokens need to have locations attached.

igordejanovic commented 11 months ago

The default builder deduces AST types, which you can tune manually. In this sense, the AST is a Rust value representing the parsed textual content. To add locations to your AST types, you can use the context passed to each action. The AST is not meant to keep additional information by default. If you need location and layout information, you could take a look at the generic tree builder.

jackdotink commented 11 months ago

I might be wrong here, but I don't see a way to throw an error for something like an unknown variable using an AST parsed from rustemo. The ability to do that is what I'm looking for.

igordejanovic commented 11 months ago

You can modify the action that creates the variable AST node to add context information. Then you can use that info to create errors. See how it is done in the rustemo grammar language implementation, which uses a wrapper type, ValLoc.

Please note that generated actions can be modified manually. Thus, you can tune your AST types however you like.
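
As a rough illustration of the idea above (not rustemo's actual code), here is a minimal sketch of an AST whose variable nodes carry a location, plus a later pass that reports unknown variables. `Located`, `Stmt`, and `check_unknown_variables` are hypothetical names invented for this example; the real ValLoc type and the context passed to actions may look different, and in practice the location would be filled in from that context inside the action.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a location-carrying wrapper such as `ValLoc`.
/// The field names are assumptions, not rustemo's definitions.
#[derive(Debug, Clone, PartialEq)]
struct Located<T> {
    value: T,
    line: usize,
    column: usize,
}

/// Hypothetical AST for a tiny language with declarations and uses.
#[derive(Debug)]
enum Stmt {
    Let(Located<String>),
    Use(Located<String>),
}

/// Walk the statements in order and report each use of an undeclared name,
/// prefixing the message with the location stored on the node.
fn check_unknown_variables(program: &[Stmt]) -> Vec<String> {
    let mut declared: HashSet<&str> = HashSet::new();
    let mut errors = Vec::new();
    for stmt in program {
        match stmt {
            Stmt::Let(name) => {
                declared.insert(name.value.as_str());
            }
            Stmt::Use(name) => {
                if !declared.contains(name.value.as_str()) {
                    errors.push(format!(
                        "{}:{}: unknown variable `{}`",
                        name.line, name.column, name.value
                    ));
                }
            }
        }
    }
    errors
}

fn main() {
    let program = vec![
        Stmt::Let(Located { value: "x".to_string(), line: 1, column: 5 }),
        Stmt::Use(Located { value: "x".to_string(), line: 2, column: 1 }),
        Stmt::Use(Located { value: "y".to_string(), line: 3, column: 1 }),
    ];
    for error in check_unknown_variables(&program) {
        eprintln!("{error}");
    }
}
```

The point of the sketch is only that once the location lives on the node, any pass that runs after parsing can produce a positioned error without access to the parser.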

igordejanovic commented 11 months ago

Rustemo is implemented in itself, so if you introduce an error in the rustemo grammar, that error is reported in much the same way as you would report errors for your own language.

andrewbaxter commented 1 month ago

+1 for an option to attach source locations to the tree.

The generic tree parser, in being generic, loses some amount of type safety, requiring you to know more about the tree itself in order to walk it sensibly.

Writing a custom builder for a decently large syntax is a huge amount of work and is 99% boilerplate already in the default builder.

In my case I'm using the grammar for something like syntax colorization, so I need to be able to retrieve the exact spans for nodes in a parse (i.e. can't discard anything). My current workaround is to name everything and reassemble the text, but that's basically noise in the grammar.

Edit: The workaround doesn't entirely work either - it seems like (at least for some terminals) the actual text information is discarded. Maybe this is with constant terminals? Of course, I can manually substitute in the original text, but this backwards conversion is another avenue for errors.

igordejanovic commented 1 month ago

I see your point. Maybe the best approach would be to make an option for generating support for location info in inferred AST types/actions.

igordejanovic commented 1 week ago

There is now a configuration option, builder-loc-info, which, when used with the default builder, wraps all tokens and struct types in the ValLoc wrapper type. ValLoc provides location information and can be dereferenced to the original value. This will be part of the 0.7.0 release, which will be published shortly.
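
To show what "can be dereferenced to the original value" might mean in practice, here is a small sketch of a wrapper that implements `Deref`. It is not the generated code: `WithSpan`, `Span`, `is_keyword`, and `source_text` are hypothetical names, and the byte-offset span is an assumption about how a location could be represented. Code written against the plain value keeps working, while span-aware code (for example a syntax highlighter) can read the location.

```rust
use std::ops::Deref;

/// Hypothetical byte-offset span into the source text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical wrapper in the spirit of `ValLoc`: a value plus an optional span.
#[derive(Debug, Clone)]
struct WithSpan<T> {
    value: T,
    span: Option<Span>,
}

impl<T> Deref for WithSpan<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.value
    }
}

/// Code written against the plain value type, unaware of the wrapper.
fn is_keyword(name: &str) -> bool {
    matches!(name, "let" | "fn" | "if")
}

/// Return the slice of the source covered by a node, if it has a valid span.
fn source_text<'a, T>(source: &'a str, node: &WithSpan<T>) -> Option<&'a str> {
    node.span.and_then(|s| source.get(s.start..s.end))
}

fn main() {
    let source = "let answer";
    let token = WithSpan {
        value: String::from("let"),
        span: Some(Span { start: 0, end: 3 }),
    };

    // Deref coercion: &WithSpan<String> -> &String -> &str.
    println!("keyword: {}", is_keyword(&token));
    // Method calls auto-dereference to the wrapped String.
    println!("length: {}", token.len());
    // Span-aware code recovers the exact source text.
    println!("text: {:?}", source_text(source, &token));
}
```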