WIP: Use nom-locate to add span information in the AST

alixinne commented 4 years ago

The main target for this PR is issue #74. By using nom_locate we can rewrite the parsing functions to operate on input with span information added in. I'll post an update when the functionality is usable (for now I've just changed the parser and tests to use nom_locate::LocatedSpan).

This is flagged as WIP to discuss how to integrate this in the current architecture. One potential side-effect is allowing to parse comments (as requested in #116). We could have a way to associate "non-significant" spans (whitespace and comments) to the relevant syntax nodes in order to parse comments relative to the syntax tree, as suggested in https://www.oilshell.org/blog/2017/02/11.html.

This one is more of a stretch and it might be more sensible to be a second iteration on this feature, but this would allow preserving pre-processing directives in non-top-level contexts in the same way, which is currently broken (see issues #117 and #64).

hadronized commented 4 years ago

Thanks! I’ll try to dedicate some time to read this asap!

alixinne commented 4 years ago

In the last commit I decided to try implementing a ParseInput type. This allows passing in the LocatedSpan from nom_locate as well as a parsing context. The role of the parsing context is for now (to test this strategy) to record the comments as they are successfully parsed.

This requires quite a lot of changes since this introduces lifetimes everywhere, and the ParseContext structure is using a RefCell to allow the ParseInput type to be Copy using only a shared reference. This might not be the prettiest solution but it does work without having to rewrite the entire parser, and also since many of the nom combinators won't accept an FnMut which would appear if we held an exclusive reference to the parsing context.

Using a parsing context will allow replacing the Node structure with one that only holds an Option<NonZeroUSize> to reference a span by its identifier in the parsing context list of spans (not yet implemented). This will make the memory impact of these changes minimal, and also optional (for example, programmatically-generated syntax trees might not have span information so we can just use None and no context).

alixinne commented 3 years ago

Spans are now owned by nodes instead of the parsing context. This simplifies a big part of the code (and probably more when I finish refactoring). I think it's a better idea to have span information working first, and then to optimize how this is handled if it proves necessary.

To avoid having too many changes required on the parsing tests, there is now a NodeContentsEq trait with an associated assert_ceq! macro to compare syntax trees ignoring span information (PartialEq still compares everything in the tree). Its implementation is handled by the proc macro in glsl-impl.

alixinne commented 3 years ago

With the proc-macro in place we can also automatically generate type aliases for OldNodeType -> pub type OldNodeType = Node<RenamedOldNodeType> which allows integrating span information in the AST with minimal breakage to the API (the type names remain the same, except when explicitely matching on enum variants. This also avoids having Node<T> everywhere in the API. The into_node() method was also removed using this, so into() will convert from node data into a node with no span information.

hadronized / glsl

WIP: Use nom-locate to add span information in the AST #129