Closed alixinne closed 6 months ago
Thanks! I’ll try to dedicate some time to read this asap!
In the last commit I decided to try implementing a ParseInput
type. This allows passing in the LocatedSpan
from nom_locate
as well as a parsing context. The role of the parsing context is for now (to test this strategy) to record the comments as they are successfully parsed.
This requires quite a lot of changes since this introduces lifetimes everywhere, and the ParseContext
structure is using a RefCell
to allow the ParseInput
type to be Copy
using only a shared reference. This might not be the prettiest solution but it does work without having to rewrite the entire parser, and also since many of the nom combinators won't accept an FnMut
which would appear if we held an exclusive reference to the parsing context.
Using a parsing context will allow replacing the Node
structure with one that only holds an Option<NonZeroUSize>
to reference a span by its identifier in the parsing context list of spans (not yet implemented). This will make the memory impact of these changes minimal, and also optional (for example, programmatically-generated syntax trees might not have span information so we can just use None
and no context).
Spans are now owned by nodes instead of the parsing context. This simplifies a big part of the code (and probably more when I finish refactoring). I think it's a better idea to have span information working first, and then to optimize how this is handled if it proves necessary.
To avoid having too many changes required on the parsing tests, there is now a NodeContentsEq
trait with an associated assert_ceq!
macro to compare syntax trees ignoring span information (PartialEq
still compares everything in the tree). Its implementation is handled by the proc macro in glsl-impl
.
With the proc-macro in place we can also automatically generate type aliases for OldNodeType
-> pub type OldNodeType = Node<RenamedOldNodeType>
which allows integrating span information in the AST with minimal breakage to the API (the type names remain the same, except when explicitely matching on enum variants. This also avoids having Node<T>
everywhere in the API. The into_node()
method was also removed using this, so into()
will convert from node data into a node with no span information.
The main target for this PR is issue #74. By using nom_locate we can rewrite the parsing functions to operate on input with span information added in. I'll post an update when the functionality is usable (for now I've just changed the parser and tests to use
nom_locate::LocatedSpan
).This is flagged as WIP to discuss how to integrate this in the current architecture. One potential side-effect is allowing to parse comments (as requested in #116). We could have a way to associate "non-significant" spans (whitespace and comments) to the relevant syntax nodes in order to parse comments relative to the syntax tree, as suggested in https://www.oilshell.org/blog/2017/02/11.html.
This one is more of a stretch and it might be more sensible to be a second iteration on this feature, but this would allow preserving pre-processing directives in non-top-level contexts in the same way, which is currently broken (see issues #117 and #64).