Parsing into a concrete syntax tree.

modulovalue commented 2 years ago

Hello Lukas,

I was wondering what your opinion is on adding support for parsing XML documents into a concrete syntax tree, i.e. an AST that also carries location information, such as the source position of the equals sign in each attribute and of other syntactic elements.

Is this perhaps on your TODO list? If not, would you accept a PR for it, even if it slowed down use cases that don't need a CST? Or should a to-CST parser exist independently of the to-AST parser, so that the to-AST parser doesn't get slower?

renggli commented 2 years ago

The event-based parser (which is now used for all parsing operations) has an option to keep track of the source locations of individual events. While this does not track the exact position of each token, it gives enough information for things like error reporting, printing the location of elements, or even some basic syntax highlighting.
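To make the distinction between event-level and token-level locations concrete, here is a minimal sketch. It deliberately does not use the dart-xml API; it uses Python's standard `xml.parsers.expat` module as a stand-in, and the function name `element_start_locations` is made up for this illustration. Each start-element event reports where the element begins, but nothing reports where the `=` inside an attribute sits.

```python
import xml.parsers.expat


def element_start_locations(source: str):
    """Return (name, byte_offset, line, column) for every start-element event.

    Illustration only: this is Python's expat binding, not dart-xml.
    """
    parser = xml.parsers.expat.ParserCreate()
    locations = []

    def on_start(name, attrs):
        # Inside a handler, these attributes describe the start of the
        # current event, i.e. the '<' that opens the element.
        locations.append((
            name,
            parser.CurrentByteIndex,     # 0-based byte offset
            parser.CurrentLineNumber,    # 1-based line
            parser.CurrentColumnNumber,  # 0-based column
        ))

    parser.StartElementHandler = on_start
    parser.Parse(source.encode("utf-8"), True)
    return locations


if __name__ == "__main__":
    doc = '<a x="1">\n  <b/>\n</a>'
    for name, offset, line, column in element_start_locations(doc):
        print(f"<{name}> starts at byte {offset} (line {line}, column {column})")
```

This is enough to point an error message or a highlighter at an element, but a full CST would additionally need the offsets of attribute names, equals signs, quotes, and whitespace, which an event-level API of this kind does not expose.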

Speed and memory consumption are a general concern. There are users who want to be able to process GBs of XML data on mobile devices. So yeah, I am very careful about adding features that are not generally useful but come at a high cost for everybody.

I played around with propagating the location information from events to the DOM nodes, which is relatively easy to do in the current setup. However, I didn't pursue the idea further due to the lack of a strong use case and the open question of what should happen to the location data when the DOM is mutated.
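The mutation problem can be sketched as follows. Again, this is not dart-xml code: `Node`, `parse_with_locations`, and `serialize` are hypothetical helpers written in Python on top of the standard `xml.parsers.expat` module, and the example assumes an ASCII-only document so that byte offsets and string indices coincide.

```python
import xml.parsers.expat
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    start: Optional[int]  # offset of '<' in the parsed source; None if created later
    children: List["Node"] = field(default_factory=list)


def parse_with_locations(source: str) -> Node:
    """Build a tiny element-only tree, copying each event's offset onto its node."""
    parser = xml.parsers.expat.ParserCreate()
    roots: List[Node] = []
    stack: List[Node] = []

    def on_start(name, attrs):
        node = Node(name, parser.CurrentByteIndex)
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)

    def on_end(name):
        stack.pop()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(source.encode("utf-8"), True)
    return roots[0]


def serialize(node: Node) -> str:
    """Write the tree back out (elements only, no attributes or text)."""
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


source = "<a><b></b></a>"
root = parse_with_locations(source)
b = root.children[0]
# Right after parsing, the stored offset points at <b> in the source.
assert source[b.start:].startswith("<b>")

# Mutate the tree: insert a new sibling in front of <b>.
root.children.insert(0, Node("new", None))
mutated = serialize(root)

# The offset stored on <b> is now stale: in the new text it points at <new>.
assert mutated[b.start:].startswith("<new>")
```

After the insert, the new node has no location at all and the old node's location refers to a document that no longer exists, so the library would have to either invalidate, recompute, or document the staleness of every stored offset.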

modulovalue commented 2 years ago

Thank you for the response. Given the requirement that GBs of data need to be parsed, forcing all users to parse into a CST sounds like it would definitely introduce an unacceptable amount of overhead.

renggli / dart-xml

Parsing into a concrete syntax tree. #148