servo / html5ever

High-performance browser-grade HTML5 parser

Provide source spans for tokens and DOM nodes #48

Open kmcallister opened 10 years ago

kmcallister commented 10 years ago

We can use something like libsyntax's Span and Spanned types to track positions in the input stream.

The tokenizer will remember its current position and the position at certain events, e.g. start tag, start attribute name. The tree builder will call a tree sink method (with an empty default) to annotate the DOM with span information.
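A minimal sketch of how this proposal could look, using only the standard library. Every name here (`Span`, `Spanned`, `PositionTracker`, `SpanSink`, `annotate_span`) is hypothetical and chosen for illustration; none of it is html5ever's actual API. It assumes spans are half-open byte ranges into a single input stream.

```rust
/// Hypothetical half-open byte range [start, end) into the input stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical wrapper pairing a value with the span it was parsed from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Spanned<T> {
    pub node: T,
    pub span: Span,
}

/// Stand-in for the tokenizer's position bookkeeping (not a real tokenizer).
#[derive(Default)]
pub struct PositionTracker {
    pos: usize,  // current position in the input
    mark: usize, // position remembered at the last event
}

impl PositionTracker {
    /// Advance past `len` bytes of consumed input.
    pub fn advance(&mut self, len: usize) {
        self.pos += len;
    }

    /// Remember the current position at an event such as "start tag".
    pub fn mark(&mut self) {
        self.mark = self.pos;
    }

    /// Wrap a finished token with the span from the last mark to the current position.
    pub fn finish<T>(&self, node: T) -> Spanned<T> {
        Spanned { node, span: Span { start: self.mark, end: self.pos } }
    }
}

/// Stand-in for a tree sink. The default body is empty, so sinks that do not
/// care about spans would need no changes.
pub trait SpanSink {
    type Handle;
    fn annotate_span(&mut self, _node: &Self::Handle, _span: Span) {}
}

/// A sink that opts in and records the span reported for each node id.
pub struct RecordingSink {
    pub spans: Vec<(usize, Span)>,
}

impl SpanSink for RecordingSink {
    type Handle = usize;
    fn annotate_span(&mut self, node: &usize, span: Span) {
        self.spans.push((*node, span));
    }
}

/// A sink that relies on the empty default and ignores span information.
pub struct IgnoringSink;

impl SpanSink for IgnoringSink {
    type Handle = usize;
}
```

The empty default is the point of the design: a validator-style sink can override `annotate_span`, while existing sinks compile unchanged. Nodes that come from multiple text sources would need a source identifier in `Span`, which this sketch leaves out.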

Then we can write a command-line HTML validator with the same output UI as rustc :)

Note that eventually it will be possible for a single document's nodes to come from multiple text sources, e.g. with document.write.

707090 commented 4 years ago

I am interested in this feature. What is the status of this? Is there some spanning information available or is this yet to be implemented?

My use case is that I am using html5ever in my proc macros to parse templates for a web framework, and I would like to emit errors with spans that point to various parts of the original HTML.

jdm commented 4 years ago

The tree builder sink is notified when the current line is updated via https://github.com/servo/html5ever/blob/36ee935f6884224d6b692cc2e8be0e4a308b8a6d/html5ever/src/tree_builder/mod.rs#L459.
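As a rough illustration of how a sink could use that notification, here is a self-contained sketch. The `LineSink` trait below is a hypothetical stand-in, not html5ever's real `TreeSink`; it only assumes that the sink receives a callback when the current line changes and another when an element is created. This gives line-level positions only, not full spans.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tree sink that receives line updates.
pub trait LineSink {
    /// Called whenever the current line changes; the default ignores it.
    fn set_current_line(&mut self, _line_number: u64) {}

    /// Called when an element is created; returns an id for the new node.
    fn create_element(&mut self, name: &str) -> usize;
}

/// A sink that stamps each created element with the line that was current
/// at the time of creation.
#[derive(Default)]
pub struct LineRecordingSink {
    current_line: u64,
    names: Vec<String>,
    /// Maps node id to the line on which the node was created.
    pub lines: HashMap<usize, u64>,
}

impl LineSink for LineRecordingSink {
    fn set_current_line(&mut self, line_number: u64) {
        self.current_line = line_number;
    }

    fn create_element(&mut self, name: &str) -> usize {
        let id = self.names.len();
        self.names.push(name.to_string());
        self.lines.insert(id, self.current_line);
        id
    }
}
```

With this shape, an error reporter could look up `lines[&id]` for a node and print the line number alongside the message.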

noahbald commented 5 months ago

Are there any plans to add this to rcdom/markup5ever?