servo / html5ever

High-performance browser-grade HTML5 parser

Implement full XML serialization for nodes #368

Open jdm opened 5 years ago

jdm commented 5 years ago

The implementation in https://github.com/servo/html5ever/blob/master/xml5ever/src/serialize/mod.rs is fairly simplistic, and does not reflect the complexity described by https://w3c.github.io/DOM-Parsing/#dfn-xml-serialization.

Ygg01 commented 5 years ago

Ok, I'm down with doing this.

My biggest question is: should this also add fragment parsing (#271)? Also, while digging through the code, I found #122, which might be tangentially related.

Right now I'm implementing parts of XML parsing. I'll probably leave fragment parsing for another time and possibly see about fixing #122 at an even later date.

jdm commented 5 years ago

I believe #122 is part of the full serialization algorithm, yes.

pshaughn commented 4 years ago

https://github.com/servo/servo/issues/24920 looks like it is caused by one step of this algorithm: "Elements not in the HTML namespace containing no children, are serialized using the empty-element tag syntax (i.e., according to the XML EmptyElemTag production)."
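
For illustration, the quoted rule can be sketched as follows. This is a minimal sketch, not xml5ever's code: `Element` and `serialize` are hypothetical stand-ins, and attributes, prefixes, text nodes, and the separate handling of HTML void elements are all left out.

```rust
/// Namespace URI of HTML elements.
pub const HTML_NS: &str = "http://www.w3.org/1999/xhtml";

/// Hypothetical, simplified element type (an assumption for this sketch,
/// not html5ever's node representation).
pub struct Element {
    pub ns: String,
    pub name: String,
    pub children: Vec<Element>,
}

/// Serializes an element and its descendants.
/// Attributes, prefixes, and text content are deliberately ignored.
pub fn serialize(el: &Element) -> String {
    if el.children.is_empty() && el.ns != HTML_NS {
        // Childless element outside the HTML namespace:
        // use the XML EmptyElemTag form and emit no end tag.
        return format!("<{}/>", el.name);
    }
    let mut out = format!("<{}>", el.name);
    for child in &el.children {
        out.push_str(&serialize(child));
    }
    out.push_str(&format!("</{}>", el.name));
    out
}
```

With this sketch, a childless `circle` element in the SVG namespace comes out as `<circle/>`, while a childless `div` in the HTML namespace keeps its explicit end tag.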

Ygg01 commented 4 years ago

@jdm yeah. I'm back on this issue. After a long time, I finally have time off. However, I've noticed a few problems.

  1. There is no type Document or DocumentFragment.
  2. To implement the problematic parts, namely the skip end tag flag in step 14, I need to know whether the node has children, which I can't get from the serializer unless some things are changed.

What would be a preferred solution?

For 1), I can see adding extra Node variants, like Node::Document/Node::Document_Fragment. For 2), I assume I need to create another method in Serializer akin to start_elem, e.g. write_elem(&mut self, name: QualName, attrs: AttrIter, leaf_node: bool).
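
A rough sketch of what the extra variants from 1) could look like. Everything here is an assumption for illustration: the `Node` enum below is a hypothetical stand-in rather than the real node type, the fragment variant is spelled `DocumentFragment` to follow Rust naming conventions, and escaping and attributes are omitted.

```rust
/// Hypothetical node type (an assumption for this sketch).
pub enum Node {
    Document(Vec<Node>),
    DocumentFragment(Vec<Node>),
    Element { name: String, children: Vec<Node> },
    Text(String),
}

/// Appends the serialization of `node` to `out`.
pub fn serialize(node: &Node, out: &mut String) {
    match node {
        // Documents and fragments emit no markup of their own;
        // their children are serialized in order.
        Node::Document(children) | Node::DocumentFragment(children) => {
            for child in children {
                serialize(child, out);
            }
        }
        Node::Element { name, children } => {
            out.push('<');
            out.push_str(name);
            out.push('>');
            for child in children {
                serialize(child, out);
            }
            out.push_str("</");
            out.push_str(name);
            out.push('>');
        }
        // Escaping of `&`, `<`, and `>` is omitted in this sketch.
        Node::Text(text) => out.push_str(text),
    }
}
```

Serializing a fragment then yields only the concatenated markup of its children, with no wrapper element.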

jdm commented 4 years ago

Yeah, if we need to be able to represent more kinds of nodes then we should add those variants. As for question 2, modifying the Serializer trait to provide the information you require sounds reasonable. If we can pass an argument to start_elem instead of adding a new separate method that's only used by the XML serializer, that might be preferable.
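
To make the second option concrete, here is a minimal sketch of passing the extra argument to start_elem. The trait below is a cut-down, hypothetical stand-in: the `&str` names, the attribute slice, and `XmlSerializer` are assumptions for illustration, not html5ever's actual `Serializer` signatures. It also treats every element as being outside the HTML namespace and skips attribute escaping.

```rust
/// Hypothetical, simplified serializer trait (an assumption for this sketch).
pub trait Serializer {
    /// `leaf_node` is true when the element has no children.
    fn start_elem(&mut self, name: &str, attrs: &[(&str, &str)], leaf_node: bool);
    fn end_elem(&mut self, name: &str);
}

/// Hypothetical XML serializer that writes into a `String`.
pub struct XmlSerializer {
    pub out: String,
    /// Set when the last start tag was written in empty-element form.
    skip_end_tag: bool,
}

impl XmlSerializer {
    pub fn new() -> Self {
        XmlSerializer { out: String::new(), skip_end_tag: false }
    }
}

impl Serializer for XmlSerializer {
    fn start_elem(&mut self, name: &str, attrs: &[(&str, &str)], leaf_node: bool) {
        self.out.push('<');
        self.out.push_str(name);
        for (key, value) in attrs {
            // Attribute value escaping is omitted in this sketch.
            self.out.push_str(&format!(" {}=\"{}\"", key, value));
        }
        // A leaf is closed immediately with "/>" and gets no end tag.
        self.skip_end_tag = leaf_node;
        self.out.push_str(if leaf_node { "/>" } else { ">" });
    }

    fn end_elem(&mut self, name: &str) {
        if self.skip_end_tag {
            // The matching start tag was already self-closed.
            self.skip_end_tag = false;
            return;
        }
        self.out.push_str(&format!("</{}>", name));
    }
}
```

A single flag is enough here because a leaf has no children, so its end_elem call always follows its start_elem call directly. An HTML serializer implementing the same trait could simply ignore `leaf_node`.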

ktfth commented 1 year ago

Has this progressed, or can we work on it as a good starting point? If not, can you suggest another good first issue, @jdm? Thanks in advance.