imsc-rosetta / imsc-rosetta-specification

The base specification and description of imsc-rosetta - a practical subtitle format based on IMSC

XML parser limitations #1

Open btsimonh opened 2 years ago

btsimonh commented 2 years ago

Many XML parsers produce a JSON-style structure, typically with child elements grouped by name. For these parsers, retaining the relative order of differently named sibling elements is a challenge, and working around it results in a huge increase in coding complexity.
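To illustrate the problem, here is a minimal Python sketch of a naive JSON-style conversion. The `to_json_style` helper is hypothetical and is not taken from any particular parser; it ignores namespaces, attributes and tail text for brevity. It only shows that two different documents can collapse to the same structure once children are grouped by name.

```python
import xml.etree.ElementTree as ET


def to_json_style(elem):
    """Hypothetical converter: group child elements by tag name.

    Order is kept within each tag's list, but the interleaving of
    differently named siblings is lost.
    """
    node = {}
    for child in elem:
        node.setdefault(child.tag, []).append(to_json_style(child))
    if elem.text and elem.text.strip():
        node["#text"] = elem.text.strip()
    return node


# The <br/> sits between the spans in one document and after them in the other.
a = to_json_style(ET.fromstring("<p><span>one</span><br/><span>two</span></p>"))
b = to_json_style(ET.fromstring("<p><span>one</span><span>two</span><br/></p>"))

# Both collapse to the same structure, so the position of <br/> is unrecoverable.
print(a == b)
```

The order of the two `span` elements relative to each other survives, which is why restricting each element to a single child element type avoids the problem.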

For TTML, the structure can be restricted such that only certain nodes need to retain order of different elements.

e.g. in <tt> and <head>, element order is significant for some of the elements we MUST use.

However, by excluding the use of <br>, the rest of the XML file has no reliance on the element order of differently named elements.

i.e. all elements only contain a single element type (which is normally represented as an ordered array of elements), or in the case of <div> where we propose to allow <metadata>, the order of <metadata> vs <p> is unimportant.
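A rough way to check a file against this restriction, as a Python sketch only. The `mixed_child_elements` helper and the `ORDER_EXEMPT` set are assumptions made for illustration (they are not part of the specification), and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: elements where differently named children are expected,
# either because order is handled specially (<tt>, <head>) or because
# it is unimportant (<div> with <metadata> and <p>).
ORDER_EXEMPT = {"tt", "head", "div"}


def mixed_child_elements(root):
    """Return the tags of non-exempt elements with more than one child tag name."""
    offenders = []
    for elem in root.iter():
        child_tags = {child.tag for child in elem}
        if len(child_tags) > 1 and elem.tag not in ORDER_EXEMPT:
            offenders.append(elem.tag)
    return offenders


doc = ET.fromstring(
    "<body><div><metadata/>"
    "<p><span>a</span><br/><span>b</span></p>"
    "</div></body>"
)
# The <p> mixes <span> and <br/>, so it is reported; the <div> is exempt.
print(mixed_child_elements(doc))
```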

Q: is this true for all text representations including rubies?

btsimonh commented 2 years ago

Note - the use of <br/> is useful to enable multi-row-align in TTML.
The parser limitation (order is only retained among elements of a single type) can be overcome by always wrapping <br/> in its own span, i.e. every <br/> is represented as <span><br/></span>
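A minimal Python sketch of that wrapping, assuming un-namespaced tag names for brevity (real TTML elements are namespaced). The `wrap_br_in_span` helper is hypothetical and only illustrates the idea.

```python
import xml.etree.ElementTree as ET


def wrap_br_in_span(root):
    """Replace every bare <br/> with <span><br/></span>, in place."""
    # Snapshot the elements first so newly created wrappers are not revisited.
    for parent in list(root.iter()):
        already_wrapped = (
            parent.tag == "span"
            and len(parent) == 1
            and not (parent.text or "").strip()
        )
        if already_wrapped:
            continue
        for index, child in enumerate(list(parent)):
            if child.tag == "br":
                wrapper = ET.Element("span")
                # Any text following the <br/> now follows the wrapper instead.
                wrapper.tail = child.tail
                child.tail = None
                parent.remove(child)
                parent.insert(index, wrapper)
                wrapper.append(child)
    return root


p = wrap_br_in_span(ET.fromstring("<p><span>one</span><br/><span>two</span></p>"))
# <p> now contains only <span> children, so a single ordered array is enough.
print([child.tag for child in p])
```

After the transform the line break keeps its position because it is carried by a span in the same ordered array as the text spans.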