Closed jelovirt closed 1 year ago
The problem with Markdown input is that it uses `specialize.xsl` to handle specialization, and that consumes the location information.
Maybe the locations could be passed as custom attributes on the serialized XML content and copied over by `specialize.xsl`; the XML could then be re-parsed to issue SAX events without those attributes but with a locator...
@raducoravu That's what I thought of too. However, I'd prefer to get rid of the XSLT-based specialization completely and just have a SAX filter for it. Concept and reference are pretty straightforward, but task is proving to be a bit more complex.
Reimplemented `specialize.xsl` as `SpecializeFilter`, a SAX filter that effectively does the same thing as the old XSLT. The whole specialization support was a mistake, but it needs to be retained because some people may rely on it.
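
As a rough illustration of why a SAX filter keeps location information where an XSLT step loses it, here is a minimal sketch built on `XMLFilterImpl`. It is not the actual `SpecializeFilter`: the class name `RenamingFilter`, the `run` helper, and the `topic` to `concept` / `body` to `conbody` mapping are assumptions made up for this example. The point is only that `XMLFilterImpl` forwards the parser's `Locator` downstream unchanged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: renames a few elements and forwards everything else untouched. */
class RenamingFilter extends XMLFilterImpl {
    // Illustrative mapping only; real specialization rules are more involved.
    private static final Map<String, String> RENAMES =
            Map.of("topic", "concept", "body", "conbody");

    RenamingFilter(XMLReader parent) {
        super(parent);
    }

    private static String rename(String name) {
        return RENAMES.getOrDefault(name, name);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, rename(localName), rename(qName), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, rename(localName), rename(qName));
    }

    /** Parses xml through the filter and returns "name@line" for each start element. */
    static List<String> run(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            RenamingFilter filter = new RenamingFilter(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                private Locator locator;

                @Override
                public void setDocumentLocator(Locator l) {
                    // The locator originates in the parser; the filter passes it through as-is.
                    locator = l;
                }

                @Override
                public void startElement(String uri, String local, String qName, Attributes a) {
                    seen.add(qName + "@" + locator.getLineNumber());
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

The downstream handler sees the renamed elements but still reads line numbers from the original source, because no serialization and re-parsing step sits between the parser and the handler.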
Add support for `Locator` to track source character positions. Because this implementation primarily targets use with DITA-OT, the most important event location to track is the start element event. This will be used to generate the `@xtrf` debug attribute.
The `Locator` allows accessing SAX events' location information, specifically where the event ends. That means that for an XML fragment, the locations of events will be:
- 1:1
- `topic` start element: 1:8
- `title` start element: 2:10
- `shortdesc` start element: 3:14
In XML this makes sense, because the parser will read the input stream and emit the event when the next token is encountered.
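
To observe this behaviour directly, the following minimal sketch records the `Locator` position at each start element event using the JDK's bundled SAX parser. The class name `StartElementLocations` and the `locate` helper are hypothetical names chosen for this example, and the exact column values reported may vary between parser implementations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records "name line:column" for every start element. */
class StartElementLocations extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Called by the parser before startDocument; the same object is updated per event.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The position is read at the moment the event fires, i.e. where the event ends.
        events.add(qName + " " + locator.getLineNumber() + ":" + locator.getColumnNumber());
    }

    /** Parses the given XML string and returns the recorded start element locations. */
    static List<String> locate(String xml) {
        StartElementLocations handler = new StartElementLocations();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.events;
    }
}
```

Calling `StartElementLocations.locate(...)` on a fragment whose `topic`, `title`, and `shortdesc` start tags sit on lines 1 to 3 returns one entry per element, each on the line of its start tag.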
Because different Markdown structures don't always have similar start delimiters, mapping line and column locations between Markdown and XML isn't completely straightforward. Compared to the previous XML example, for a Markdown fragment the locations of synthetic events will be:
- 1:1
- `topic` start element: 1:2
- `title` start element: 1:2
- `shortdesc` start element: 3:1
So effectively, the start element event from an XML parser reports the next character after the start element, whereas Markdown parsing reports the first character of the content. It's as if the start element was there, but invisible.
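
One way to picture the synthetic events is a producer that moves a `LocatorImpl` to the first content character of each node before firing `startElement`. The sketch below is an assumption-laden illustration, not the plugin's code: `SyntheticEvents`, `Node`, `emit`, and `record` are hypothetical names, and the positions are supplied by the caller rather than computed from real Markdown.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

class SyntheticEvents {
    /** Hypothetical stand-in for a parsed Markdown node and its first content position. */
    static final class Node {
        final String name;
        final int line;
        final int column;

        Node(String name, int line, int column) {
            this.name = name;
            this.line = line;
            this.column = column;
        }
    }

    /** Emits a topic with flat child elements, updating the locator before each start. */
    static void emit(Node topic, List<Node> children, ContentHandler handler)
            throws SAXException {
        LocatorImpl locator = new LocatorImpl();
        handler.setDocumentLocator(locator);
        handler.startDocument();
        start(topic, locator, handler);
        for (Node child : children) {
            start(child, locator, handler);
            handler.endElement("", child.name, child.name);
        }
        handler.endElement("", topic.name, topic.name);
        handler.endDocument();
    }

    private static void start(Node node, LocatorImpl locator, ContentHandler handler)
            throws SAXException {
        // Point the shared locator at the node's first content character, then fire the event.
        locator.setLineNumber(node.line);
        locator.setColumnNumber(node.column);
        handler.startElement("", node.name, node.name, new AttributesImpl());
    }

    /** Runs emit against a recording handler and returns "name line:column" entries. */
    static List<String> record(Node topic, List<Node> children) {
        List<String> seen = new ArrayList<>();
        try {
            emit(topic, children, new DefaultHandler() {
                private Locator locator;

                @Override
                public void setDocumentLocator(Locator l) {
                    locator = l;
                }

                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    seen.add(qName + " " + locator.getLineNumber() + ":" + locator.getColumnNumber());
                }
            });
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

Feeding it the positions listed above (`topic` at 1:2, `title` at 1:2, `shortdesc` at 3:1) makes a downstream handler see those same values through the standard `Locator` interface, just as it would with a real XML parser.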