jelovirt / org.lwdita

LwDITA parser for DITA-OT
http://lwdita.org/
Apache License 2.0

Add support for Locator #132

Closed jelovirt closed 1 year ago

jelovirt commented 1 year ago

Add support for Locator to track source character positions. Because this implementation primarily targets use with DITA-OT, the most important event location to track is the start element event. This will be used to generate the @xtrf debug attribute.

The Locator gives access to the location of each SAX event, specifically the position where the event ends. That means if we have the XML fragment

   12345678901234567890
  ---------------------
1| <topic>
2|   <title>Title</title>
3|   <shortdesc>Desc</shortdesc>
4| </topic>
5| 

The locations of events will be:

In XML this makes sense, because the parser will read the input stream and emit the event when the next token is encountered.
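
To see this behaviour with the JDK's built-in parser, a minimal sketch like the one below records the Locator position at each start element. The class and method names (`StartElementLocations`, `locate`) are made up for this illustration and are not part of org.lwdita; the exact column values depend on the parser, so none are asserted here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: class and method names are assumptions, not org.lwdita code.
public class StartElementLocations {

    /** Returns one "name line:column" entry per start element event. */
    public static List<String> locate(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over its locator once, before startDocument.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The position is only valid during the callback, so copy it now.
                out.add(qName + " " + locator.getLineNumber() + ":" + locator.getColumnNumber());
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<topic>\n  <title>Title</title>\n  <shortdesc>Desc</shortdesc>\n</topic>\n";
        locate(xml).forEach(System.out::println);
    }
}
```

Running `main` on the fragment above prints one line per element, with the position the parser had reached when it emitted the start element event.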

Because different Markdown structures don't always have similar start delimiters, mapping line and column locations between Markdown and XML isn't completely straightforward. Compared to the previous XML example, if we have the Markdown fragment

   12345678901234567890
  ---------------------
1| # Title
2|
3| Desc 
4|

The locations of synthetic events will be:

So effectively, the start element event from the XML parser reports the character after the start tag, while Markdown parsing reports the first character of the content. It's as if the start tag were there, but invisible.
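
The "first character of the content" rule can be sketched with a toy emitter that drives a `LocatorImpl` by hand. This is not the org.lwdita parser: `MarkdownLocationSketch` is a hypothetical class that only understands `# ` headings and one-line paragraphs, maps paragraphs to `p` for simplicity, and only updates the locator for start element events.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical toy emitter, assumed for illustration; not the real Markdown parser.
public class MarkdownLocationSketch {

    /** Emits synthetic title/p events for "# " headings and one-line paragraphs. */
    public static void parse(String markdown, ContentHandler handler) throws SAXException {
        LocatorImpl locator = new LocatorImpl();
        handler.setDocumentLocator(locator);
        handler.startDocument();
        String[] lines = markdown.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.trim().isEmpty()) {
                continue; // blank lines produce no events
            }
            boolean heading = line.startsWith("# ");
            String name = heading ? "title" : "p";
            int contentStart = heading ? 2 : 0; // skip the "# " delimiter
            String text = line.substring(contentStart);
            // Report the first content character, using 1-based line and column.
            locator.setLineNumber(i + 1);
            locator.setColumnNumber(contentStart + 1);
            handler.startElement("", name, name, new AttributesImpl());
            handler.characters(text.toCharArray(), 0, text.length());
            handler.endElement("", name, name);
        }
        handler.endDocument();
    }

    /** Collects "name line:column" for each synthetic start element. */
    public static List<String> locations(String markdown) throws SAXException {
        List<String> out = new ArrayList<>();
        parse(markdown, new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.add(qName + " " + locator.getLineNumber() + ":" + locator.getColumnNumber());
            }
        });
        return out;
    }
}
```

For the Markdown fragment above, `locations("# Title\n\nDesc \n")` yields `title 1:3` and `p 3:1` under these assumptions: the heading content starts after the two-character delimiter, and the paragraph has no delimiter at all.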

jelovirt commented 1 year ago

The problem with Markdown input is that it uses specialize.xsl to handle specialization, and that XSLT step consumes the location information.

raducoravu commented 1 year ago

Maybe the locations could be passed as some kind of custom attributes on the serialized XML content. specialize.xsl would copy the custom attributes over, and the result could then be re-parsed to issue SAX events without those attributes but with a locator...
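
A rough sketch of that idea, assuming the positions travel in two made-up attributes named `loc-line` and `loc-column` (neither name comes from org.lwdita or DITA-OT): a SAX filter reads them, updates its own `LocatorImpl`, and forwards the element without them.

```java
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.LocatorImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter; the attribute names below are assumptions for illustration.
public class LocationAttributeFilter extends XMLFilterImpl {
    static final String LINE = "loc-line";
    static final String COLUMN = "loc-column";

    private final LocatorImpl locator = new LocatorImpl();

    @Override
    public void setDocumentLocator(Locator upstream) {
        // Ignore the upstream locator: it describes the intermediate XML,
        // not the original source document.
    }

    @Override
    public void startDocument() throws SAXException {
        // Hand our own locator downstream before any other event.
        if (getContentHandler() != null) {
            getContentHandler().setDocumentLocator(locator);
        }
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (LINE.equals(name)) {
                locator.setLineNumber(Integer.parseInt(atts.getValue(i)));
            } else if (COLUMN.equals(name)) {
                locator.setColumnNumber(Integer.parseInt(atts.getValue(i)));
            } else {
                // Every other attribute is passed through unchanged.
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), name,
                        atts.getType(i), atts.getValue(i));
            }
        }
        super.startElement(uri, localName, qName, kept);
    }
}
```

An element that carries no location attributes keeps the previously reported position in this sketch, so a real implementation would need to decide whether to reset the locator in that case.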

jelovirt commented 1 year ago

@raducoravu That's what I thought of too. However, I'd prefer to get rid of the XSLT-based specialization completely and just have a SAX filter for it. Concept and reference are pretty straightforward, but task is proving to be a bit more complex.

jelovirt commented 1 year ago

Reimplemented specialize.xsl as SpecializeFilter, a SAX filter that effectively does the same thing as the old XSLT. The whole specialization support was a mistake, but it needs to be retained because some people may rely on it.
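
For readers wondering why a SAX filter helps here, the sketch below shows the general shape. It is not the actual SpecializeFilter: `RenamingFilterSketch` is a hypothetical class that only renames elements from a caller-supplied map (for example `topic` to `concept`), whereas real specialization also has to handle context and `class` attributes. The point is that `XMLFilterImpl` forwards `setDocumentLocator` unchanged, so the location information survives the rewrite.

```java
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical, heavily simplified stand-in; not the real SpecializeFilter.
public class RenamingFilterSketch extends XMLFilterImpl {
    private final Map<String, String> renames;

    public RenamingFilterSketch(Map<String, String> renames) {
        this.renames = renames;
    }

    private String rename(String name) {
        return renames.getOrDefault(name, name);
    }

    // setDocumentLocator is deliberately not overridden: XMLFilterImpl passes
    // the upstream locator to the downstream handler as-is.

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, rename(localName), rename(qName), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, rename(localName), rename(qName));
    }
}
```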