oreillymedia / docbook2asciidoc

XSL for transforming DocBook to AsciiDoc
MIT License

Sanitize problematic chars in ids? #17

Open sandersk opened 12 years ago

sandersk commented 12 years ago

Having certain character sequences (e.g., double hyphens) in a DocBook id can cause issues in DocBook -> AsciiDoc -> DocBook roundtripping, because AsciiDoc will perform character substitutions on them.

For example, the id:

[[michael_and_debra_jean_in_an_over--id001]]

is going to be converted by asciidoc.py to the following in XREF linkends:

michael_and_debra_jean_in_an_over—id001

<xref linkend="michael_and_debra_jean_in_an_over&#8212;id001"/>

Technically, this is probably better addressed by tweaking the AsciiDoc config for character substitutions in attributes. Still, it would be nice to have handling in d2a.xsl that sanitizes ids and linkends like this when generating the AsciiDoc, so that the problem doesn't arise in the roundtrip back to DocBook.
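To make the sanitizing idea concrete, here is a minimal sketch in Python rather than XSLT. It is not what d2a.xsl does today: the names `sanitize_id` and `sanitize_all` are hypothetical, and the choice to collapse each run of hyphens into a single hyphen is an assumption about what a "safe" id would look like.

```python
import re

# Assumption: runs of two or more hyphens are the problematic sequence.
_HYPHEN_RUN = re.compile(r"-{2,}")


def sanitize_id(raw_id: str) -> str:
    """Collapse each run of hyphens in an id to a single hyphen."""
    return _HYPHEN_RUN.sub("-", raw_id)


def sanitize_all(ids):
    """Return a dict mapping each original id to its sanitized form.

    Raises ValueError if two different ids would collapse to the same
    sanitized id, since that would make xrefs ambiguous.
    """
    mapping = {}
    owner = {}  # sanitized id -> the original id that produced it
    for raw in ids:
        clean = sanitize_id(raw)
        if clean in owner and owner[clean] != raw:
            raise ValueError(
                f"ids {owner[clean]!r} and {raw!r} both sanitize to {clean!r}"
            )
        owner[clean] = raw
        mapping[raw] = clean
    return mapping
```

The same mapping would have to be applied to both the id attributes and every linkend that points at them, otherwise the xrefs break in a different way. The collision check is there because collapsing characters can make two distinct ids identical.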

sandersk commented 11 years ago

This problem also manifests when there are double underscores in ids, e.g.:

[[recipe__harvest_timeline]]

If the same paragraph contains another XREF whose linkend also has a double underscore, the corresponding XREF may be roundtripped to:

<xref linkend="recipe<emphasis>harvest_timeline"/>
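The dependence on a second double underscore in the same paragraph can be illustrated with a toy model. This is only a sketch: the regex below is a simplified stand-in for an emphasis substitution that pairs up double underscores, not asciidoc.py's actual pattern, and `recipe__planting_guide` is a made-up second id.

```python
import re

# Toy model only: wrap whatever sits between a pair of double underscores
# in <emphasis> tags. asciidoc.py's real substitution rules are more involved.
_PAIRED_DOUBLE_UNDERSCORES = re.compile(r"__(.+?)__")


def toy_emphasis(paragraph: str) -> str:
    """Apply the simplified emphasis substitution to one paragraph."""
    return _PAIRED_DOUBLE_UNDERSCORES.sub(r"<emphasis>\1</emphasis>", paragraph)


# One double underscore in the paragraph: nothing to pair with.
one_xref = "See <<recipe__harvest_timeline>>."

# Two double underscores in the paragraph: they pair up across both xrefs.
two_xrefs = "See <<recipe__harvest_timeline>> and <<recipe__planting_guide>>."

print(toy_emphasis(one_xref))
print(toy_emphasis(two_xrefs))
```

In this model the single-xref paragraph passes through unchanged, while the two-xref paragraph gets an `<emphasis>` opened inside the first linkend and closed inside the second, which is consistent with the mangled linkend shown above.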