shadow-light closed 1 year ago
This is interesting. We use a custom XMLWriter to override the significant-whitespace rules (which are somewhat odd in USX). I was not aware that this automatically switches the character escape handler to DumbEscapeHandler, so that every character above U+0100 gets escaped.
It should be possible to get rid of this annoying behaviour, but I will have to take a closer look at how exactly.
@Rolf-Smit: I assume you did not notice that behaviour when you did the USX revamp in #39?
Nightly build that includes this fix as well as #64: https://nightly.link/schierlm/BibleMultiConverter/workflows/main.yaml/master/BibleMultiConverter-AllInOneEdition-Release.zip
Hi, thanks for this great converter. I'm converting USFM -> USX and noticed that it produces XML entities instead of UTF-8 characters, even though the output encoding is UTF-8.
Example:
\v 1 Iwamɨ́ó xwɨ́árí tɨ́nɨ aŋɨ́na tɨ́nɨ imɨxɨnɨŋíná eŋo nánɨ —Omɨ arɨ́á wirane negɨ́ sɨŋwɨ́ tɨ́ tɨ́nɨ wɨnɨrane sɨŋwɨ́ wɨnaxɨ́dɨrane wé tɨ́nɨ ɨ́á xɨrɨrane eŋwáorɨnɨ. Xwɨyɨ́á dɨŋɨ́ nɨyɨmɨŋɨ́ imónɨŋɨ́pɨ nánɨ neaíwapɨyiŋorɨnɨ.
<verse number="1" style="v" sid="1JN 1:1"/>Iwamɨ́ó xwɨ́árí tɨ́nɨ aŋɨ́na tɨ́nɨ imɨxɨnɨŋíná eŋo nánɨ —Omɨ arɨ́á wirane negɨ́ sɨŋwɨ́ tɨ́ tɨ́nɨ wɨnɨrane sɨŋwɨ́ wɨnaxɨ́dɨrane wé tɨ́nɨ ɨ́á xɨrɨrane eŋwáorɨnɨ. Xwɨyɨ́á dɨŋɨ́ nɨyɨmɨŋɨ́ imónɨŋɨ́pɨ nánɨ neaíwapɨyiŋorɨnɨ.<verse eid="1JN 1:1"/>
This is fine parsing-wise, but it significantly increases file size, and I'm planning on serving the files over the network. Is there an easy way to disable this?
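To get a rough feel for the size overhead, here is a minimal Java sketch that compares the UTF-8 byte length of a string with and without numeric character references. It is only an illustration and not the converter's actual code: `EscapeOverhead`, `escapeAbove` and `utf8Length` are hypothetical names, the decimal `&#NNN;` form is an assumption, and the U+0100 threshold is simply taken from the description in this thread.

```java
import java.nio.charset.StandardCharsets;

public class EscapeOverhead {

    // Hypothetical helper: replaces every code point above `threshold`
    // with a decimal numeric character reference such as "&#616;".
    // The decimal form is an assumption made for this illustration.
    static String escapeAbove(String text, int threshold) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > threshold) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Start of the verse quoted in this issue.
        String raw = "Iwamɨ́ó xwɨ́árí tɨ́nɨ aŋɨ́na tɨ́nɨ";
        // Threshold U+0100 as described in this thread (an assumption here).
        String escaped = escapeAbove(raw, 0x100);

        System.out.println("raw UTF-8 bytes:     " + utf8Length(raw));
        System.out.println("escaped UTF-8 bytes: " + utf8Length(escaped));
        System.out.println(escaped);
    }
}
```

A character such as ɨ (U+0268) takes 2 bytes in UTF-8 but 6 bytes as `&#616;`, so text that relies heavily on such characters grows noticeably when escaped.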