rbeezer opened this issue 2 years ago.
Hi Rob, I've transferred this issue to the liblouisutdml repository because I think it's unlikely that this is a Liblouis issue.
What would probably help track down this issue is a minimal test case: the input HTML, the configuration files (ini, cfg and sem files), the translation tables, and the command-line arguments.
Thanks, Bert. I forgot there are two repositories. :-( Of course, I should have been poking around in this one.
I'll dig a bit deeper, and as a last resort construct a minimal example.
It is not visible here, but a non-breaking space (U+00A0) is output immediately after "Contents". So you will need to produce the output yourself and examine the nature of the "extra" character.
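One way to examine the "extra" character is to scan the BRF output byte by byte for anything outside 7-bit ASCII, which a BRF file is not expected to contain. Below is a minimal Python sketch, assuming the output was saved to a file whose path is passed on the command line. It reports every non-ASCII byte with its line and byte column, so a UTF-8 encoded U+00A0 shows up as `0xc2 0xa0`, while a single-byte (Latin-1) one shows up as a lone `0xa0`. The function name `find_non_ascii` is illustrative only.

```python
import sys
from pathlib import Path


def find_non_ascii(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line, column, byte) for every byte outside 7-bit ASCII.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over a bytes object yields ints in the range 0-255.
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


def main(path: str) -> None:
    data = Path(path).read_bytes()
    for line_no, col, byte in find_non_ascii(data):
        print(f"line {line_no}, byte {col}: 0x{byte:02x}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # The BRF path is supplied by the user; nothing is assumed about its name.
    main(sys.argv[1])
```

Reading raw bytes rather than decoded text sidesteps the question of which encoding file2brl used, which is exactly what is unknown here.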
Looks like the "format centered" setting in the "contentsheader" style is to blame.
Minimal example attached.
Run
file2brl -f minimal.cfg source.html
The output is:
,3t5ts
,f/ ,divi.n
#a
,f/ ,divi.n
,"s 3t5t4
#a
I'm getting what I think is a stray non-breaking space in BRF output.
I apply file2brl (version 2.11.0) to an HTML file purpose-built for translation via this method. The HTML contains:
The semantic file contains:
The output BRF has
,3t5ts
as the ToC header, with a single U+00A0 after the final "s" and before the newline. It is clearly visible in my pager (less) and by other means. I looked through the source but couldn't see where a change could be made, tested, and turned into a pull request.
Thanks for any help you can provide; this is forcing me to use an incorrect encoding in a Python program that parses the BRF.
https://github.com/PreTeXtBook/pretext/blob/d402bdb3613d95984708150abe2fdb33123f565a/pretext/pretext.py#L2209
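Until the stray character is fixed upstream, a parser could normalize it away instead of adopting an encoding that merely tolerates it. The sketch below is a minimal, hypothetical workaround, not the code in pretext.py. It assumes the only non-ASCII character in the output is U+00A0 and that it arrives either UTF-8 encoded or as a single Latin-1 byte. The names `decode_brf` and `read_brf_text` are illustrative.

```python
from pathlib import Path

NBSP = "\u00a0"


def decode_brf(data: bytes) -> str:
    """Decode BRF bytes to ASCII-only text, mapping U+00A0 to a plain space."""
    try:
        # Assumption: the stray character was written as UTF-8 (0xC2 0xA0).
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume a single-byte encoding, where 0xA0 is U+00A0.
        text = data.decode("latin-1")
    text = text.replace(NBSP, " ")
    # Raises UnicodeEncodeError if any other non-ASCII character remains.
    text.encode("ascii")
    return text


def read_brf_text(path: str) -> str:
    """Read a BRF file from disk and decode it with decode_brf."""
    return decode_brf(Path(path).read_bytes())
```

Replacing the character with an ordinary space keeps column positions intact; stripping it instead may suit a parser that cares about trailing whitespace.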