Closed davidg-sil closed 11 months ago
Well-formatted USFM should start a new line before a paragraph style. Therefore, a line like:
\h Title \p \v 1
is not conforming to the standard very well.
However:
\h This
is a slightly odd header
is standards-conforming.
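To make the distinction concrete, here is a minimal Python sketch of a check that flags paragraph-style markers that do not start a line. It is only an illustration, not part of ptx2pdf: the function name and the marker set are hypothetical, and the set is a deliberately small subset of USFM paragraph markers.

```python
import re

# Hypothetical, incomplete subset of USFM paragraph-style markers (assumption).
PARAGRAPH_MARKERS = {"p", "m", "pi", "q", "q1", "q2", "s", "s1"}

# A marker is a backslash followed by letters and optional digits.
MARKER_RE = re.compile(r"\\([a-z]+[0-9]*)")


def misplaced_paragraph_markers(usfm):
    """Return (line number, marker) for paragraph markers not at line start."""
    problems = []
    for lineno, line in enumerate(usfm.splitlines(), start=1):
        for match in MARKER_RE.finditer(line):
            name = match.group(1)
            # Anything other than whitespace before the marker means it
            # does not begin the line.
            if name in PARAGRAPH_MARKERS and line[:match.start()].strip():
                problems.append((lineno, name))
    return problems


one_line = r"\h Title \p \v 1"
wrapped = "\\h This\nis a slightly odd header"
bad = misplaced_paragraph_markers(one_line)
good = misplaced_paragraph_markers(wrapped)
```

With these inputs, the first example reports `\p` as misplaced on line 1, while the wrapped header reports nothing.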
Bear in mind that we also need to strip any trailing whitespace in these kinds of header markers, e.g. \toc.
\h, \id and \h1 now treat newline(s) as a space, multiple spaces as a space, and ignore trailing spaces, as per the USFM spec.
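As an illustration of the normalisation described here (not the actual XeTeX implementation, which works on TeX tokens), a rough Python sketch might look like the following. The function name `header_text` is hypothetical, and ending the field at the next backslash is an assumption about how the field is delimited.

```python
def header_text(rest):
    """Split the raw text after a header marker into (field text, remainder).

    The field ends at the next backslash (assumption). Newlines and runs of
    spaces collapse to a single space, and leading/trailing whitespace is
    dropped.
    """
    end = rest.find("\\")
    if end == -1:
        end = len(rest)
    raw, remainder = rest[:end], rest[end:]
    # str.split() with no arguments splits on any whitespace run, including
    # newlines, and discards leading and trailing whitespace.
    text = " ".join(raw.split())
    return text, remainder


text, remainder = header_text(" This\nis a  slightly odd header  \n\\p \\v 1")
```

Here `text` is the single-line header with single spaces, and `remainder` starts at the `\p` that terminated the field.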
According to the spec, the fields should only contain text, so I've made it so that they end with anything starting with a backslash. Hopefully no one is using a zvar or character styling in such a location. I have some ideas how certain codes could be permitted, but they are even more gory than this code.
For future reference, my idea is to strip the initial backslash off the result of passing {\string#1} to another function (or set \escapechar to -1), and use the result of that in a csname with a suitable prefix/suffix meaning 'allowed in header'.
XeTeX at the moment treats \h as terminated by the newline. This causes reliable but hard-to-trace crashes if someone ends up with \h terminated by another paragraph style. The USFM standard reads as though \h should be treated as a normal paragraph style, and thus we are deviating from the standard. The XeTeX code needs \h's contents to be saved as a macro. Thus we have a bit of a conflict, but it would be nice to handle standard-conforming code a bit better.