sillsdev / ptx2pdf

XeTeX based macro package for typesetting USFM formatted (Paratext output) scripture files

Can we make \h more like a paragraph style? #883

Closed davidg-sil closed 11 months ago

davidg-sil commented 1 year ago

XeTeX currently treats \h as terminated by the newline. This causes reliable but hard-to-trace crashes if someone ends up with \h terminated by another paragraph style instead. The USFM standard appears to treat \h as a normal paragraph style, so we are deviating from the standard. However, the XeTeX code needs \h's contents to be saved as a macro. So we have a bit of a conflict, but it would be nice to handle standard-conforming code a bit better.

davidg-sil commented 11 months ago

Well-formatted USFM should start a new line before a paragraph style. Therefore, a line like: \h Title \p \v 1 does not conform to the standard very well. However:

\h This
is a slightly odd header

Is standards-conforming.

mhosken commented 11 months ago

Bear in mind that we also need to strip any trailing whitespace in these kinds of header markers, e.g. \toc.


davidg-sil commented 11 months ago

\h, \id and \h1 now treat newline(s) as a space, multiple spaces as a single space, and ignore trailing spaces, as per the USFM spec. According to the spec, these fields should only contain text, so I've made them end at anything starting with a backslash. Hopefully no one is using a zvar or character styling in such a location. I have some ideas for how certain codes could be permitted, but they are even more gory than this code.
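For anyone following along outside TeX, here is a rough Python sketch of the rules described above. It is only an illustration of the behaviour, not the ptx2pdf macro code; the function name `parse_header_field` is made up for this example, and stripping leading whitespace as well as trailing is an assumption.

```python
from typing import Tuple


def parse_header_field(text: str) -> Tuple[str, str]:
    """Split the text following a header marker into (value, remainder).

    Hypothetical helper: `text` is everything after "\\h " (or \\id, \\h1).
    """
    # The field ends at the first backslash, i.e. at the next marker.
    value, sep, rest = text.partition("\\")
    # str.split() with no arguments splits on any run of whitespace
    # (spaces and newlines alike) and drops leading/trailing whitespace,
    # so joining with one space normalises the field.
    value = " ".join(value.split())
    # Hand back the untouched remainder, starting at the backslash.
    return value, sep + rest
```

With this, the "slightly odd header" from earlier in the thread, `"This\nis a slightly odd header\n\\p \\v 1"`, yields the value `This is a slightly odd header` and leaves `\p \v 1` for the normal paragraph handling.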

For future reference, my idea is to strip the initial backslash off the result of passing {\string#1} to another function (or set \escapechar to -1), and use the result of that in a csname with a suitable prefix/suffix meaning 'allowed in header'.
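As a very loose analogue of that idea (in Python rather than TeX, and not something that exists in the code base), the lookup could be pictured like this. The prefix string, the allow-list contents and the function name are all hypothetical, and the set lookup stands in for testing whether a prefixed csname is defined.

```python
import re

# Hypothetical prefix, standing in for the csname prefix/suffix.
ALLOWED_PREFIX = "allowed-in-header:"
# Hypothetical allow-list; which markers (if any) belong here is undecided.
ALLOWED = {ALLOWED_PREFIX + "nd"}

# A backslash followed by the marker name; capturing only the name mirrors
# stripping the leading backslash from the \string form.
MARKER = re.compile(r"\\([A-Za-z0-9]*)")


def header_end(text: str) -> int:
    """Return the index at which the header field ends."""
    for match in MARKER.finditer(text):
        # Equivalent of checking that the prefixed control sequence exists.
        if ALLOWED_PREFIX + match.group(1) not in ALLOWED:
            return match.start()
    # No disallowed marker found: the field runs to the end of the text.
    return len(text)
```

Under these assumptions, `The \nd Lord\nd* reigns` followed by a `\p` line would keep the `\nd ... \nd*` styling inside the header and stop at `\p`.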