Closed by mhosken 3 months ago.
Paragraphs have no end paragraph marker and are implicitly closed by the start of another paragraph or by a chapter milestone. (emphasis added)
Are there no paragraphs in the Bible that cross a chapter boundary? I think there might be. (I know that "sections" do.)
Paragraphs should also be closed by things like section headings. (Or do you regard those as paragraphs?)
Yes, there are certainly paragraphs that continue across a chapter break. The \nb continuation marker exists to tell the typesetting engine that the verse text continues without a paragraph break.
Section heads, etc. are all paragraphs. The basic structure of a USX/USFM document is a sequence of paragraphs of various kinds: section heads, main text, header fields, etc. Milestones can also be interspersed between paragraphs. But that's about it. It's actually pretty simple, although seeing the wood for the trees can be tricky!
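To make the "sequence of paragraphs plus interspersed milestones" view concrete, here is a minimal Python sketch of that flat model. The `Paragraph` and `Milestone` classes and the `paragraphs_by_chapter` helper are hypothetical illustrations, not part of any USFM or USX tooling, and the model deliberately ignores everything inside a paragraph (verses, character styles). Note how the \nb paragraph simply follows the chapter milestone; nothing nests.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Paragraph:
    """Any paragraph-level block: header field, section head, body text."""
    marker: str      # e.g. "h", "s1", "p", "nb"
    text: str = ""


@dataclass
class Milestone:
    """A marker that sits between paragraphs, such as a chapter milestone."""
    marker: str      # e.g. "c"
    number: str = ""


Block = Union[Paragraph, Milestone]


def paragraphs_by_chapter(doc: List[Block]) -> List[Tuple[str, str]]:
    """Pair each paragraph's marker with the chapter it appears in."""
    chapter = ""  # paragraphs before the first chapter milestone get ""
    result: List[Tuple[str, str]] = []
    for block in doc:
        if isinstance(block, Milestone) and block.marker == "c":
            chapter = block.number
        elif isinstance(block, Paragraph):
            result.append((chapter, block.marker))
    return result


# A document is just a flat list; the text values are placeholders.
doc: List[Block] = [
    Paragraph("h", "Genesis"),
    Milestone("c", "1"),
    Paragraph("s1", "..."),
    Paragraph("p", "..."),
    Milestone("c", "2"),
    Paragraph("nb", "..."),   # continues the previous paragraph across the chapter break
]
print(paragraphs_by_chapter(doc))
```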
Implicit closure is nice. Since we always know which markers are starting markers, and whether something is embedded, it is possible to close things implicitly. For example, the start of a new paragraph implicitly closes all character styles in the previous paragraph. Starting a new character style closes all open character styles, including any currently open embedded character styles.
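The implicit-closure rules just described can be sketched as a small stack machine that rewrites a token stream so every closure becomes explicit. This is a simplified illustration under stated assumptions: tokens are bare marker names without the backslash, the `PARAGRAPH` and `CHARACTER` sets are tiny hypothetical subsets of the real marker inventory, a leading `+` marks an embedded character style, and any other token is treated as text. The function name `make_closure_explicit` is hypothetical.

```python
from typing import List

# Hypothetical, heavily simplified marker sets; real USFM defines many more.
PARAGRAPH = {"p", "q1", "s1", "nb"}
CHARACTER = {"add", "nd", "wj"}


def make_closure_explicit(tokens: List[str]) -> List[str]:
    """Return tokens with every implicitly closed character style closed explicitly."""
    out: List[str] = []
    stack: List[str] = []  # open character styles, innermost last

    def close_until(name: str = "") -> None:
        # Pop open styles, emitting a closer for each; stop after `name` if given.
        while stack:
            top = stack.pop()
            out.append(top + "*")
            if top == name:
                return

    for tok in tokens:
        if tok in PARAGRAPH:
            close_until()            # a new paragraph closes all character styles
            out.append(tok)
        elif tok in CHARACTER:
            close_until()            # a new character style closes all open ones, embedded included
            stack.append(tok)
            out.append(tok)
        elif tok.startswith("+") and tok[1:] in CHARACTER:
            stack.append(tok)        # embedded style: opens inside the current one
            out.append(tok)
        elif tok.endswith("*") and tok[:-1] in stack:
            close_until(tok[:-1])    # explicit close; also closes anything nested inside it
        else:
            out.append(tok)          # text (or a closer with nothing open to close)
    close_until()                    # end of input closes whatever is still open
    return out


print(make_closure_explicit(["p", "add", "the", "+nd", "Lord", "wj", "more", "q1", "tail"]))
```

Here the \wj implicitly closes both the embedded \+nd and the \add, and the \q1 closes the \wj.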
The difficulty is with parsing. Most parsers are based on some form of recursive descent. This makes true implicit closure hard and can turn sequences of runs into embedded runs. For example:
A footnote in which an \fr run is followed by an \ft run and then by an \fr* closing marker is obviously invalid, since the \ft closes the \fr. But if we simply say that \fr and \ft are optional and use a typical recursive descent parser, such an example is usually accepted as valid, with the \ft section assumed to be embedded within the \fr. Making a grammar reject it takes a lot of work: one has to say that a run ends either at its closing marker or at the start of whatever might possibly come next, and that 'whatever might possibly come next' can be a long and tricky list to work out for each context.
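To illustrate the difficulty, here is a hypothetical Python sketch contrasting two parsers over a simplified token stream for the inside of a footnote (no \f or \f* markers, backslashes omitted, token values made up). The naive recursive-descent parser treats closers as optional and so accepts `fr 1:1 ft note fr*` by nesting the \ft run inside the \fr run. The run-sequence parser knows that any footnote character marker starts the next run, so the stray `fr*` is rejected. The `FOOTNOTE_CHARS` set plays the role of the "what might possibly come next" list; in a real grammar that list would differ per context.

```python
from typing import List, Optional, Tuple

FOOTNOTE_CHARS = {"fr", "ft", "fq"}  # simplified subset of footnote character markers

Node = Tuple[str, list]  # (marker, children)


def parse_nested(tokens: List[str], i: int = 0) -> Tuple[Node, int]:
    """Naive recursive descent: closers are optional and any marker may nest."""
    marker = tokens[i]
    children: list = []
    i += 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == marker + "*":
            return (marker, children), i + 1    # explicit close of this run
        if tok in FOOTNOTE_CHARS:
            child, i = parse_nested(tokens, i)  # a new marker is treated as embedded
            children.append(child)
        elif tok.endswith("*"):
            return (marker, children), i        # another run's closer: leave it to the caller
        else:
            children.append(tok)
            i += 1
    return (marker, children), i


def parse_sequence(tokens: List[str]) -> List[Node]:
    """Run-sequence parser: a run ends at its own closer or at the next footnote marker."""
    runs: List[Node] = []
    current: Optional[Node] = None
    for tok in tokens:
        if tok in FOOTNOTE_CHARS:
            current = (tok, [])                 # the next run implicitly closes the previous one
            runs.append(current)
        elif tok.endswith("*"):
            if current is None or tok != current[0] + "*":
                raise ValueError(f"unexpected closer {tok!r}")
            current = None                      # explicit close
        else:
            if current is None:
                raise ValueError(f"text {tok!r} outside any run")
            current[1].append(tok)
    return runs


bad = ["fr", "1:1", "ft", "note", "fr*"]
print(parse_nested(bad))  # accepted: the ft run ends up nested inside fr
try:
    parse_sequence(bad)
except ValueError as err:
    print("rejected:", err)
```

The naive parser needs no knowledge of sibling markers, which is exactly why it silently produces the wrong tree.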
Based on this, it is proposed to tighten the USFM specification by removing further implicit closure, beyond what has already been removed. The proposed rules are:
The astute reader will have caught the implication of rule 1: if character styles are explicitly closed, the + markers are no longer needed. While it is planned to remove them (or at least to treat them as redundant equivalents of their non-plussed cousins), this change is not planned for the first phase, which documents the USFM standard as it stands.
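Once every character style is explicitly closed, a nested style can no longer be mistaken for one that implicitly closes its parent, so the + prefix carries no extra information. A minimal sketch of treating plussed markers as equivalent to their non-plussed cousins, assuming every character style in the input is explicitly closed (applied to text that relies on today's implicit-closure rules, this rewrite would change its meaning). The function name `drop_plus_markers` is hypothetical.

```python
import re

# Matches a backslash, a '+', and the marker name, e.g. "\+nd" in "\+nd" or "\+nd*".
PLUS_MARKER = re.compile(r"\\\+(\w+)")


def drop_plus_markers(usfm: str) -> str:
    r"""Rewrite \+xx and \+xx* as \xx and \xx*.

    Only safe if every character style is explicitly closed; under the
    current implicit-closure rules, \nd inside \add would close the \add.
    """
    return PLUS_MARKER.sub(r"\\\1", usfm)


print(drop_plus_markers(r"\add the \+nd Lord\+nd* said\add*"))
```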