teusbenschop / shona

The text of the Shona Bible for use by the translation team

Table of Contents markers #2

Closed DavidHaslam closed 7 years ago

DavidHaslam commented 7 years ago

Your attention is drawn to the following observations:

  1. None of the 66 books has a \toc1 marker (now required by ParaTExt)
  2. Only 38 books have a \toc2 marker (also required by ParaTExt)
  3. None of the 66 books has a \toc3 marker (optional in ParaTExt)

Even though this work is being maintained with Bibledit, it still makes good sense for it to match the requirements of ParaTExt.

Defining book abbreviations with \toc3 would help to make sense of cross-references.

teusbenschop commented 7 years ago

I see. Well, something can be improved then. I have a question about the \toc1 and \toc2 markers: what are they supposed to define, and where are the USFM docs for that? I understand that the \toc3 markers define the book abbreviations used in the xrefs?

DavidHaslam commented 7 years ago
  1. \toc1 is the verbose book name
  2. \toc2 is the short book name
  3. \toc3 is the abbreviation

USFM 2.4 documentation is here

USFM 3.0 is maintained here

From the former (example - now somewhat dated)...

Implementing toc1 and toc2: The \toc1, \toc2, and \toc3 markers are provided to assist publishing tools in automating the generation of a table of contents. They can be included in the main text file for each scripture book, after \h. They are not for use directly within the front matter (FRT) peripheral file.

If you are working with the Publishing Assistant InDesign publishing path, the following markup is needed for generating an automatic table of contents:

  · Addition of \toc1 (and optionally \toc2) markers after \h, within each scripture book (\toc3 is not yet supported).
  · Addition of the "\periph Table of Contents" (USFM 2.0 = \toc) sub-group marker within the FRT peripheral file.

Complete details of this process are provided in the Publishing Assistant User Guide.

DavidHaslam commented 7 years ago

From the viewpoint of making a SWORD module, the helpful one would be \toc3. This will help in processing the localised cross-references in the OSIS XML. It is better to specify the abbreviations where they can be easily found than to have to reverse engineer them from the extracted cross-references.
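To illustrate the point about \toc3, here is a minimal Python sketch of how a converter might build an abbreviation lookup from the USFM files. It is not taken from any SWORD or OSIS tool: the function name `abbreviation_map` and the assumption that each book is available as one string containing an \id line and a \toc3 line are hypothetical.

```python
import re

# \id line: the book code is the first token after the marker.
ID_RE = re.compile(r"^\\id[ \t]+(\S+)", re.MULTILINE)
# \toc3 line: everything after the marker on the same line is the abbreviation.
TOC3_RE = re.compile(r"^\\toc3[ \t]+(.+)$", re.MULTILINE)

def abbreviation_map(books):
    """Map each localised \\toc3 abbreviation to its USFM book code.

    `books` is an iterable of USFM strings, one per book (an assumption
    made for this sketch). Books without \\id or \\toc3 are skipped.
    """
    mapping = {}
    for usfm_text in books:
        book_id = ID_RE.search(usfm_text)
        toc3 = TOC3_RE.search(usfm_text)
        if book_id and toc3:
            mapping[toc3.group(1).strip()] = book_id.group(1)
    return mapping
```

With such a table, an abbreviation found in a cross-reference can be looked up directly instead of being guessed from the references themselves.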

teusbenschop commented 7 years ago

Thanks for the refresher.

Thinking further, a possible good mechanical check for Bibledit would be to check for the presence of the \toc[1-3] markers...
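A rough idea of what such a check could look like, written as a standalone Python sketch rather than as actual Bibledit code. The function name `missing_toc_markers` is made up for illustration, and the sample text uses an English placeholder book name.

```python
import re

# Matches \toc1, \toc2 or \toc3 at the start of a line, followed by whitespace.
TOC_RE = re.compile(r"^\\toc([1-3])\s", re.MULTILINE)

def missing_toc_markers(usfm_text):
    """Return the \\tocN markers that do not occur in one book's USFM text."""
    found = {int(n) for n in TOC_RE.findall(usfm_text)}
    return [f"\\toc{n}" for n in (1, 2, 3) if n not in found]

# Hypothetical sample: a book that only has \toc2.
sample = "\\id GEN\n\\h Genesis\n\\toc2 Genesis\n\\c 1\n"
print(missing_toc_markers(sample))
```

Note that this only tests whether the marker is present; a marker with no text after it would still pass, so a real check might also want to verify that the content is not empty.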

DavidHaslam commented 7 years ago

Seems good to me as an improvement.

DavidHaslam commented 7 years ago

FYI. The attached Zip file contains a snapshot of the USFM tag statistics.

merged.usfm.tags.count.usfm.zip

teusbenschop commented 7 years ago

Thx man!

DavidHaslam commented 7 years ago

Method used:

  1. TextPipe filter to concatenate all the data files from the downloaded repo.
  2. TextPipe filter to extract and count all the USFM tags and append tag descriptions.

The latter is used so often that I added it to the Windows shell extensions on my PC.
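For anyone without TextPipe, the two steps can be approximated in Python. This is a sketch of the same idea, not the TextPipe filters themselves: the marker pattern is a simplification of USFM syntax, the `*.usfm` file pattern is an assumption about how the repo's files are named, and the tag descriptions are left out.

```python
import re
from collections import Counter
from pathlib import Path

# Simplified USFM marker: backslash, optional '+', letters, optional digits,
# and an optional closing '*' (e.g. \v, \toc1, \add*).
TAG_RE = re.compile(r"\\\+?[a-z]+[0-9]*\*?")

def count_usfm_tags(text):
    """Count every marker occurrence in a USFM string."""
    return Counter(TAG_RE.findall(text))

def count_tags_in_dir(folder, pattern="*.usfm"):
    """Concatenate all matching files in `folder` and count their markers."""
    merged = "\n".join(
        path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob(pattern))
    )
    return count_usfm_tags(merged)
```

Calling `count_tags_in_dir(".").most_common()` in a checkout of the repo would then list the markers by frequency, provided the file pattern matches the actual file names.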

I started to use TextPipe in 2001.

teusbenschop commented 7 years ago

\toc1 and \toc2 have been fixed today. Hopefully the \toc3 can be done on Monday, God willing.

teusbenschop commented 7 years ago

This has been fixed now and should be in the repository within 10 minutes from now.