Using USFM-TitleTag: \id when the id line contains more text than CODE

GoogleCodeExporter commented 8 years ago

The USFM manual states:

\id_<CODE>_(Name of file, Book name, Language, Last edited, Date etc.)
· File identification. 
· This is the initial USFM marker in any scripture text file. 
· CODE is normally the standard 3 letter UBS/SIL scripture book abbreviation. 

Some USFM files have a lot of text after the <CODE>, which makes it 
impractical to specify \id as the USFM-TitleTag. Example:
\id MAT 41MAT.WCL, Gobaith i Gymru — beibl.net, Arfon Jones, 19-X-2004

It should only be necessary to specify the 3 letter CODE in the Book: lines, 
rather than the verbose contents of the identification line.

This ought to be tackled by improving how Go Bible Creator parses the USFM 
files.

Original issue reported on code.google.com by DFH...@gmail.com on 29 Jun 2011 at 7:45

GoogleCodeExporter commented 8 years ago

I'm thinking of having

Book-Index-Id-Name: 40  MAT   Matthew

(tab-delimited if it makes you happy)

Of course, we could enumerate the books automatically if they already are in 
sequence and in order:

Book-Index-Id-Name: #   GEN   Genesis
Book-Index-Id-Name: #   EXO   Exodus
...

Original comment by daniel.s...@gmail.com on 14 Jul 2011 at 3:58

GoogleCodeExporter commented 8 years ago

Beware of book index numbers. Confusion reigns out there.

Some translators have 41 = MAT, others have 40 = MAT.

This issue is just to do with how the \id line is parsed.
Go Bible Creator should ignore anything after the second space delimiter.
i.e. When the file contains
\id MAT blah blah blah .....
and we specify the property
USFM-TitleTag: \id
then the corresponding line in the collection is allowed to be
Book: MAT

i.e. We ignore the "blah blah blah ....." on the \id line

No change other than this fix is needed to solve this issue.
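
The parsing rule described above (keep the CODE, ignore everything after the second space delimiter) can be sketched as follows. This is an illustrative Python sketch, not Go Bible Creator's actual code; the function name book_code_from_id_line is hypothetical.

```python
from typing import Optional


def book_code_from_id_line(line: str) -> Optional[str]:
    """Return the book CODE from a USFM \\id line, or None if it is not one.

    Hypothetical helper: only the second whitespace-delimited token is kept,
    so any descriptive text after the CODE is ignored.
    """
    # Split into at most three parts: marker, CODE, and the ignored remainder.
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 2 or parts[0] != "\\id":
        return None
    return parts[1]


print(book_code_from_id_line("\\id MAT blah blah blah ....."))
```

With this rule, a file starting with "\id MAT blah blah blah ....." and a file starting with the bare "\id MAT" both yield MAT, so "Book: MAT" in the collections file would match either.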

Original comment by DFH...@gmail.com on 14 Jul 2011 at 9:22

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Under the current system, the collections file will specify a tag (e.g. \h) as 
the title tag. The name of the book is extracted from the title tag. The order 
of the book is then determined by the order of appearance of the name in the 
collections file.
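
A rough Python sketch of that ordering rule, for illustration only. The function and argument names are hypothetical, and the real tool may differ in detail (for example in how it reports books that are not found).

```python
def order_books(collection_books, title_by_file):
    """Order source files by where their title appears in the collections file.

    collection_books: names from the Book: lines, in order of appearance.
    title_by_file:    {filename: text extracted from the title tag, e.g. \\h}.
    Both arguments are hypothetical stand-ins for the tool's internal state.
    """
    # Invert the mapping so each extracted title points at its source file.
    file_by_title = {title: name for name, title in title_by_file.items()}
    # The order of the Book: lines decides the order of the output.
    return [file_by_title[b] for b in collection_books if b in file_by_title]


files = {"41MAT.SFM": "Matthew", "01GEN.SFM": "Genesis"}
print(order_books(["Genesis", "Matthew"], files))
```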

I am not sure what improvement is being suggested here. Is it to automatically 
sort the books according to their \id tag? If so, which canon should it use by 
default?

Original comment by daniel.s...@gmail.com on 16 Jul 2011 at 3:06

GoogleCodeExporter commented 8 years ago

This issue is nothing to do with book order or canon. It's purely a 
simplification for parsing line 1 of the USFM files.

It has merely to do with the fact that some translators have as line 1 the bare 
three letter book CODE, e.g.
\id MAT
and other translators have added loads of extra text after the book CODE
\id MAT rhubarb rhubarb rhubarb rhubarb 

I only used rhubarb as the generic because of The Goon Show.

When one specifies
USFM-TitleTag: \id
and the USFM files have a verbose line 1, then Go Bible Creator currently expects
Book: MAT rhubarb rhubarb rhubarb rhubarb 
whereas it would simplify the collections text file for it to cope with 
Book: MAT
and ignore the "rhubarb rhubarb rhubarb rhubarb" (or whatever).

David

Original comment by DFH...@gmail.com on 17 Jul 2011 at 9:54

GoogleCodeExporter commented 8 years ago

NB. Some USFM file collections we have received have missing \h tags in some 
files.

This means that we have to specify the \id tag for such collections.

Original comment by DFH...@gmail.com on 17 Jul 2011 at 12:18

GoogleCodeExporter commented 8 years ago

As a workaround, I could use a TextPipe filter to insert \r\n\\rem  after the 
<CODE> part of the \id line.

i.e. preprocess all the USFM files before running Go Bible Creator.
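
For anyone without TextPipe, the same preprocessing step could be approximated in Python. This is only a sketch under the assumption that the \id line holds the marker, the CODE, and then free text on a single line; the name split_id_line is hypothetical.

```python
import re

# Marker and CODE in group 1, any trailing descriptive text in group 2.
ID_LINE = re.compile(r"^(\\id[ \t]+\S+)[ \t]+(\S.*)$")


def split_id_line(line: str) -> str:
    """Move text after the CODE on an \\id line onto a following \\rem line.

    Lines that are not verbose \\id lines are returned unchanged.
    """
    m = ID_LINE.match(line.rstrip("\r\n"))
    if m is None:
        return line
    return f"{m.group(1)}\r\n\\rem {m.group(2)}\r\n"


print(repr(split_id_line("\\id MAT rhubarb rhubarb\r\n")))
```

Applied to the first line of each USFM file, this leaves a bare "\id MAT" line, so "Book: MAT" matches without any change to Go Bible Creator.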

Original comment by DFH...@gmail.com on 12 Aug 2012 at 2:15

GoogleCodeExporter commented 8 years ago

Any further thoughts on this issue?

Original comment by DFH...@gmail.com on 28 Dec 2012 at 10:30

GoogleCodeExporter commented 8 years ago

I'll add an exception in the code. When \id is used as the title tag, both the 
book code and the full text can be used for identification.
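
One way such an exception might look, sketched in Python rather than the project's own code. The function name matches_book is hypothetical; it accepts a Book: name that equals either the book code or the full text following \id.

```python
def matches_book(book_name: str, id_text: str) -> bool:
    """Return True if a Book: name identifies the file with this \\id text.

    id_text is the text following the \\id marker. The name matches when it
    equals either the leading book code or the whole (trimmed) text.
    """
    wanted = book_name.strip()
    full = id_text.strip()
    # The book code is the first whitespace-delimited token of the \id text.
    code = full.split(maxsplit=1)[0] if full else ""
    return wanted != "" and wanted in (code, full)


print(matches_book("MAT", "MAT rhubarb rhubarb"))
```

Accepting both forms keeps existing collections files, which spell out the full \id text, working alongside new ones that use only the code.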

Original comment by daniel.s...@gmail.com on 28 Dec 2012 at 3:15

GoogleCodeExporter commented 8 years ago

Did you mean to write "EITHER the book code OR the full text" ?

Original comment by DFH...@gmail.com on 31 Dec 2012 at 2:11

GoogleCodeExporter commented 8 years ago

No debate!  You've begun (started) to work on this.  :)

Original comment by DFH...@gmail.com on 31 Dec 2012 at 2:50

Changed state: Started

GoogleCodeExporter commented 8 years ago

This should now be fixed in the latest SVN.

Original comment by daniel.s...@gmail.com on 31 Dec 2012 at 2:52

GoogleCodeExporter commented 8 years ago

Original comment by DFH...@gmail.com on 31 Dec 2012 at 3:08

Changed state: Fixed

lukeme / gobible #156