GoogleCodeExporter closed this issue 8 years ago
Another aspect of the issue is that some verse numbers are displayed in red.
They should not be.
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 3:32
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 3:33
Analysis:
The USFM source text files from ebible.org contain many superfluous
instances of the Words of Jesus end marker \wj*, for example:
\v 12 \wj Rejoice, and be exceeding glad: for great \add is\add*\wj*\wj your
reward in heaven: for so persecuted they the prophets which were before you.
\wj*
\p
\wj*
\v 13 \wj Ye are the salt of the earth: but if the salt have lost his savour,
wherewith shall it be salted? it is thenceforth good for nothing, but to be
cast out, and to be trodden under foot of men. \wj*
These are processed by Go Bible Creator in the normal way, causing the red
letter attribute to toggle incorrectly.
It looks as if the files from ebible.org may not be strictly compliant with the
USFM standard. One workaround might be to preprocess the files to remove the
superfluous markers.
A solution might be for GoBibleCreator to ignore repeated occurrences of the
start marker \wj and repeated occurrences of the end marker \wj*.
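A minimal Python sketch of that idea follows. It is not GoBibleCreator's code; the function name split_red_letter and the (is_red, chunk) output format are hypothetical, chosen only for illustration. The point is that each marker sets the red-letter state instead of toggling it, so a repeated \wj or \wj* has no effect.

```python
import re

# Matches the Words of Jesus end marker (\wj*) or the start marker
# (\wj followed by whitespace). All other USFM markers stay in the text.
WJ_TOKEN = re.compile(r"\\wj\*|\\wj\s+")

def split_red_letter(text):
    """Return (is_red, chunk) pairs; hypothetical format for illustration."""
    segments = []
    red = False
    pos = 0
    for m in WJ_TOKEN.finditer(text):
        chunk = text[pos:m.start()]
        if chunk:
            segments.append((red, chunk))
        # Set the state rather than toggling it, so a repeated start
        # marker or a repeated end marker changes nothing.
        red = not m.group().endswith("*")
        pos = m.end()
    tail = text[pos:]
    if tail:
        segments.append((red, tail))
    return segments
```

With this approach a stray \wj* after a span has already been closed leaves the following text in the normal colour, which is the behaviour wanted for the verse numbers.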
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 4:21
Counts for the pattern
\wj*
\p
\wj*
40-MAT-kjv.ptx 86 occurrences
41-MRK-kjv.ptx 37 occurrences
42-LUK-kjv.ptx 74 occurrences
43-JHN-kjv.ptx 27 occurrences
44-ACT-kjv.ptx 1 occurrence
...
66-REV-kjv.ptx 10 occurrences
This pattern may be an artefact of the software that Kahanapule Michael Johnson
used to generate the USFM files from another format.
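For anyone wanting to repeat this kind of count, here is a rough Python sketch. The function name count_wj_p_wj is hypothetical, and tolerating trailing spaces and CRLF line endings is an assumption about the files, so the totals may differ slightly from those listed above.

```python
import re

# \wj* on one line, \p on the next, \wj* on the line after that.
# Trailing spaces/tabs and Windows line endings are tolerated (assumption).
WJ_P_WJ = re.compile(r"\\wj\*[ \t]*\r?\n\\p[ \t]*\r?\n\\wj\*")

def count_wj_p_wj(text):
    """Count non-overlapping occurrences of the three-line pattern."""
    return len(WJ_P_WJ.findall(text))

# Hypothetical usage; the encoding is an assumption:
# with open("40-MAT-kjv.ptx", encoding="utf-8") as f:
#     print(count_wj_p_wj(f.read()))
```

Matches are counted without overlap, so two patterns that share a \wj* line would be counted once.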
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 4:28
Another redundant pattern is when the end marker is followed immediately by the
start marker. A search for "\wj*\wj " gives
40-MAT-kjv.ptx 241 matches
Also, the start marker \wj usually has two spaces after it, rather than a
single space. These files contain far too many double spaces.
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 4:50
Further analysis (First Gospel for illustration):
40-MAT-kjv.ptx
"\wj " 987 matches
"\wj*" 1079 matches
Thus there are 92 unpaired instances of the end marker.
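The same arithmetic can be scripted. This is a small Python sketch that mirrors the literal searches quoted in these comments; the function name and the dictionary keys are hypothetical.

```python
def count_wj_markers(text):
    """Count literal marker strings, mirroring a plain-text search."""
    starts = text.count("\\wj ")          # start marker plus one space
    ends = text.count("\\wj*")            # end marker
    adjacent = text.count("\\wj*\\wj ")   # end marker directly followed by start
    return {
        "starts": starts,
        "ends": ends,
        "adjacent": adjacent,
        "surplus_ends": ends - starts,    # unpaired end markers if positive
    }

# Hypothetical usage; the encoding is an assumption:
# with open("40-MAT-kjv.ptx", encoding="utf-8") as f:
#     print(count_wj_markers(f.read()))
```

A start marker followed by a line break instead of a space would not be counted by this literal search.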
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 8:07
The GoBibleCreator issue is that it does not detect unpaired USFM markers of
the kinds that should always be paired. No error message is generated.
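As an illustration of the kind of check that is missing, here is a Python sketch that reports unpaired \wj and \wj* markers with line numbers. It is not part of GoBibleCreator; the function name and message texts are hypothetical, and only the wj pair is handled.

```python
import re

# End marker \wj*, or start marker \wj followed by whitespace or end of line.
WJ_TOKEN = re.compile(r"\\wj\*|\\wj(?=\s|$)")

def find_unpaired_wj(text):
    """Return (line_number, message) for each unpaired wj marker."""
    problems = []
    open_line = None  # line on which the currently open \wj started
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in WJ_TOKEN.finditer(line):
            if m.group() == "\\wj*":
                if open_line is None:
                    problems.append((lineno, "end marker without start"))
                open_line = None
            else:
                if open_line is not None:
                    problems.append((lineno, "start marker while already open"))
                open_line = lineno
    if open_line is not None:
        problems.append((open_line, "start marker never closed"))
    return problems
```

Each reported line could then be turned into a warning during conversion, so that malformed source files do not pass silently.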
Original comment by DFH...@gmail.com
on 10 Oct 2010 at 8:08
Workaround implemented.
This morning, I developed a TextPipe Standard filter to remove the unpaired and
redundant wj markers in the source text files. This has fixed the display
problem.
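For readers without TextPipe, a comparable clean-up can be sketched in Python. This is not the TextPipe filter itself, whose contents are not reproduced here; the function name is hypothetical. It drops any wj marker that does not change the red-letter state and leaves whitespace untouched. Adjacent \wj*\wj pairs are left alone, because removing them safely depends on how the following space is interpreted.

```python
import re

# End marker \wj*, or start marker \wj when followed by whitespace.
WJ_TOKEN = re.compile(r"\\wj\*|\\wj(?=\s)")

def drop_redundant_wj(text):
    """Remove wj markers that repeat the state already in effect."""
    red = False

    def keep_or_drop(match):
        nonlocal red
        is_start = match.group() == "\\wj"
        changes_state = is_start != red
        red = is_start
        # Keep the marker only if it actually switches red letters on or off.
        return match.group() if changes_state else ""

    return WJ_TOKEN.sub(keep_or_drop, text)
```

Running the function on its own output should return the text unchanged, which is a quick sanity check before feeding the files to GoBibleCreator.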
As a follow-up, I should email Michael Johnson to inform him of this issue.
Original comment by DFH...@gmail.com
on 11 Oct 2010 at 9:58
The "export to clipboard" TextPipe filter is in the attached file. (FIO)
The binary .fll file is available upon request.
Original comment by DFH...@gmail.com
on 11 Oct 2010 at 11:29
I have just emailed Michael Johnson to inform him of this issue.
Original comment by DFH...@gmail.com
on 11 Oct 2010 at 11:40
Closed this issue, which relates to one particular set of USFM files from ebible.org.
Added new issue 136. See
http://code.google.com/p/gobible/issues/detail?id=136
Original comment by DFH...@gmail.com
on 11 Oct 2010 at 12:36
Original issue reported on code.google.com by
DFH...@gmail.com
on 10 Oct 2010 at 3:31