schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

USX to OnLineBible - combined verses #45

Closed Michahel closed 3 years ago

Michahel commented 3 years ago

I am converting USX to OnLineBible Format, and USX file contains the combined verses.

  <para style="p">
    <verse number="10" style="v"/>FirstVerse <verse number="11-12" style="v"/>SecondVerse <verse number="13" style="v"/>ThirdVerse</para>

The SecondVerse should belong to verse 11 and should look like this in OnLineBible:

$$$ Ex 17:10
FirstVerse
$$$ Ex 17:11
\!(17:11-12)\! SecondVerse
$$$ Ex 17:13
ThirdVerse

When I run the converter, I get the following result, at the moment:

$$$ Ex 17:10
FirstVerse \!(17:11-12)\! SecondVerse
$$$ Ex 17:13
ThirdVerse

schierlm commented 3 years ago

I guess I need to revisit the heuristics for handling split, reordered and merged verses at some point. The current solution works "well enough" even for complex cases, but in some cases (like yours) it produces unexpected results. In general, it merges verses that should be split more often than it splits verses that should be merged. On the other hand, the right behavior may even depend on the target format: when exporting to BrowserBible, joining 11-12 to verse 10 gives a more useful result than splitting it. (Verse search will find the correct verse when searching for 11 or 12 in both cases, and displaying an extra bold (10) before the bold (10-11) only confuses users.)

If you want to know if there are any special or reordered verse numbers, you can export to Validate PrintSpecialVerseSummary, which will print all special verse numbers and their order to stdout.

Keep in mind that it is possible (and also happens) to have sequences like verse 1, 4a, 2, 3.5, 6, 4b, 7 and the heuristic should find a solution that is not too bad for them.

Currently there are two rulesets. Only for Logos export the "ranges" rules are used:

All other formats (that do not support ranges or reordered verse numbers) use the "normal" ruleset:

To give some examples: for 1, 4, 2, 3, 5, the ranges ruleset would produce one verse containing 1, 4, 2, 3, while the normal ruleset would produce one verse containing 1, 4 followed by a second verse containing 2, 3. For the complex example above, the normal ruleset would merge 4a into 1, 3.5 into 2 and 4b into 6, whereas the ranges ruleset would merge all verses except 1 and 7 into verse 2.

My idea now would be to introduce a third ruleset (to be used by OnLineBible, MyBibleZone and maybe others, but not BrowserBible) that handles verse numbers which cover more than one verse and start with a pure number, i.e. verse numbers matching /[1-9][0-9]*[.\/-][1-9].*/ (the hyphen goes last in the character class so that it is taken literally instead of forming a range). Such verse numbers would be treated as if they were pure verse numbers consisting of the first number only. So 11-12 would be treated like 11 and put into verse 11 instead of being merged with verse 10.
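
The proposed rule can be sketched in Java roughly as follows. This is only an illustration of the idea, not code from BibleMultiConverter: the class name `LeadingVerseNumber` and the method `anchorFor` are hypothetical, and the pattern is the one from the comment above with the hyphen placed last in the character class so that it is matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BibleMultiConverter.
class LeadingVerseNumber {
    // A pure leading number, one separator (. / -), then a digit 1-9 and anything after it.
    private static final Pattern COMBINED = Pattern.compile("([1-9][0-9]*)[./-][1-9].*");

    /** Returns the leading number of a combined verse number, or the input unchanged. */
    static String anchorFor(String verseNumber) {
        Matcher m = COMBINED.matcher(verseNumber);
        return m.matches() ? m.group(1) : verseNumber;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"10", "11-12", "3.5", "4a", "13"}) {
            System.out.println(v + " -> " + anchorFor(v));
        }
    }
}
```

With this sketch, 11-12 is anchored at 11, while split verses such as 4a are left untouched and would still need their own handling.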

Or maybe I should expose the decision which ruleset to use to the user as (hidden) parameters? What do you think? Or do you have a better idea how to make a ruleset that works better than the current one, yet still handles complex cases well enough?

Michahel commented 3 years ago

> Or maybe I should expose the decision which ruleset to use to the user as (hidden) parameters? What do you think?

I am hesitant to recommend features that are hardly ever used. More features lead to more complexity. There is already a new IgnoreKJV argument, and that option already gives a natural place for the new ruleset we are now discussing, so I think adding yet another special parameter is superfluous. I am interested in how BibleMultiConverter works with the IgnoreKJV argument, so I hope the new ruleset will be applied with that very argument. So, for merged verses you have already come up with a solution. For reordered verses, the verses must remain where they are. For split verses (number with letter), I need to study the LXX module to see which solution the module developers have already arrived at. To do this, I need to know all passages where such cases can occur.

schierlm commented 3 years ago

Ah ok, I see that you also mentioned that this mode allows having one verse multiple times (and mapping them to different verses automatically). So you can also have reordered verses there? Good to know.

About LXX and split verses, Joshua 24 contains quite a lot of them. Yet these are not reordered. But if you say it does not matter whether you have reordered or duplicate verse numbers, it should not matter too much.

Basically I can think of two ways of doing it:

  1. Export each split verse as an individual verse

    $$$ Jo 24:31
    \!(24:31a)\! Text
    $$$ Jo 24:31
    \!(24:31b)\! Text

  2. Or add a special case that if the base verse number is the same as the last verse, merge them:

    $$$ Jo 24:31
    \!(24:31a)\! Text \!(24:31b)\! Text

Other options should be possible, too.
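
A minimal Java sketch of the second option (merge a split verse into the previous one when both share the same base verse number) could look like this. It is an assumption-laden illustration, not the converter's actual export code: the class `SplitVerseMerger`, the methods `baseNumber` and `export`, and the `String[][]` input shape are all hypothetical, and the `$$$` and `\!( ... )\!` markers simply follow the OnLineBible sample at the top of this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of BibleMultiConverter.
class SplitVerseMerger {
    /** Strips a trailing letter suffix: "31a" -> "31", "31" -> "31". */
    static String baseNumber(String verseNumber) {
        return verseNumber.replaceFirst("[a-z]+$", "");
    }

    /** verses: pairs of {verseNumber, text} in document order. */
    static List<String> export(String book, int chapter, String[][] verses) {
        List<String> lines = new ArrayList<>();
        String lastBase = null;
        for (String[] verse : verses) {
            String number = verse[0];
            String text = verse[1];
            String base = baseNumber(number);
            // Only split verses get an inline marker with their original number.
            String body = number.equals(base)
                    ? text
                    : "\\!(" + chapter + ":" + number + ")\\! " + text;
            if (base.equals(lastBase)) {
                // Same base verse as the previous one: append to its text line.
                int last = lines.size() - 1;
                lines.set(last, lines.get(last) + " " + body);
            } else {
                lines.add("$$$ " + book + " " + chapter + ":" + base);
                lines.add(body);
                lastBase = base;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] verses = {{"30", "Text"}, {"31a", "Text"}, {"31b", "Text"}};
        export("Jos", 24, verses).forEach(System.out::println);
    }
}
```

The sketch only merges adjacent parts; a reordered sequence such as 4a, 2, 4b would still produce two separate entries for verse 4.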

Michahel commented 3 years ago

> So you can also have reordered verses there? Good to know.

Yes, that's right. The verses are moved to the places that correspond to the verses in the KJV. As a consequence, if the original numbering is out of order, the user will not be able to restore that order. The most you can do is mention it somewhere, for example in the module information.

> Basically I can think of two ways of doing it: ...

The second way is better.

Michahel commented 3 years ago

I did some testing and found the following. If you do this:

$$$ Ex 17:10
FirstVerse
$$$ Ex 17:11-12
SecondVerse
$$$ Ex 17:13
ThirdVerse

And put the line in the RMP file:

Ex 17:11-12 Ex 17:11

When importing an EXP file, I get the result I want. The same is true for the case when letters are used in the numbering.

$$$ Jos 24:31a
Text
$$$ Jos 24:31b
Text

Put the lines in the RMP file:

Jos 24:31a Jos 24:31
Jos 24:31b Jos 24:31

Thus, the Online Bible will do all the work itself. BibleMultiConverter should leave everything in place and output those verse numbers as they are.
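
If the converter were to emit the matching RMP lines alongside the EXP file, the mapping could be derived from the verse number itself. The following Java sketch assumes the RMP line format is exactly what the two samples above show (source reference, a space, target reference); the class `RmpLineBuilder` and the method `rmpLine` are hypothetical names, not existing BibleMultiConverter API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the RMP format is inferred only from the samples in this thread.
class RmpLineBuilder {
    private static final Pattern LEADING_NUMBER = Pattern.compile("[1-9][0-9]*");

    /**
     * Maps a special verse number (e.g. "11-12" or "31a") to its leading plain number.
     * Returns empty when the verse number is already plain and needs no mapping line.
     */
    static Optional<String> rmpLine(String book, int chapter, String verseNumber) {
        Matcher m = LEADING_NUMBER.matcher(verseNumber);
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("No leading number in: " + verseNumber);
        }
        String target = m.group();
        if (target.equals(verseNumber)) {
            return Optional.empty();
        }
        String prefix = book + " " + chapter + ":";
        return Optional.of(prefix + verseNumber + " " + prefix + target);
    }

    public static void main(String[] args) {
        rmpLine("Ex", 17, "11-12").ifPresent(System.out::println);
        rmpLine("Jos", 24, "31a").ifPresent(System.out::println);
        rmpLine("Jos", 24, "31b").ifPresent(System.out::println);
    }
}
```

Run on the examples from this comment, the sketch reproduces the three mapping lines shown above.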