schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

USX to OnLineBible problem #41

Closed Michahel closed 3 years ago

Michahel commented 3 years ago

I have successfully compiled BibleMultiConverter from source. However, I experience problems converting from USX to OnLineBible. I am including the error messages that were displayed.

C:\PROGS\BibleMultiConverter>java -jar BibleMultiConverter.jar USX N:\Bibles\CARS\rus-CARS-b19fcf462065a794-rev8-2019-07-26-release\release\USX_1 OnLineBible N:\Bibles\CARS\CARS.Exp
WARNING: Unsupported book abbreviation Нач., using Gen instead
WARNING: Unsupported book abbreviation Исх., using Exod instead
WARNING: Unsupported book abbreviation Лев., using Lev instead
WARNING: Unsupported book abbreviation Чис., using Num instead
WARNING: Unsupported book abbreviation Втор., using Deut instead
WARNING: Unsupported book abbreviation Иеш., using Josh instead
WARNING: Unsupported book abbreviation Суд., using Judg instead
WARNING: Unsupported book abbreviation Руфь, using Ruth instead
WARNING: Unsupported book abbreviation 1Цар., using 1Sam instead
WARNING: Unsupported book abbreviation 2Цар., using 2Sam instead
WARNING: Unsupported book abbreviation 3Цар., using 1Kgs instead
WARNING: Unsupported book abbreviation 4Цар., using 2Kgs instead
WARNING: Unsupported book abbreviation 1Лет., using 1Chr instead
WARNING: Unsupported book abbreviation 2Лет., using 2Chr instead
WARNING: Unsupported book abbreviation Узайр, using Ezra instead
WARNING: Unsupported book abbreviation Неем., using Neh instead
WARNING: Unsupported book abbreviation Есф., using Esth instead
WARNING: Unsupported book abbreviation Аюб, using Job instead
WARNING: Unsupported book abbreviation Заб., using Ps instead
WARNING: Unsupported book abbreviation Мудр., using Prov instead
WARNING: Unsupported book abbreviation Разм., using Eccl instead
WARNING: Unsupported book abbreviation Песн., using Song instead
WARNING: Unsupported book abbreviation Ис., using Isa instead
WARNING: Unsupported book abbreviation Иер., using Jer instead
WARNING: Unsupported book abbreviation Плач, using Lam instead
WARNING: Unsupported book abbreviation Езек., using Ezek instead
WARNING: Unsupported book abbreviation Дан., using Dan instead
WARNING: Unsupported book abbreviation Ос., using Hos instead
WARNING: Unsupported book abbreviation Иоиль, using Joel instead
WARNING: Unsupported book abbreviation Ам., using Amos instead
WARNING: Unsupported book abbreviation Авд., using Obad instead
WARNING: Unsupported book abbreviation Юнус, using Jonah instead
WARNING: Unsupported book abbreviation Мих., using Mic instead
WARNING: Unsupported book abbreviation Наум, using Nah instead
WARNING: Unsupported book abbreviation Авв., using Hab instead
WARNING: Unsupported book abbreviation Соф., using Zeph instead
WARNING: Unsupported book abbreviation Агг., using Hag instead
WARNING: Unsupported book abbreviation Зак., using Zech instead
WARNING: Unsupported book abbreviation Мал., using Mal instead
WARNING: Unsupported book abbreviation Мат., using Matt instead
WARNING: Unsupported book abbreviation Мк., using Mark instead
WARNING: Unsupported book abbreviation Лк., using Luke instead
WARNING: Unsupported book abbreviation Ин., using John instead
WARNING: Unsupported book abbreviation Деян., using Acts instead
WARNING: Unsupported book abbreviation Рим., using Rom instead
WARNING: Unsupported book abbreviation 1Кор., using 1Cor instead
WARNING: Unsupported book abbreviation 2Кор., using 2Cor instead
WARNING: Unsupported book abbreviation Гал., using Gal instead
WARNING: Unsupported book abbreviation Эф., using Eph instead
WARNING: Unsupported book abbreviation Флп., using Phil instead
WARNING: Unsupported book abbreviation Кол., using Col instead
WARNING: Unsupported book abbreviation 1Фес., using 1Thess instead
WARNING: Unsupported book abbreviation 2Фес., using 2Thess instead
WARNING: Unsupported book abbreviation 1Тим., using 1Tim instead
WARNING: Unsupported book abbreviation 2Тим., using 2Tim instead
WARNING: Unsupported book abbreviation Тит, using Titus instead
WARNING: Unsupported book abbreviation Флм., using Phlm instead
WARNING: Unsupported book abbreviation Евр., using Heb instead
WARNING: Unsupported book abbreviation Якуб, using Jas instead
WARNING: Unsupported book abbreviation 1Пет., using 1Pet instead
WARNING: Unsupported book abbreviation 2Пет., using 2Pet instead
WARNING: Unsupported book abbreviation 1Ин., using 1John instead
WARNING: Unsupported book abbreviation 2Ин., using 2John instead
WARNING: Unsupported book abbreviation 3Ин., using 3John instead
WARNING: Unsupported book abbreviation Иуда, using Jude instead
WARNING: Unsupported book abbreviation Отк., using Rev instead
Exception in thread "main" java.lang.NullPointerException
        at java.util.regex.Matcher.getTextLength(Unknown Source)
        at java.util.regex.Matcher.reset(Unknown Source)
        at java.util.regex.Matcher.<init>(Unknown Source)
        at java.util.regex.Pattern.matcher(Unknown Source)
        at biblemulticonverter.data.Utils.validateString(Utils.java:31)
        at biblemulticonverter.data.FormattedText$CrossReference.<init>(FormattedText.java:249)
        at biblemulticonverter.data.FormattedText$CrossReference.<init>(FormattedText.java:237)
        at biblemulticonverter.data.FormattedText$AppendVisitor.visitCrossReference(FormattedText.java:703)
        at biblemulticonverter.format.paratext.AbstractParatextFormat$ParatextImportVisitor.visitReference(AbstractParatextFormat.java:574)
        at biblemulticonverter.format.paratext.ParatextCharacterContent$Reference.acceptThis(ParatextCharacterContent.java:475)
        at biblemulticonverter.format.paratext.ParatextBook$ParatextCharacterContentContainer.accept(ParatextBook.java:555)
        at biblemulticonverter.format.paratext.ParatextCharacterContent$AutoClosingFormatting.acceptThis(ParatextCharacterContent.java:190)
        at biblemulticonverter.format.paratext.ParatextBook$ParatextCharacterContentContainer.accept(ParatextBook.java:555)
        at biblemulticonverter.format.paratext.AbstractParatextFormat$1.visitParatextCharacterContent(AbstractParatextFormat.java:227)
        at biblemulticonverter.format.paratext.ParatextCharacterContent.acceptThis(ParatextCharacterContent.java:35)
        at biblemulticonverter.format.paratext.ParatextBook.accept(ParatextBook.java:113)
        at biblemulticonverter.format.paratext.AbstractParatextFormat.importParatextBook(AbstractParatextFormat.java:128)
        at biblemulticonverter.format.paratext.AbstractParatextFormat.doImport(AbstractParatextFormat.java:112)
        at biblemulticonverter.Main.main(Main.java:66)

C:\PROGS\BibleMultiConverter>

schierlm commented 3 years ago

I believe I found your issue. That error could have been caused by USX files that contain <ref> tags referencing books that are not part of the Bible. This is especially embarrassing, as the first thing the OnLineBible exporter does is remove those references.
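For readers following the stack trace: the exception is raised inside `Pattern.matcher`, which throws a `NullPointerException` when it is handed `null`. The following is only a minimal sketch of that failure mode and a guard against it, not the actual fix that was pushed. The class name `RefBookGuard`, the method `isUsableBookAbbr`, and the pattern itself are hypothetical; the real pattern used by `Utils.validateString` is not shown in this thread.

```java
import java.util.regex.Pattern;

class RefBookGuard {
    // Hypothetical pattern, chosen only for illustration.
    private static final Pattern BOOK_ABBR = Pattern.compile("[A-Za-z0-9]+");

    /** Returns true only if the abbreviation is non-null and matches the pattern. */
    public static boolean isUsableBookAbbr(String abbr) {
        // Pattern.matcher(null) throws NullPointerException, so test for null first.
        return abbr != null && BOOK_ABBR.matcher(abbr).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUsableBookAbbr("Gen"));
        // A <ref> whose book could not be resolved is modelled here as null.
        System.out.println(isUsableBookAbbr(null));
    }
}
```

With a check like this, a reference to an unknown book could be skipped or reported instead of aborting the whole conversion.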

I pushed a fix to the master branch that hopefully resolves it. Can you please test?

If compiling is troublesome for you, you can click the green check mark icon next to the commit ID (above), choose the build you like (Java 8 vs. Java 11), then click "details", then "artifacts" (upper right corner) and it will allow you to download the version compiled on GitHub's CI infrastructure.

If it does not solve the issue, would it be possible to share the file that cannot be converted?

Michahel commented 3 years ago

While processing one of the files, I received the following error message:

C:\PROGS\BibleMultiConverter>java -jar BibleMultiConverter.jar USX N:\Bibles\CARS\Text OnLineBible N:\Bibles\CARS\CARS.Exp
WARNING: Unsupported book abbreviation Нач., using Gen instead
WARNING: Unsupported book abbreviation Исх., using Exod instead
WARNING: Unsupported book abbreviation Лев., using Lev instead
WARNING: Unsupported book abbreviation Чис., using Num instead
WARNING: Unsupported book abbreviation Втор., using Deut instead
WARNING: Unsupported book abbreviation Иеш., using Josh instead
WARNING: Unsupported book abbreviation Суд., using Judg instead
WARNING: Unsupported book abbreviation Руфь, using Ruth instead
WARNING: Unsupported book abbreviation 1Цар., using 1Sam instead
WARNING: Unsupported book abbreviation 2Цар., using 2Sam instead
WARNING: Unsupported book abbreviation 3Цар., using 1Kgs instead
WARNING: Unsupported book abbreviation 4Цар., using 2Kgs instead
WARNING: Unsupported book abbreviation 1Лет., using 1Chr instead
WARNING: Unsupported book abbreviation Иуда, using Jude instead
WARNING: Ignoring unreferenced headlines
Exception in thread "main" java.lang.IllegalArgumentException: lastChapter is invalid: 9
        at biblemulticonverter.data.Utils.validateNumber(Utils.java:26)
        at biblemulticonverter.data.FormattedText$CrossReference.<init>(FormattedText.java:250)
        at biblemulticonverter.data.FormattedText$CrossReference.<init>(FormattedText.java:237)
        at biblemulticonverter.data.FormattedText$AppendVisitor.visitCrossReference(FormattedText.java:703)
        at biblemulticonverter.format.paratext.AbstractParatextFormat$ParatextImportVisitor.visitReference(AbstractParatextFormat.java:590)
        at biblemulticonverter.format.paratext.ParatextCharacterContent$Reference.acceptThis(ParatextCharacterContent.java:475)
        at biblemulticonverter.format.paratext.ParatextBook$ParatextCharacterContentContainer.accept(ParatextBook.java:555)
        at biblemulticonverter.format.paratext.ParatextCharacterContent$AutoClosingFormatting.acceptThis(ParatextCharacterContent.java:190)
        at biblemulticonverter.format.paratext.ParatextBook$ParatextCharacterContentContainer.accept(ParatextBook.java:555)
        at biblemulticonverter.format.paratext.AbstractParatextFormat$1.visitParatextCharacterContent(AbstractParatextFormat.java:227)
        at biblemulticonverter.format.paratext.ParatextCharacterContent.acceptThis(ParatextCharacterContent.java:35)
        at biblemulticonverter.format.paratext.ParatextBook.accept(ParatextBook.java:113)
        at biblemulticonverter.format.paratext.AbstractParatextFormat.importParatextBook(AbstractParatextFormat.java:128)
        at biblemulticonverter.format.paratext.AbstractParatextFormat.doImport(AbstractParatextFormat.java:112)
        at biblemulticonverter.Main.main(Main.java:66)

The reason is in the following reference, which looks like this:

<ref loc="1CH 11-9">1 Лет. 11–2 Лет. 9</ref>

Here the reference is really wrong. It should be like this:

<ref loc="1CH 11:1-2CH 9:31">1 Лет. 11–2 Лет. 9</ref>

Is it possible to handle this case so that the logs contain a message like:

Invalid Cross Reference At - File:

and then an exact indication of the file name and the place in the text where this invalid reference is located?
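To make the request concrete, here is a rough sketch of the kind of check being asked for. It is not BibleMultiConverter code: the class `RefLocCheck`, the method `check`, and the file name used in `main` are hypothetical, and the pattern handles only the simple "BOOK chapter-chapter" form seen in the failing `loc` value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefLocCheck {
    // Simplified assumption: matches only chapter ranges such as "1CH 11-9".
    private static final Pattern CHAPTER_RANGE =
            Pattern.compile("([A-Z0-9]{3}) (\\d+)-(\\d+)");

    /** Returns a warning for a backwards chapter range, or null if none was detected. */
    public static String check(String fileName, String loc) {
        Matcher m = CHAPTER_RANGE.matcher(loc);
        if (!m.matches()) {
            return null; // other loc forms are not examined in this sketch
        }
        int firstChapter = Integer.parseInt(m.group(2));
        int lastChapter = Integer.parseInt(m.group(3));
        if (lastChapter < firstChapter) {
            return "Invalid Cross Reference At - File: " + fileName
                    + ", loc=\"" + loc + "\"";
        }
        return null;
    }

    public static void main(String[] args) {
        // "1CH.usx" is a made-up file name for illustration.
        System.out.println(check("1CH.usx", "1CH 11-9"));
        System.out.println(check("1CH.usx", "1CH 11:1-2CH 9:31"));
    }
}
```

The first call reports the backwards range together with the file name; the corrected form is not matched by this simplified pattern, so nothing is reported for it.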

schierlm commented 3 years ago

The error position would already have been printed if the invalid reference were detected by the USX import module. Unfortunately it was detected later. I moved the detection to the USX and USX3 import module so now you will