schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

USX to Validate - PrintSpecialVerseSummary #48

Closed Michahel closed 3 years ago

Michahel commented 3 years ago

I am converting USX to Validate with the PrintSpecialVerseSummary argument. When I run the converter, I get the following error:

...
Exception in thread "main" java.lang.RuntimeException: Validation error at Gen 1:1
        at biblemulticonverter.data.FormattedText.validate(FormattedText.java:86)
        at biblemulticonverter.data.Chapter.validate(Chapter.java:37)
        at biblemulticonverter.data.Book.validate(Book.java:35)
        at biblemulticonverter.data.Bible.validate(Bible.java:51)
        at biblemulticonverter.tools.Validate.doExport(Validate.java:130)
        at biblemulticonverter.Main.main(Main.java:67)
Caused by: java.lang.IllegalStateException: No whitespace allowed at end of element
        at biblemulticonverter.data.FormattedText$ValidatingVisitor.visitEnd(FormattedText.java:950)
        at biblemulticonverter.data.FormattedText.accept(FormattedText.java:46)
        at biblemulticonverter.data.FormattedText$CSSFormatting.acceptThis(FormattedText.java:279)
        at biblemulticonverter.data.FormattedText.accept(FormattedText.java:45)
        at biblemulticonverter.data.FormattedText$Footnote.acceptThis(FormattedText.java:233)
        at biblemulticonverter.data.FormattedText.accept(FormattedText.java:45)
        at biblemulticonverter.data.FormattedText.validate(FormattedText.java:84)
        ... 5 more
schierlm commented 3 years ago

I found some validation errors when importing from USX 2 and fixed them (used your example module from #42). If there are more left, feel free to share them in this issue.

You can skip all whitespace validation by passing -Dbiblemulticonverter.validate.ignore.whitespace=true at the beginning of the command line (i.e. before the -jar). However, this is not a fix, just a way to get past whitespace issues to find more pressing issues first (like printing the special verse summary).
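The placement matters because the `java` launcher treats everything before `-jar` as a JVM option and everything after the jar name as a program argument. The following is a minimal sketch of that behaviour, not BibleMultiConverter's actual code: the class name `WhitespaceFlagSketch` is hypothetical, and reading the flag via `Boolean.getBoolean` is an assumption about how such a switch is typically consumed.

```java
// Hypothetical sketch: shows why a -D option must come before -jar.
// It is NOT taken from the BibleMultiConverter sources.
final class WhitespaceFlagSketch {

    // Property name as given in the comment above.
    static final String PROPERTY = "biblemulticonverter.validate.ignore.whitespace";

    // Boolean.getBoolean returns true only if the system property exists
    // and equals "true" (case-insensitive); otherwise it returns false.
    static boolean ignoreWhitespace() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        // Set only when -D... was passed as a JVM option (before -jar or the class name).
        System.out.println("ignore whitespace: " + ignoreWhitespace());
        // A -D... placed after the jar name ends up here as a plain string instead.
        System.out.println("program arguments: " + String.join(" ", args));
    }
}
```

Running the sketch once with the `-D` option before the class name and once with it after shows the difference: in the second case the flag stays false and the option text appears among the program arguments.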

Michahel commented 3 years ago

I don’t know which change is causing the error. Here's what happens now:

C:\PROGS\BibleMultiConverter>java -Dbiblemulticonverter.validate.ignore.whitespace=true -jar BibleMultiConverter.jar USX N:\Bibles\CARS\Text Validate PrintSpecialVerseSummary
WARNING: Unsupported structured reference format at 1CH.usx line 12, column 255 - replaced by plain text: 1CH 11-9
WARNING: Unsupported structured reference format at 2CH.usx line 582, column 260 - replaced by plain text: 1KI 17-2
WARNING: Unsupported structured reference format at LUK.usx line 69, column 864 - replaced by plain text: 1KI 17-2
WARNING: Unsupported structured reference format at MAL.usx line 91, column 586 - replaced by plain text: 1KI 17-2
WARNING: Unsupported structured reference format at MAT.usx line 603, column 1831 - replaced by plain text: 1KI 17-2
WARNING: Unsupported structured reference format at MRK.usx line 360, column 366 - replaced by plain text: 1KI 17-2
WARNING: Unsupported book abbreviation Нач., using Gen instead
WARNING: Unsupported book abbreviation Исх., using Exod instead
WARNING: Unsupported book abbreviation Лев., using Lev instead
WARNING: Unsupported book abbreviation Чис., using Num instead
WARNING: Unsupported book abbreviation Втор., using Deut instead
WARNING: Unsupported book abbreviation Иеш., using Josh instead
WARNING: Unsupported book abbreviation Суд., using Judg instead
WARNING: Unsupported book abbreviation Руфь, using Ruth instead
WARNING: Unsupported book abbreviation 1Цар., using 1Sam instead
WARNING: Unsupported book abbreviation 2Цар., using 2Sam instead
WARNING: Unsupported book abbreviation 3Цар., using 1Kgs instead
WARNING: Unsupported book abbreviation 4Цар., using 2Kgs instead
WARNING: Unsupported book abbreviation 1Лет., using 1Chr instead
WARNING: Unsupported book abbreviation 2Лет., using 2Chr instead
WARNING: Unsupported book abbreviation Узайр, using Ezra instead
WARNING: Unsupported book abbreviation Неем., using Neh instead
WARNING: Unsupported book abbreviation Есф., using Esth instead
WARNING: Unsupported book abbreviation Аюб, using Job instead
WARNING: Unsupported book abbreviation Заб., using Ps instead
WARNING: Unsupported book abbreviation Мудр., using Prov instead
WARNING: Unsupported book abbreviation Разм., using Eccl instead
WARNING: Unsupported book abbreviation Песн., using Song instead
WARNING: Unsupported book abbreviation Ис., using Isa instead
WARNING: Unsupported book abbreviation Иер., using Jer instead
WARNING: Unsupported book abbreviation Плач, using Lam instead
WARNING: Unsupported book abbreviation Езек., using Ezek instead
WARNING: Unsupported book abbreviation Дан., using Dan instead
WARNING: Unsupported book abbreviation Ос., using Hos instead
WARNING: Unsupported book abbreviation Иоиль, using Joel instead
WARNING: Unsupported book abbreviation Ам., using Amos instead
WARNING: Unsupported book abbreviation Авд., using Obad instead
WARNING: Unsupported book abbreviation Юнус, using Jonah instead
WARNING: Unsupported book abbreviation Мих., using Mic instead
WARNING: Unsupported book abbreviation Наум, using Nah instead
WARNING: Unsupported book abbreviation Авв., using Hab instead
WARNING: Unsupported book abbreviation Соф., using Zeph instead
WARNING: Unsupported book abbreviation Агг., using Hag instead
WARNING: Unsupported book abbreviation Зак., using Zech instead
WARNING: Unsupported book abbreviation Мал., using Mal instead
WARNING: Unsupported book abbreviation Мат., using Matt instead
WARNING: Unsupported book abbreviation Мк., using Mark instead
WARNING: Unsupported book abbreviation Лк., using Luke instead
WARNING: Unsupported book abbreviation Ин., using John instead
WARNING: Unsupported book abbreviation Деян., using Acts instead
WARNING: Unsupported book abbreviation Рим., using Rom instead
WARNING: Unsupported book abbreviation 1Кор., using 1Cor instead
WARNING: Unsupported book abbreviation 2Кор., using 2Cor instead
WARNING: Unsupported book abbreviation Гал., using Gal instead
WARNING: Unsupported book abbreviation Эф., using Eph instead
WARNING: Unsupported book abbreviation Флп., using Phil instead
WARNING: Unsupported book abbreviation Кол., using Col instead
WARNING: Unsupported book abbreviation 1Фес., using 1Thess instead
WARNING: Unsupported book abbreviation 2Фес., using 2Thess instead
WARNING: Unsupported book abbreviation 1Тим., using 1Tim instead
WARNING: Unsupported book abbreviation 2Тим., using 2Tim instead
WARNING: Unsupported book abbreviation Тит, using Titus instead
WARNING: Unsupported book abbreviation Флм., using Phlm instead
WARNING: Unsupported book abbreviation Евр., using Heb instead
WARNING: Unsupported book abbreviation Якуб, using Jas instead
WARNING: Unsupported book abbreviation 1Пет., using 1Pet instead
WARNING: Unsupported book abbreviation 2Пет., using 2Pet instead
WARNING: Unsupported book abbreviation 1Ин., using 1John instead
WARNING: Unsupported book abbreviation 2Ин., using 2John instead
WARNING: Unsupported book abbreviation 3Ин., using 3John instead
WARNING: Unsupported book abbreviation Иуда, using Jude instead
WARNING: Unsupported book abbreviation Отк., using Rev instead
Exception in thread "main" java.lang.NullPointerException
        at biblemulticonverter.format.paratext.ParatextBook$ParatextCharacterContentContainer.accept(ParatextBook.java:613)
        at biblemulticonverter.format.paratext.AbstractParatextFormat$1.visitParatextCharacterContent(AbstractParatextFormat.java:231)
        at biblemulticonverter.format.paratext.ParatextCharacterContent.acceptThis(ParatextCharacterContent.java:35)
        at biblemulticonverter.format.paratext.ParatextBook.accept(ParatextBook.java:116)
        at biblemulticonverter.format.paratext.AbstractParatextFormat.importParatextBook(AbstractParatextFormat.java:129)
        at biblemulticonverter.format.paratext.AbstractParatextFormat.doImport(AbstractParatextFormat.java:112)
        at biblemulticonverter.Main.main(Main.java:66)

C:\PROGS\BibleMultiConverter>
schierlm commented 3 years ago

Thank you for sharing the subset in #47. I was able to fix this issue, and that subset now validates perfectly.

In general, for sharing modules that are under NDA, BibleMultiConverter has a ScrambledDiffable option that replaces all letters and digits with constants, but

  1. so far this module only covered Latin and Greek and left Cyrillic unscrambled
  2. it did not interact well with Paratext formats (it first converted to the verse-based format, which already caused the error in your example)

I have now changed the scrambling to scramble all Unicode letters and digits regardless of script, and created a ScrambledParatextDump module that can do the same before converting Paratext to verse-based.
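To illustrate what script-independent scrambling means, here is a minimal sketch; it is not the ScrambledDiffable or ScrambledParatextDump implementation. The class name `ScrambleSketch` is hypothetical, and mapping upper case to `X`, other letters to `x`, and digits to `0` is an assumption (the thread only states that the dump should contain mostly X and x).

```java
// Hypothetical sketch of scrambling all Unicode letters and digits,
// regardless of script. NOT the actual BibleMultiConverter code.
final class ScrambleSketch {

    static String scramble(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // Iterate over code points so characters outside the BMP are handled too.
        text.codePoints().forEach(cp -> {
            if (Character.isDigit(cp)) {
                sb.append('0');              // assumed constant for digits
            } else if (Character.isUpperCase(cp) || Character.isTitleCase(cp)) {
                sb.append('X');              // upper-case letter in any script
            } else if (Character.isLetter(cp)) {
                sb.append('x');              // any other letter in any script
            } else {
                sb.appendCodePoint(cp);      // punctuation and whitespace stay as-is
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Cyrillic "Nach." (the Genesis abbreviation from the log above),
        // written with Unicode escapes to keep this source file ASCII-only.
        System.out.println(scramble("\u041d\u0430\u0447. 1:1")); // prints "Xxx. 0:0"
    }
}
```

Because the checks rely on `Character.isLetter` and `Character.isDigit` instead of fixed Latin and Greek ranges, Cyrillic text is scrambled the same way as any other script, while punctuation and spacing survive so that structure-related bugs remain reproducible.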

So in case you still have trouble, you can run

java -jar BibleMultiConverter.jar ParatextConverter USX N:\Bibles\CARS\Text ScrambledParatextDump dump.txt =23

(IMPORTANT: If you leave out the ParatextConverter, you will convert from Paratext to verse-based and back, producing a different dump file that will likely not reproduce the bug.)

Check whether the bug also appears if you use

... ParatextDump dump.txt ...

instead of

... USX N:\Bibles\CARS\Text ...

and if yes, you can share the created dump file without sharing any actual text (you can verify that it contains mostly X and x).

Michahel commented 3 years ago

... if yes, you can share the created dump file

Yes, the bug appears. dump.zip