schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

theWord import: <WT*> tags not implemented #78

Open paul1149 opened 1 year ago

paul1149 commented 1 year ago

I'm wondering whether my command line is deficient. I have a tagged NT that looks like this (from line 1):

<GR>Βίβλος<Gr><EN> Book <En><WG976><WTN-NSF l="βίβλος">

Using BMC SQLite Edition 0.0.8, I use the command:

java -jar BMC.jar TheWord sblgnt.nt MyBibleZone sblgnt

The output on my Android phone, where MyBible is installed, is: <wt>Βίβλος<WG976><WTN-NSF l="βίβλος">

So something is wrong with how the tags are being processed. Can I correct this?

Thanks very much.

schierlm commented 1 year ago

Hello Paul,

thank you for this bug report.

Indeed, morphology import from TheWord is not implemented: https://github.com/schierlm/BibleMultiConverter/blob/cb7664b8a7e120bad6f076936e2ffd865b4616f3/biblemulticonverter/src/main/java/biblemulticonverter/format/TheWord.java#L253

When I implemented the import in 2015, I ran into some problems and decided to release first without that support. I don't remember exactly which problems, but I assume the cause is that the intermediary format requires start and end markers for Strong's numbers and morphology, while TheWord only has the end markers. Then I must have forgotten about it. The <Gr> and <En> tags are not implemented either :-(

Exporting morphology tags to TheWord would work fine, though, as would exporting to MyBible.Zone.

I will keep this bug open to remind me that there is still an open issue.

To correct it yourself (apart from implementing the missing code, if you know Java), I think the only viable option is to first convert to Diffable format, then use some regular expressions to try to fix up the tags, and then export to MyBible.Zone. Alternatively, check whether you can find your input file in a format other than TheWord.

[MyBible.Zone comes with SBLGNT, but from your description I assume that the module is more like MorphGNT, which includes morphology information. I am not aware of any MorphGNT modules for MyBible.Zone]

paul1149 commented 1 year ago

Thanks schierlm, I greatly appreciate that explanation. I haven't yet found a MorphGNT module that I could convert to MyBible.Zone. If that proves impossible, can you point me to what the required input format would look like? I might be able to regex it into compliance.

Thanks much. This is a great project!

schierlm commented 1 year ago

In Diffable format, a word tagged with both a Strong's number and morphology would look like this:

<grammar strong="976" rmac="N-NSF">Βίβλος</>

I guess the format is pretty self-explanatory if you convert something to it and have a look yourself.
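
As a rough illustration of the regex fix-up discussed above, here is a minimal Java sketch that rewrites a TheWord-style tagged word into the Diffable grammar form. The class name TheWordTagFixup and the method toDiffableGrammar are hypothetical and not part of BibleMultiConverter. The pattern is derived only from the single sample line quoted in this issue, it drops the <EN>…<En> gloss, and it assumes the unparsed tags survive verbatim in the text being edited, which this thread does not confirm. Expect to adjust it after looking at a real Diffable dump.

```java
import java.util.regex.Pattern;

public class TheWordTagFixup {
    // Assumed tag layout, taken from the one sample line in this issue:
    //   <GR>word<Gr><EN> gloss <En><WGnnn><WTcode l="lemma">
    private static final Pattern TAGGED_WORD = Pattern.compile(
            "<GR>([^<]+)<Gr>"              // group 1: the Greek word
            + "(?:<EN>[^<]*<En>)?"         // optional English gloss (dropped in this sketch)
            + "<WG(\\d+)>"                 // group 2: Strong's number
            + "<WT([^ >]+)(?: [^>]*)?>");  // group 3: morphology code; attributes such as l="..." are ignored

    // Wraps each matched word in a start/end marker pair, as the Diffable example shows.
    public static String toDiffableGrammar(String line) {
        return TAGGED_WORD.matcher(line)
                .replaceAll("<grammar strong=\"$2\" rmac=\"$3\">$1</>");
    }

    public static void main(String[] args) {
        String sample = "<GR>Βίβλος<Gr><EN> Book <En><WG976><WTN-NSF l=\"βίβλος\">";
        // prints: <grammar strong="976" rmac="N-NSF">Βίβλος</>
        System.out.println(toDiffableGrammar(sample));
    }
}
```

Text that does not match the pattern is left untouched, so the method can be applied line by line to a whole file. Words tagged with only a Strong's number or only a morphology code are not handled here and would need their own patterns.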

If you prefer some XML with schema, you can use RoundtripXML whose XSD you can find here: https://github.com/schierlm/BibleMultiConverter/blob/master/biblemulticonverter-schemas/src/main/resources/RoundtripXML.xsd

paul1149 commented 1 year ago

Thanks much. I will look into this!