Strong's numbers and Morphological tags in custom format

viktor-zhuromskyy commented 4 years ago

I am converting MyBible modules into TheWord format, and I need the Multi Converter to accept, and not throw out, my custom Strong's numbers and morphological tags.

I have these types of Strong's numbers: 3306, H3306, G3306, L3306. The H3306, G3306 and L3306 forms are currently not accepted by the converter.

I also have a custom type of morphological tags.

When I run the converter, I get the following warnings:

WARNING: Invalid Strong number: L245
WARNING: Skipping malformed RMAC morphology code: N-N.MS
WARNING: Skipping malformed RMAC morphology code: R-PG.2S
WARNING: Skipping malformed RMAC morphology code: V-IFA.1P
WARNING: Skipping malformed RMAC morphology code: R-PA.MS
WARNING: Skipping malformed RMAC morphology code: R-PG.2S

Can you please make your code more flexible in how it treats "malformed" attribute types?

schierlm commented 4 years ago

Thank you for your report.

First, both H3306 and G3306 should be supported when importing from MyBible.Zone, but 3306 will be treated the same way as H3306 in the Old Testament and G3306 in the New Testament. I have never seen the L ones; how should they be converted (to TheWord or other formats)? TheWord only supports <WGxxxx> and <WHxxxx> for Strong's numbers and <WTxxxx> for morphology, but no <WLxxxx>.
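A minimal sketch of that defaulting rule, in Python purely for illustration. This is not the converter's code; `resolve_strong` and the `"OT"`/`"NT"` flag are hypothetical names, and the pattern only covers the unprefixed, G and H forms discussed here:

```python
import re

# Hypothetical helper: the name, the regex and the "OT"/"NT" flag are
# assumptions made for illustration, not taken from the converter.
_STRONG = re.compile(r"([GH]?)(\d+)")


def resolve_strong(raw, testament):
    """Return a Strong's number with an explicit G or H prefix.

    Unprefixed numbers default to H in the Old Testament and to G in the
    New Testament; an explicit G or H prefix is kept as written.
    """
    match = _STRONG.fullmatch(raw)
    if match is None:
        # Anything else (for example L245) is rejected in this sketch.
        raise ValueError(f"Invalid Strong number: {raw}")
    prefix, digits = match.groups()
    if not prefix:
        prefix = "H" if testament == "OT" else "G"
    return prefix + digits
```

Under these assumptions, `resolve_strong("3306", "OT")` gives `H3306`, `resolve_strong("3306", "NT")` gives `G3306`, and an L-prefixed number raises an error.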

For morphology, there already is a "morphology.raw" option, but unfortunately it is not supported by all module formats yet. It is supported by the MyBible.Zone import, but not yet by the TheWord export (although there is no particular reason for that, as in both formats you can use arbitrary strings for morphology tags). Other formats like Logos will obviously not support it, as they use their own way of encoding morphology, which only works for morphology codes that follow the RMAC format.

So I'll take this as a feature request to 1) add support for raw morphology tags for the TheWord format (at least export) and 2) add some kind of "raw Strong" support (covering at least unlabeled, G, H and L) for at least the MyBibleZone and TheWord formats, although it is still unclear how to treat the "L" ones.

schierlm commented 4 years ago

I implemented a quick and dirty fix:

Strong numbers in MyBibleZone that start with L or S (which I also found somewhere) are treated like the equivalent G/H numbers, as if they had no letter prefix.
When importing using the -Dmybiblezone.morphology.raw=true option, the imported morphology can now also be exported in TheWord format.

Can you please check if this solves your use case? If not, please clarify how you want these Strongs/Morphology tags treated.

In case you cannot compile from the repo and need a precompiled version, please drop me a short notice and I can send you one.

viktor-zhuromskyy commented 4 years ago

Thank you so much. Will check it later.

Can you please compile a release?

schierlm commented 4 years ago

Find attached a build of 4e2456e72a76b1e5a387eb5ba8bfc1336b3c1261: BibleMultiConverter-SQLiteEdition-4e2456e7.zip

viktor-zhuromskyy commented 4 years ago

Appreciate it so much!

viktor-zhuromskyy commented 4 years ago

I checked the build, but I cannot figure out how to add the morphology option to my command line. Also, I am not happy at all with the L and S prefixes of Strong's numbers being replaced. I want these to be preserved, since the L-prefixed so-called Strong's numbers are in reality references to an LXX dictionary. Can you please make L... numbers be output as L..., and also NOT replace S..., L... and G... numbers in Old Testament books with H...? When I convert a Septuagint module, everything is screwed up: the Greek Strong's numbers are substituted with Hebrew ones.

schierlm commented 4 years ago

1) You have to add the option before the -jar in your command line, e.g. java -Dmybiblezone.morphology.raw=true -jar BibleMultiConverter.jar MyBibleZone 1.sqlite TheWord 1.ont

2) Yeah, the replacing of Greek with Hebrew for LXX is definitely a bug, and it probably affects more formats. I will have to test it thoroughly.

3) Can you tell me how to encode the L numbers so that TheWord does not complain? Perhaps (if you have a matching Strong's dictionary) you can try changing them in a text editor and check in TheWord how they need to look in order to work?

According to the TheWord documentation, Strong's numbers have to look like <WGxxxx> or <WHxxxx>, and morphology tags like <WTxxxx>. So do I have to make them <WGL1234> to conform to the specification, or <WL1234> against the specification (because the specification is wrong)?

If you don't want to try: I don't use TheWord myself, so it will take me a while to set up a Windows VM and test it there.

viktor-zhuromskyy commented 4 years ago

Just leave the L as it is, since those records need a special dictionary. Sure, that's gonna be a perfect fit.

Currently, I am doing text replacement in the SQLite3 database before exporting to TheWord format, and after that doing text replacement again to recreate the <WT and <WG tags.

schierlm commented 4 years ago

Ok, so I will now leave the L (or S) prefix in, in addition to <WG or <WH. H and G prefixes will (as they should have done before) override the <WG or <WH. If you want them differently, you can manually edit them.
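One way to read that rule, as a hedged Python sketch rather than the converter's actual implementation. The function name, the `"OT"`/`"NT"` flag and the regex are assumptions made for illustration; in particular, picking the tag letter for L, S and unprefixed numbers from the testament is one interpretation of "in addition to <WG or <WH":

```python
import re

# Sketch only: the function name, the "OT"/"NT" flag and the regex are
# assumptions, not taken from the converter's source.
_STRONG = re.compile(r"([GHLS]?)(\d+)")


def theword_strong_tag(raw, testament):
    """Build a TheWord-style Strong's tag such as <WG3306> or <WHL245>."""
    match = _STRONG.fullmatch(raw)
    if match is None:
        raise ValueError(f"Invalid Strong number: {raw}")
    prefix, digits = match.groups()
    if prefix in ("G", "H"):
        # An explicit G or H decides the tag letter, whatever the book.
        return f"<W{prefix}{digits}>"
    # Unprefixed, L and S numbers take the tag letter from the testament;
    # an L or S prefix is left in place after that letter.
    letter = "H" if testament == "OT" else "G"
    return f"<W{letter}{prefix}{digits}>"
```

With this reading, `theword_strong_tag("L245", "OT")` gives `<WHL245>`, while `theword_strong_tag("G3306", "OT")` stays `<WG3306>` instead of being turned into a Hebrew reference.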

Find attached a build of b557089945df6ac296b1dfb625a729fe634782a2: BibleMultiConverter-SQLiteEdition-b557089.zip

schierlm commented 4 years ago

Closing this for now. Feel free to reopen if anything else is open.

schierlm / BibleMultiConverter

Strong's numbers and Morphological tags in custom format #30