schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

TheWord Importer cannot import Strong numbers #3

Closed ichbindasauge closed 7 years ago

ichbindasauge commented 8 years ago

Hi,

I am trying to convert a Bible with Strong numbers from TheWord to Logos. I have successfully converted Bibles without Strongs or other tags. Do I need any special export argument for the Strong numbers? (By the way, is there a list of possible arguments somewhere? I can't find one in the help. On the Logos forum you mentioned StripGrammar, but I am not sure how to use it, and it would obviously do the opposite of what I want.)

When I convert to LogosHTML, the tags still look the same as in the TheWord file (like WH5921, but in angle brackets that I cannot enter here). After saving the file as .docx in LibreOffice Writer, I ran this command:

java -jar BibleMultiConverter-LogosEdition.jar LogosNestedHyperlinkPostprocessor inputfile.docx outputfile.docx

Below is the error message I get.

Thanks,

Bernhard

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at biblemulticonverter.logos.tools.LogosNestedHyperlinkPostprocessor.run(LogosNestedHyperlinkPostprocessor.java:90)
    at biblemulticonverter.Main.main(Main.java:53)

schierlm commented 8 years ago

Several comments:

1) Strong numbers can be stored in GBF style (e.g. <WG123>) in TheWord modules; however, the importer currently cannot import them, mainly because the tags do not encode the starting point. When you have In the beginning<WH1234>, for example, it is not clear whether the 1234 refers to beginning or to In the beginning. I might add some logic for this in the future (attaching the number to the last word only), but since none of my TheWord modules have included Strong numbers so far, I did not pursue it any further. It is probably also possible to convert from TheWord to Diffable first and then use a text editor to search and replace the Strong numbers with the correct syntax, which might be good enough for one-shot conversions. If the Bible is freely available, finding an OSIS or Zefania XML source that includes Strongs might be the best option, since those formats encode both the start and the end of the reference.

2) The process looks OK to me; no special switches are needed to keep Strong numbers in the file (once they are imported :D). But I would expect the Strongs to show up as hyperlinks (white on white) or as plaintext hyperlinks (linking to GreekStrongs) in the LogosHTML.

3) About the OutOfMemoryError: Can you tell me the size of the word/document.xml inside the .docx (which is actually a zip file)? It seems that the XML parser runs out of memory here. It may help to use a 64-bit Java version (if you are currently using 32-bit; see java -version if you are unsure) or to increase the memory limit to a few gigabytes by giving -Xmx5G or similar (depending on how much RAM you have; leave some for the OS too) before the -jar switch in your command line.
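
To answer the size question in point 3 without unzipping by hand, something like the following could be used. It is only a sketch: the class name DocxSizeCheck and the method entrySize are made up for illustration and are not part of BibleMultiConverter. It uses nothing beyond java.util.zip from the standard library, and prints the uncompressed size of word/document.xml together with the maximum heap the running JVM is allowed to use.

```java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of BibleMultiConverter.
public class DocxSizeCheck {

    /**
     * Returns the uncompressed size in bytes of the named entry,
     * or -1 if the entry does not exist or its size is unknown.
     */
    public static long entrySize(String zipPath, String entryName) throws IOException {
        // A .docx file is a zip archive, so ZipFile can open it directly.
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            return entry == null ? -1 : entry.getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxSizeCheck.java file.docx");
            return;
        }
        long size = entrySize(args[0], "word/document.xml");
        System.out.println("word/document.xml: " + size + " bytes uncompressed");

        // Upper bound of the heap for this JVM; changes when -Xmx is given.
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM max heap: " + maxHeapMiB + " MiB");
    }
}
```

Run it with the same java binary as the converter, for example java DocxSizeCheck.java inputfile.docx (Java 11 or later can run a single source file directly). Since the postprocessor parses the document through DocumentBuilder (see the stack trace), the whole XML is held in memory as a DOM tree, which usually takes several times the size of the file itself.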

In case of any more problems, feel free to ask. I'll leave this issue open to remind me that the TheWord importer needs some love with respect to Strong numbers (if you can point me to some public domain modules that include Strongs, it might encourage me to try sooner :D).
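
For reference, the "last word only" heuristic from point 1 could look roughly like the sketch below. This is not the importer's code; the class StrongTagSketch, the Span type and the method attachToLastWord are hypothetical names chosen for illustration. It assumes tags of exactly the form shown in this thread (<W, then G or H, then digits, then >) and simply attaches each number to the whitespace-delimited word directly in front of the tag. How the resulting spans would be written out in Diffable or any other format is deliberately left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "attach to the last word" heuristic.
public class StrongTagSketch {

    /** A run of verse text; strong is e.g. "H1234", or null for untagged text. */
    public static final class Span {
        public final String text;
        public final String strong;

        Span(String text, String strong) {
            this.text = text;
            this.strong = strong;
        }

        @Override
        public String toString() {
            // Ad-hoc debug notation, not a real Bible format.
            return strong == null ? text : text + "{" + strong + "}";
        }
    }

    // GBF-style Strong tag as quoted in this thread, e.g. <WG123> or <WH1234>.
    private static final Pattern TAG = Pattern.compile("<W([GH]\\d+)>");

    public static List<Span> attachToLastWord(String verse) {
        List<Span> spans = new ArrayList<>();
        Matcher m = TAG.matcher(verse);
        int pos = 0;
        while (m.find()) {
            String before = verse.substring(pos, m.start());
            // Everything up to and including the last space stays untagged;
            // only the final word receives the Strong number.
            int cut = before.lastIndexOf(' ') + 1;
            if (cut > 0) {
                spans.add(new Span(before.substring(0, cut), null));
            }
            // Note: two tags in a row yield an empty-text span here;
            // a real importer would have to merge them onto one word.
            spans.add(new Span(before.substring(cut), m.group(1)));
            pos = m.end();
        }
        if (pos < verse.length()) {
            spans.add(new Span(verse.substring(pos), null));
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : attachToLastWord("In the beginning<WH1234> God<WH430> created")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The obvious weakness is the one already mentioned: when a number really belongs to a phrase such as In the beginning, this heuristic tags only beginning, so the result is an approximation at best.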