schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

OSIS export does not provide options to choose which elements to milestone (but always milestones verse elements) #8

Closed j2l closed 7 years ago

j2l commented 7 years ago

Hi,

First of all, thank you very much for this tool; a truly open-source Bible tool is a breath of fresh air!

My first attempt at converting Zefania XML to OSIS produced an XML file, but the format looks incorrect. Here are the details:

Zefania XML format for verse is: <VERS vnumber="1">Au commencement, Dieu créa les cieux et la terre.</VERS>

The result from BibleMultiConverter is: <verse osisID="Gen.1.1" sID="Gen.1.1"/>Au commencement, Dieu créa les cieux et la terre.<verse eID="Gen.1.1"/>

You can see that the tags are self-closing before and after the verse text, the double quote (") is kept, and the attributes are duplicated before and after. The result should be something like: <verse osisID='Gen.1.1'>Au commencement, Dieu créa les cieux et la terre.</verse>

Additionally, the header's "work/title" tag is written to the header, but the header work tags that follow it are written to "a new first div", in place of Genesis.

Is it possible to fix this, or to point me to the XSL template so I can fix it? Thank you very much, God bless you!

schierlm commented 7 years ago

Hello Phil,

thank you for your feedback.

Actually, the fact that the verses are self-closing and have an "sID", with a duplicate tag at the end carrying an "eID", is a feature of OSIS called OSIS milestones: https://www.crosswire.org/wiki/OSIS_Bibles#OSIS_Milestones

According to the specification, there are several kinds of elements (chapter, verse, paragraph, line group, quote) that can be milestoned (but if a tag is milestoned, all occurrences of the same tag have to be milestoned).

The OSIS importer can (I believe) handle all cases of milestones; the exporter, however, is currently limited to a single format (verses are milestoned, everything else is not).

The background for creating milestoned elements is that logical content (like verses) can span physical content (e. g. a verse can start in the middle of a paragraph and end in the middle of a line group, or a quote can start in the middle of one verse and end in the middle of another).
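To illustrate the overlap problem, here is a minimal Python sketch. The sample text and element nesting are made up for illustration and are not actual BibleMultiConverter output. It checks that a quote spanning two verses parses fine when the verses are milestoned, but is not well-formed XML when both verses and the quote are containers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a quote starts in verse 1 and ends in verse 2.
# Verses are milestoned (empty start/end markers), the quote is a container.
milestoned = (
    '<p>'
    '<verse sID="Gen.1.1" osisID="Gen.1.1"/>He said, <q>first half '
    '<verse eID="Gen.1.1"/>'
    '<verse sID="Gen.1.2" osisID="Gen.1.2"/>second half</q> and left.'
    '<verse eID="Gen.1.2"/>'
    '</p>'
)

# The same content with verses as containers: <verse> and <q> overlap.
overlapping = (
    '<p>'
    '<verse osisID="Gen.1.1">He said, <q>first half </verse>'
    '<verse osisID="Gen.1.2">second half</q> and left.</verse>'
    '</p>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(milestoned))   # True
print(is_well_formed(overlapping))  # False
```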

When converting from certain source formats (like Zefania XML or Haggai XML or TheWord), none of these cases can happen, though, as those formats do not support quotes, and paragraph breaks can only occur at the end of a verse.

Therefore, I'll add an export option to the exporter to choose which tags you want to have milestoned (so you can choose whether to milestone quotes, verses, or neither in case the Bible does not have overlaps here).

In the general case, removing milestones using XSL is impossible, as it may result in "non-valid XML" (overlapping elements are not well-formed). Therefore it is probably easier to add that option.

May I ask which program you are trying to import this OSIS file into, one that claims to support OSIS but does not support milestoned verses? I know of a few programs that cannot import unmilestoned verses, but none for the other way round.
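For the simple case only, where each verse's start and end milestones are siblings under the same parent, the conversion can be sketched in a few lines of Python. This is an illustration under that assumption, not how the exporter option is implemented; the function name `unmilestone_verses` is hypothetical. When the end milestone sits under a different parent (the overlapping case), the sketch raises an error instead of guessing.

```python
import xml.etree.ElementTree as ET


def is_verse(elem):
    """True for <verse> elements, with or without an XML namespace."""
    return isinstance(elem.tag, str) and elem.tag.rsplit("}", 1)[-1] == "verse"


def unmilestone_verses(parent):
    """Wrap sibling-level sID/eID verse pairs of `parent` in container verses.

    Raises ValueError if a start milestone has no matching end milestone
    among its siblings, which is what happens when verses overlap other
    elements.
    """
    children = list(parent)  # snapshot; we modify `parent` while iterating
    i = 0
    while i < len(children):
        start = children[i]
        if is_verse(start) and "sID" in start.attrib:
            sid = start.attrib["sID"]
            j = i + 1
            while j < len(children) and not (
                is_verse(children[j]) and children[j].get("eID") == sid
            ):
                j += 1
            if j == len(children):
                raise ValueError(f"no sibling end milestone for {sid}")
            end = children[j]
            # Build the container: keep all attributes except sID.
            attrs = {k: v for k, v in start.attrib.items() if k != "sID"}
            container = ET.Element(start.tag, attrs)
            container.text = start.tail
            for child in children[i + 1:j]:
                parent.remove(child)
                container.append(child)
            container.tail = end.tail
            position = list(parent).index(start)
            parent.remove(start)
            parent.remove(end)
            parent.insert(position, container)
            i = j + 1
        else:
            i += 1


chapter = ET.fromstring(
    '<chapter osisID="Gen.1">'
    '<verse osisID="Gen.1.1" sID="Gen.1.1"/>'
    'Au commencement, Dieu créa les cieux et la terre.'
    '<verse eID="Gen.1.1"/>'
    '</chapter>'
)
unmilestone_verses(chapter)
result = ET.tostring(chapter, encoding="unicode")
print(result)
```

For the sample above, the printed chapter contains a single container verse with `osisID="Gen.1.1"` and the verse text as its content.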

For the suboptimal handling of metadata in some conversion directions (e. g. Zefania XML to OSIS) I've opened a new issue, #9, but I'm not planning to fix it immediately.

schierlm commented 7 years ago

Can you try the latest git version (compiled version attached)? When you pass "-" as the second parameter to the OSIS export, it should now create verse tags in the style you prefer.

BibleMultiConverter-0.0.5.3.zip

j2l commented 7 years ago

Wow! Fantastically fast, thank you Michael! Actually, I'm new to OSIS and wanted to convert a few Bibles that I couldn't find in OSIS format (for instance, Hindi), because I didn't find any Bible in XML on the Crosswire website. Where can Bibles in OSIS format be found? I want JSON in the end (and BrowserBible doesn't output what I want), so the end is not near for me :) I'll try your update and let you know here. Thanks again.

j2l commented 7 years ago

It's working. For others stumbling on this topic, the command is:

java -jar BibleMultiConverter.jar ZefaniaXML "SF_2015-08-16_HIN_HINERV_(EASY-TO-READ VERSION (HINDI ERV)).xml" OSIS test.xml -

which produces: <verse osisID="Gen.1.1">आदि में परमेश्वर ने आकाश और पृथ्वी को बनाया।</verse>

Note that you have to manually check and change ALL abbreviated book names to generate valid osisIDs, since this Zefania Bible provides neither correct abbreviations nor correct book names.
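Since the end goal mentioned earlier is JSON, here is a rough Python sketch of one way to pull container-style verses out of such a file. It assumes verses are not milestoned (the output of the "-" option) and uses a simplified, made-up document skeleton around the verse; real OSIS files carry more structure and an XML namespace, which the sketch ignores by comparing local tag names only. The function name `verses_to_dict` is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def verses_to_dict(osis_xml):
    """Map each container verse's osisID to its text content."""
    root = ET.fromstring(osis_xml)
    verses = {}
    for elem in root.iter():
        if (
            local_name(elem.tag) == "verse"
            and "osisID" in elem.attrib
            and "sID" not in elem.attrib  # skip milestone markers
        ):
            # itertext() also collects text of nested elements (e.g. notes).
            verses[elem.attrib["osisID"]] = "".join(elem.itertext()).strip()
    return verses


# Simplified sample for illustration only, not full exporter output.
sample = (
    '<osisText><div type="book" osisID="Gen"><chapter osisID="Gen.1">'
    '<verse osisID="Gen.1.1">आदि में परमेश्वर ने आकाश और पृथ्वी को बनाया।</verse>'
    '</chapter></div></osisText>'
)

print(json.dumps(verses_to_dict(sample), ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps the Hindi text readable in the JSON instead of escaping it.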

schierlm commented 7 years ago

I am not aware of any large collection of free Bibles available in OSIS format. OSIS has become the de facto standard for publishing Bibles commercially, due to the flexibility of the format. I tend to find free Bibles in Zefania XML instead, which is a simpler format in case you want to convert to JSON anyway.

Crosswire only publishes Bibles in their own binary SWORD format. Some of them were converted from OSIS, others from ThML, others from Zefania XML. But it does not matter in the end, since BibleMultiConverter also has an import filter for SWORD Bibles.

If you are trying to build a website with Bible texts, perhaps have a look at http://biblewebapp.com/study/ (which is available on GitHub, and BibleMultiConverter also has an export filter for their internal format). But having other alternatives would be great too :)