Open DavidHaslam opened 6 years ago
@dowens76 @DavidTroidl
Does nobody involved in this project take any notice of issues?
This was posted in December 2017 so what's going on?
Hi @DavidHaslam, I suspect many people agree with you on that, myself included. Making such a change in the text as it is now would certainly cause all sorts of backwards-compatibility issues.
I'd be in favor of offering an alternate version of the files in the repo that has the fields separated according to OSIS philosophy. If you want to put in a PR with the changes you suggest, I think we'd be willing to incorporate it.
@jag3773
Since I added this issue in 2017, the website tanach.us has had a change of title.
There are other significant changes, but one relevant to this issue is that the solidus (/) markers that used to separate morphological segments have all been removed!
The general philosophy of OSIS is to use XML elements for all the semantic markup.
Using the solidus within the text to separate morpheme segments within Hebrew words goes against this OSIS philosophy. One friend has described this as "bad, bad, very bad".
Compare the XML files for the CrossWire WLC module, which conform more closely to this principle in that they use the XML seg element for this purpose. The original data was obtained from the website tanach.us, but further preprocessing was done before building the latest version of the module, which differs from its earliest version in this respect.
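To make the contrast concrete, here is a minimal sketch of how solidus-delimited text inside a word element could be turned into child `seg` elements. It is not the preprocessing actually used for the WLC module. The function name `split_morphemes` and the `type="x-morph"` attribute value are assumptions chosen for illustration, and the sketch ignores XML namespaces, which real OSIS files declare.

```python
import xml.etree.ElementTree as ET


def split_morphemes(w, seg_type="x-morph"):
    """Replace solidus-delimited text in a <w> element with child <seg> elements.

    seg_type is a hypothetical attribute value, not taken from any module.
    """
    text = w.text or ""
    if "/" not in text:
        return w  # nothing to split; leave the element untouched
    w.text = None
    for i, part in enumerate(text.split("/")):
        seg = ET.Element("seg", {"type": seg_type})
        seg.text = part
        w.insert(i, seg)  # keep segments in order, ahead of any existing children
    return w


# Illustrative input only: one word with an in-text solidus.
w = ET.fromstring("<w>בְּ/רֵאשִׁית</w>")
split_morphemes(w)
print(ET.tostring(w, encoding="unicode"))
```

With this shape, the morpheme boundary is carried by markup rather than by a character in the text, so a consumer that only wants the surface text can concatenate the `seg` contents without stripping anything.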
For example, entries taken from the mod2imp output of the CrossWire WLC module generally look like this:
NB. In this extract, the output was also converted to Word Per Line format afterwards.
Aside: That is not to say that the WLC module is perfect. Irrespective of any text critical issues, at least these mistakes were made when it was first built.
These are not your responsibility. I mention them merely in passing.
Those defects were rectified in the WLC module after I created this issue in 2017.