openscriptures / morphhb

Open Scriptures Hebrew Bible
https://hb.openscriptures.org

Out of date SWORD module OSHB dated 2013-10-11 #47

Closed: DavidHaslam closed this issue 6 years ago.

DavidHaslam commented 6 years ago

As I see that there have been a significant number of commits to the repo since 2013-10-11, it would certainly be useful to SWORD users for the OSHB module to be updated.

Or maybe OSHB should be replaced by one with a slightly different module name, being careful to use the Obsoletes key in the .conf file if that route is taken.

If you can find time to do this, it would be much appreciated.

DavidHaslam commented 6 years ago

SWORD module OSMHB version 2.0.1 is dated 2015-02-27 and is in the CrossWire Attic repository! This seems to me to be the wrong way round! I think OSHB should have been moved to the Attic and OSMHB should be in CrossWire Main. It ought to be updated to include: Obsoletes=OSHB

DavidTroidl commented 6 years ago

Actually the OSHB obsoletes the OSMHB. I made an attempt to update the OSHB, but ran into a snag. I submitted a question to the SWORD devel mailing list, but received no response.

DavidHaslam commented 6 years ago

The dates seem the wrong way round for that.

DavidHaslam commented 6 years ago

Can you recall what the snag was?

Was it technical or simply a lack of response from the "modules team"?

I think we should try and remedy the situation.

It may have fallen through the cracks after Chris Little left CrossWire.

DavidHaslam commented 6 years ago

OK - I found the relevant thread in the Nabble mirror of sword-devel (which I find easier to search than the pipermail archives).

I responded, but nobody else did, so essentially your questions remained unaddressed back in January.

DavidHaslam commented 6 years ago

Here's the text of your message dated 29 January 2017:

I am in the process of updating the OSHB module. I have a valid OSIS file of the text, but I have a few questions about markup for effective display in SWORD.

  1. I have lemmas of the form lemma="c/d/776" and lemma="5921 a", where the c/d encodes the prefixes on the word, the numbers are Strong numbers and the 'space a' represents an 'augment' for the Open Scriptures Hebrew Lexicon, to tie in to Brown, Driver, Briggs. My thought was to do something like: lemma="pre:c/d strong:776" and lemma="strong:5921 aug:a". Would this give a better presentation than we have currently? Or is there a better way?

  2. Some of the words have morphology attributes, like morph="HC/Td/Ncfsa". These are coded to the Hebrew Morphology Codes. If they need a prefix, I was thinking of something like morph="hmc:HC/Td/Ncfsa".

  3. The words themselves have 'morphological divisions', like וְ/הָ/אָ֗רֶץ. These are meant to separate the prefixes from the main word. We have used seg elements in the past, as in וְהָאָ֗רֶץ. Would this approach require seg elements around every word, or only around those words that have actual division slashes?
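The lemma and morph rewrites proposed in points 1 and 2 are mechanical enough to automate. Below is a minimal Python sketch, assuming every lemma has the shape `prefixes/number` with an optional space-separated augment, and using the prefix names `pre:`, `strong:`, `aug:` and `hmc:` exactly as proposed above (they are not established SWORD conventions). It edits attributes with regular expressions rather than a full XML parser, which is adequate only when attribute values contain no escaped quotes.

```python
import re


def convert_lemma(lemma: str) -> str:
    """Rewrite an OSHB lemma into the prefixed form proposed above.

    'c/d/776' -> 'pre:c/d strong:776'
    '5921 a'  -> 'strong:5921 aug:a'
    """
    # An optional augment follows the number after a single space.
    number_part, _, augment = lemma.partition(" ")
    # Prefix letters precede the Strong number, separated by '/'.
    *prefixes, strong = number_part.split("/")
    parts = []
    if prefixes:
        parts.append("pre:" + "/".join(prefixes))
    parts.append("strong:" + strong)
    if augment:
        parts.append("aug:" + augment)
    return " ".join(parts)


def convert_morph(morph: str, scheme: str = "hmc") -> str:
    """Prefix a Hebrew Morphology Code with a scheme name, e.g. 'hmc:HC/Td/Ncfsa'."""
    return f"{scheme}:{morph}"


def convert_w_attributes(xml: str) -> str:
    """Apply both rewrites to every lemma and morph attribute in a string of <w> elements."""
    xml = re.sub(r'\blemma="([^"]*)"',
                 lambda m: f'lemma="{convert_lemma(m.group(1))}"', xml)
    xml = re.sub(r'\bmorph="([^"]*)"',
                 lambda m: f'morph="{convert_morph(m.group(1))}"', xml)
    return xml
```

For example, `convert_w_attributes('<w lemma="c/d/776" morph="HC/Td/Ncfsa">...</w>')` yields `lemma="pre:c/d strong:776"` and `morph="hmc:HC/Td/Ncfsa"`. Whether SWORD front-ends would then display these any better is the open question above.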

This came up because I tried making the module directly from the OSIS, without prefixes. I found both the lemmas and morphs all jumbled together and superimposed over each other, under each word, when I tested the module in Xiphos. I would like to automate the process of preparing the text for SWORD, as much as possible, so future updates won't be so labor intensive.

Your help will be appreciated.

DavidHaslam commented 6 years ago

I think this may be a major programming task on at least two fronts:

  1. Adapting the SWORD API (& JSword too) to handle new markup features particular to OSHB.
  2. Converting your present OSIS markup to something more amenable to how SWORD works.

I can possibly provide some assistance with the second process. The first is outside my remit and experience.

DavidHaslam commented 6 years ago

NB. Strong's numbers for Hebrew words should retain the H prefix. cf. The G prefix is used for Greek words in the NT and/or LXX.
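Applied to the lemma forms quoted in the 29 January message, keeping only the Strong's number with its H prefix could look like this minimal Python sketch. It assumes the number is always the last `/`-separated part before any space-separated augment; the `strong:H776` output form matches the hboWLCeb and KJV markup quoted elsewhere in this thread.

```python
def simplified_lemma(lemma: str) -> str:
    """Reduce an OSHB lemma to a single H-prefixed Strong's number.

    Drops the prefix letters (everything before the last '/') and any
    augment (everything after a space): 'c/d/776' -> 'strong:H776'.
    """
    number_part = lemma.split(" ", 1)[0]
    strong = number_part.rsplit("/", 1)[-1]
    # Keep an existing H prefix rather than doubling it.
    if not strong.startswith("H"):
        strong = "H" + strong
    return "strong:" + strong
```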

DavidHaslam commented 6 years ago

Rather than writing a Python or Perl script to convert the existing OSIS XML file to a format more suitable as input for osis2mod, my inclination would be to develop a bespoke TextPipe filter for this conversion task.

I've been using TextPipe Standard for multiple tasks since 2001. I wouldn't be without it.

DavidHaslam commented 6 years ago

One question at the forefront of my mind is exactly how front-ends should handle morph segmentation.

The existing .conf file for module OSHB includes the line:

GlobalOptionFilter=OSISMorphSegmentation 

For OSIS texts having morphological segmentation elements.

NB. Module OSMHB does not have this line, yet the source text does have seg elements.

I think one thing is clear at least. Correct me if I'm wrong. The ad hoc use of the solidus / as the segment delimiter is not yet suitable for a SWORD module. AFAIK, there is currently no SWORD filter that hides or shows the / according to the state of a suitable front-end UI toggle switch under module options.
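For comparison, turning the slash-delimited text into the seg-element markup mentioned in point 3 of the 29 January message is straightforward. A minimal Python sketch follows; the `type="x-morph"` value is a hypothetical choice for illustration, not necessarily what OSHB uses, and words without a slash are left unwrapped (one of the two options asked about in that message).

```python
from xml.sax.saxutils import escape

SEG_TYPE = "x-morph"  # hypothetical seg type, for illustration only


def segment_word(word: str, seg_type: str = SEG_TYPE) -> str:
    """Replace solidus-delimited morpheme divisions with <seg> elements.

    A word without a '/' is returned unchanged (apart from XML escaping),
    so only words that actually have divisions receive seg markup.
    """
    parts = word.split("/")
    if len(parts) == 1:
        return escape(word)
    return "".join(f'<seg type="{seg_type}">{escape(p)}</seg>' for p in parts)
```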

DavidHaslam commented 6 years ago

Aside: The SWORD utility diatheke does not yet have a command-line option switch to handle OSISMorphSegmentation.

cf. It has a switch -m for modules having either ThMLMorph or OSISMorph.

AFAIK, this has never been pointed out to sword-devel before.

DavidHaslam commented 6 years ago

If SWORD were to be enhanced to support toggling the display of the solidus in suitably marked modules, would this obviate the need to use seg elements for this function?

NB. You'd still be OK having them with a type attribute, as per this counted list of such:

00004   <seg type="x-large">
42578   <seg type="x-maqqef">
02278   <seg type="x-paseq">
01181   <seg type="x-pe">
00009   <seg type="x-reversednun">
01981   <seg type="x-samekh">
00003   <seg type="x-small">
23191   <seg type="x-sof-pasuq">
00004   <seg type="x-suspended">

Even so, if the existing method of showing how word segments are encoded works OK, I don't see a pressing need to implement a method that works on something in the text rather than in the XML markup.

DavidTroidl commented 6 years ago

Thanks for all your research. I have just made a working module. I simplified the lemmas, to use just the Strong numbers, since that is all SWORD will look at for now anyway. I have also resolved the seg element issue.

The question about the morphology attributes still remains, but the main requirement, at this point, is how to get the Hebrew Morphology Codes into a format SWORD can use. Then the codes could be interpreted, similar to Robinson codes in other modules.
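To give a sense of what interpreting the codes involves, here is a partial Python decoder. It assumes the code layout seen in the examples in this thread (a language letter, then '/'-separated segments each starting with a part-of-speech letter). Its tables are a hand-picked subset of the Hebrew Morphology Codes, transcribed for illustration only, and should be checked against the official code list before any reuse; segments outside the tables are shown in brackets rather than guessed.

```python
# Partial tables for the Hebrew Morphology Codes. ASSUMPTION: a hand-picked
# subset, transcribed for illustration only; verify against the official list.
PARTICLE = {"a": "affirmation", "d": "definite article", "i": "interrogative",
            "n": "negative", "o": "direct object marker", "r": "relative"}
NOUN_TYPE = {"c": "common", "g": "gentilic"}
GENDER = {"m": "masculine", "f": "feminine", "b": "both", "c": "common"}
NUMBER = {"s": "singular", "p": "plural", "d": "dual"}
STATE = {"a": "absolute", "c": "construct"}
STEM = {"q": "qal", "N": "niphal", "p": "piel", "P": "pual",
        "h": "hiphil", "H": "hophal", "t": "hithpael"}
CONJUGATION = {"p": "perfect", "q": "sequential perfect", "i": "imperfect",
               "w": "sequential imperfect", "h": "cohortative", "j": "jussive",
               "v": "imperative", "r": "participle active",
               "s": "participle passive", "a": "infinitive absolute",
               "c": "infinitive construct"}
PERSON = {"1": "first person", "2": "second person", "3": "third person"}


def _fields(tables, letters):
    """Look up one letter per table; reject codes longer than the tables allow."""
    if len(letters) > len(tables):
        raise KeyError(letters)
    return [table[ch] for table, ch in zip(tables, letters)]


def decode_segment(seg: str) -> str:
    """Decode one '/'-separated segment, e.g. 'Ncfsa' or 'Vqp3ms'."""
    pos, rest = seg[0], seg[1:]
    if pos == "C":
        return "conjunction"
    if pos == "R":
        return "preposition"
    if pos == "T":
        return PARTICLE[rest]
    if pos == "N":
        return ", ".join(["noun"] + _fields([NOUN_TYPE, GENDER, NUMBER, STATE], rest))
    if pos == "V":
        stem, conj, tail = STEM[rest[0]], CONJUGATION[rest[1]], rest[2:]
        if rest[1] in ("r", "s"):      # participles: gender, number, state
            tables = [GENDER, NUMBER, STATE]
        elif rest[1] in ("a", "c"):    # infinitives: assumed to have no more fields
            tables = []
        else:                          # finite forms: person, gender, number
            tables = [PERSON, GENDER, NUMBER]
        return ", ".join(["verb", stem, conj] + _fields(tables, tail))
    raise KeyError(pos)


def decode(code: str) -> str:
    """Expand a code such as 'HVqp3ms' into a readable description."""
    if not code.startswith("H"):
        raise ValueError("these tables only cover Hebrew (H...) codes")
    parts = []
    for seg in code[1:].split("/"):
        try:
            parts.append(decode_segment(seg))
        except (KeyError, IndexError):
            parts.append(f"[{seg}?]")  # outside the partial tables
    return "Hebrew: " + " / ".join(parts)
```

For instance, `decode("HC/Td/Ncfsa")` gives "Hebrew: conjunction / definite article / noun, common, feminine, singular, absolute". Strings like these could serve as the entry text of a Robinson-style morphology dictionary module.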

DavidHaslam commented 6 years ago

For modules that have

GlobalOptionFilter=OSISMorphSegmentation

although Xiphos 4.0.6a has a corresponding module option switch, it seems to have no visible effect on the displayed text.

What would you expect/require a front-end app to do with this feature?

Do we even understand how SWORD is programmed for it? Is it inadequately documented in the developers' wiki?

DavidHaslam commented 6 years ago

But to return to the main question, this is covered in the wiki.

See Marking morphology.

There's no section in this page about marking morpheme segmentation. BTW, that terminology is the name of the Xiphos module option.

We ought to insert a section and pick the brains of the developers.

DavidHaslam commented 6 years ago

And while we're at it, we should propose a requirements specification for how SWORD should handle your n attributes for the disjunctive accents, or something equivalent should the developers wish to deprecate that use for the n attribute.

See also issue #46

DavidTroidl commented 6 years ago

The Marking morphology topic refers to marking up morph attributes, based on an existing lexicon module. It says:

Currently, SWORD offers lexicon modules named Robinson and Packard, both for Greek morphology.

I would need to know the format for such a module, to construct one for the Hebrew Morphology Codes.

DavidHaslam commented 6 years ago

Just added an incomplete subsection for marking morpheme segmentation to the OSIS Bibles page.

Just added a section to the OSIS Bibles talk page.

Just added a section to the diatheke talk page.

DavidHaslam commented 6 years ago

The source text provenance isn't evident, but there's a Hebrew module hboWLCeb in the eBible.org repo that already includes some morphology markup.

The attached text file is a counted list of morph attribute values extracted from the mod2imp output.

hboWLCeb.raw.imp.hbo.morph.count.txt

Here's an example:

$$$Genesis 1:1
<w lemma="strong:H7225" morph="HR/Ncfsa">בְּרֵאשִׁ֖ית</w> <w lemma="strong:H1254" morph="HVqp3ms">בָּרָ֣א</w> <w lemma="strong:H430" morph="HNcmpa">אֱלֹהִ֑ים</w> <w lemma="strong:H853" morph="HTo">אֵ֥ת</w> <w lemma="strong:H8064" morph="HTd/Ncmpa">הַשָּׁמַ֖יִם</w> <w lemma="strong:H853" morph="HC/To">וְאֵ֥ת</w> <w lemma="strong:H776" morph="HTd/Ncfsa">הָאָֽרֶץ</w>׃

We'd need to obtain more detailed information from our friend Michael Johnson to find out how he came by the source text.
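A counted list like the attached one can be produced from mod2imp output with a few lines of Python. This is a sketch, assuming the imp text has already been saved to a file and that attribute values never contain escaped quotes; the zero-padded layout simply mirrors the counted lists posted in this thread.

```python
import re
from collections import Counter

MORPH_ATTR = re.compile(r'\bmorph="([^"]*)"')


def count_morphs(imp_text: str) -> Counter:
    """Count every morph attribute value in mod2imp output."""
    return Counter(MORPH_ATTR.findall(imp_text))


def format_counts(counts: Counter) -> str:
    """Zero-padded counts, most frequent first (ties keep first-seen order)."""
    return "\n".join(f"{n:05d}   {value}" for value, n in counts.most_common())
```

Typical use would be `print(format_counts(count_morphs(text)))`, with `text` read from the saved imp file using UTF-8 encoding.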

DavidHaslam commented 6 years ago

IMHO, it's unsuitable for the lemma for Strongs to include a space because (in effect) the markup is split into several unconnected attributes.

The prefix letters and augment codes are not independent properties, but rather they are extensions to the one property called Strongs.

Practically, the resulting "stretchy spaces" also give rise to serious misalignments when Strongs are displayed in (e.g.) Xiphos.

It might be preferable to use a punctuation mark (such as a period), and to have only one attribute value rather than several.

How this might work in practice while still giving a meaningful display is unclear.

Further discussion with the developers should be sought.

DavidTroidl commented 6 years ago
  1. The morph segmentation in the new version of the module is done by seg elements, which doesn't seem to cause a problem in Xiphos or BPBible. The display is not important at this point.

  2. The "n" attribute need not be used by SWORD. Limiting its use to enumerating word order is outside the OSIS specification.

  3. The morphology markup shows up in both front ends. The question is interpreting the codes into meaningful explanations. If I knew the format of the Robinson lexicon module, that would work.

  4. The augment is designed to be a separate attribute. It can be used to augment the Strong number in applications that can access our lexicon. Otherwise, the Strong number can be used on its own. For the purposes of the module, I have restricted the lemma to use only the Strong number. Until SWORD has a way to use our lexicon, the other parts of the lemma are unnecessary in the module.

  5. The "stretchy spaces" appear to be more due to the morph segmentation producing erratic line breaks.
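Point 4 implies a simple lookup rule for applications that can access the Open Scriptures Hebrew Lexicon: try the Strong number together with its augment, then fall back to the bare number. A minimal Python sketch follows; the key format (number immediately followed by the augment letter) and the dictionary standing in for the lexicon are hypothetical, not the lexicon's real structure.

```python
from typing import Optional


def lexicon_entry(lexicon: dict, strong: str,
                  augment: Optional[str] = None) -> Optional[str]:
    """Prefer the augmented entry; otherwise use the plain Strong number.

    ASSUMPTION: keys look like '5921' or '5921a'; the real lexicon may differ.
    """
    if augment:
        entry = lexicon.get(strong + augment)
        if entry is not None:
            return entry
    return lexicon.get(strong)
```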

dowens76 commented 6 years ago

The morphology markup shows up in both front ends. The question is interpreting the codes into meaningful explanations. If I knew the format of the Robinson lexicon module, that would work.

DavidTroidl, I made a module like Robinson for the Westminster text at one point. I can dig out the files. Basically you need a line for each logical possibility in the parsing scheme.

dowens76 commented 6 years ago

Essentially I think you create a dictionary module in the imp format. That is the simplest way. Here is an example entry:

$$$@vQPfsc
Hebrew: Verb, Qal passive, Participle, feminine, singular, construct

The $$$ introduces the dictionary key. The second line is the content of the entry. But most SWORD frontends display the dictionary entry key as well, so there is no need to repeat it in the entry. I hope that makes sense.
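Generating such a file from a table of codes is straightforward. A minimal Python sketch, assuming the descriptions have already been written (by hand or by a decoder); the resulting text could then be passed to SWORD's imp2ld utility to build the dictionary module.

```python
def to_imp(entries: dict) -> str:
    """Render {key: description} as imp-format dictionary text.

    Each entry is a '$$$' key line followed by its content line; the key
    is not repeated in the content, since front-ends display it anyway.
    """
    lines = []
    for key, description in entries.items():
        lines.append(f"$$${key}")
        lines.append(description)
    return "\n".join(lines) + "\n"
```

Called with the example entry above, `to_imp({"@vQPfsc": "Hebrew: Verb, Qal passive, Participle, feminine, singular, construct"})` reproduces those two lines.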

dowens76 commented 6 years ago

Actually, let me see what I can come up with for the morph module, since I have done this before.

DavidHaslam commented 6 years ago

While David & Daniel are pondering morphology markup, I'm still concerned about morpheme segmentation.

It seems to me that the use of the seg element overlooks a very important point. An XML element only determines what happens to the text and elements within it. It's therefore impossible by using seg elements alone to achieve a functionality in (e.g.) SWORD whereby a morpheme segment separator is displayed or hidden.

A few years ago, I came across a book with the title, "Space Between Words: The Origins of Silent Reading".

Suppose we wanted SWORD users to experience what it's like to see Scriptio Continua?

How would this be implemented (assuming SWORD were enhanced to support it)?

My best guess would be by replacing each space in the module text by an OSIS milestone element.

<milestone type="x-space" marker=" " />

Then the SWORD engine could toggle the display of the marker string (here a single space).

Front-end UI would need a module option to select Scriptio Continua.

There are features like this already. Think of the Pilcrow sign in the KJV module when you flip from Verse Per Line to Paragraphs.

In the same manner, morpheme segmentation could be implemented like this

<milestone type="x-mss" marker="/" />

where "mss" denotes Morpheme Segment Separator.

At least, this is how I would do it. Has anyone got a better idea?
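The intended toggle behaviour of the milestone proposal can be simulated in a few lines of Python. This is only a sketch of the idea: SWORD has no such filter, and `x-mss` is the hypothetical type value suggested above. The same pattern would apply to an `x-space` milestone for Scriptio Continua.

```python
import re

# Hypothetical milestone for a Morpheme Segment Separator, as proposed above.
MSS = '<milestone type="x-mss" marker="/" />'
MILESTONE = re.compile(r'<milestone type="x-mss" marker="([^"]*)" />')


def to_milestones(word: str) -> str:
    """Replace each solidus in a word's text with the x-mss milestone.

    Apply to word text only, not to markup, which contains '/' in end tags.
    """
    return word.replace("/", MSS)


def render(osis: str, show_separators: bool) -> str:
    """Simulate a toggle: output each milestone's marker when on, nothing when off."""
    return MILESTONE.sub(lambda m: m.group(1) if show_separators else "", osis)
```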

DavidHaslam commented 6 years ago

Is it the case that STEP Bible is the only front-end that supports morpheme segments?

STEP Bible uses these structures to provide colour coding. It just uses 2 colours to show different parts, alternating between the two.

Here's a question I asked in 2014 to sword-devel:

If the SWORD engine does not actually implement any show/hide filter for OSISMorphSegmentation, might it not be preferred (in the relevant Hebrew conf files) to replace

GlobalOptionFilter=OSISMorphSegmentation

by

Feature=OSISMorphSegmentation

?
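If that proposal were adopted, the edit to each affected .conf file would be mechanical. A Python sketch, assuming simple one-entry-per-line `Key=Value` syntax (continuation lines are not handled); it rewrites only the OSISMorphSegmentation filter and leaves other GlobalOptionFilter entries alone. Whether front-ends would honour a `Feature=OSISMorphSegmentation` entry is precisely the open question.

```python
def replace_option_filter(conf_text: str, name: str = "OSISMorphSegmentation") -> str:
    """Turn 'GlobalOptionFilter=<name>' into 'Feature=<name>' in .conf text.

    All other lines, including other GlobalOptionFilter entries, are kept as-is.
    """
    out = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GlobalOptionFilter" and value.strip() == name:
            out.append(f"Feature={name}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```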

DavidHaslam commented 6 years ago

I had also written this to sword-devel in the same thread:

As I would like to document this GlobalOptionFilter property in our wiki page about conf files, a response would be helpful.

The only front-end (among those that I use regularly) that provides a token nod to the property is PocketSword.

This merely displays "This is a morpheme segmented Hebrew text".

Even PocketSword has nothing in module preferences to toggle the display of the segmentation.

Has this filter property "fallen by the wayside"?

DavidHaslam commented 6 years ago

On 01/07/2015, Karl Kleinpaste committed a change in the source code for diatheke to add -o M for morpheme segmentation.

NB. This is now documented in the wiki page for diatheke. But has it ever been tested?

What exactly is the SWORD engine looking for in the OSIS XML whereby the filter will change what SWORD outputs?

DavidTroidl commented 6 years ago

Daniel: Thank you. I appreciate the help.

David: There are five or six different topics running in this thread at the same time. It is almost impossible to communicate coherently about all of them at the same time. Morpheme segmentation is something that needs to be resolved with SWORD, before it can be addressed here. Please, let us stick to the main point: getting a workable module out in the near future. On that score the interpretation of morphology is the primary issue.

DavidHaslam commented 6 years ago

I appreciate that point. I'm trying to get my head round why SWORD has a feature whose exact workings nobody seems to know. cf. The morpheme segmentation filter is also specified in the WLC module.

DavidHaslam commented 6 years ago

Are the Hebrew morphology codes found in module hboWLCeb substantially the same as used in morphhb ?

Albeit perhaps a smaller set, due to when and how the source text was made.

Might it even be the case that the source text for the module was simply an earlier snapshot from morphhb, used without proper attribution?

DavidHaslam commented 6 years ago

FYI. The KJV module maintained by CrossWire contains morphology markup for the NT. Here's what it looks like:

<verse sID="Matt.1.1" osisID="Matt.1.1"/><w src="1" lemma="strong:G976 lemma.TR:βιβλος" morph="robinson:N-NSF">The book</w> <w src="2" lemma="strong:G1078 lemma.TR:γενεσεως" morph="robinson:N-GSF">of the generation</w> <w src="3" lemma="strong:G2424 lemma.TR:ιησου" morph="robinson:N-GSM">of Jesus</w> <w src="4" lemma="strong:G5547 lemma.TR:χριστου" morph="robinson:N-GSM">Christ</w>, <w src="5" lemma="strong:G5207 lemma.TR:υιου" morph="robinson:N-GSM">the son</w> <w src="6" lemma="strong:G1138 lemma.TR:δαβιδ" morph="robinson:N-PRI">of David</w>, <w src="7" lemma="strong:G5207 lemma.TR:υιου" morph="robinson:N-GSM">the son</w> <w src="8" lemma="strong:G11 lemma.TR:αβρααμ" morph="robinson:N-PRI">of Abraham</w>.<verse eID="Matt.1.1"/>

The OSIS XML header contains these work elements:

  <work osisWork="KJV">
    <title>King James Version (1769) with Strongs Numbers and Morphology</title>
    <identifier type="OSIS">Bible.KJV</identifier>
    <scope>Gen-Rev</scope>
    <refSystem>Bible.KJV</refSystem>
  </work>
  <work osisWork="defaultReferenceScheme">
    <refSystem>Bible.KJV</refSystem>
  </work>
  <work osisWork="strong">
    <refSystem>Dict.Strongs</refSystem>
  </work>
  <work osisWork="robinson">
    <refSystem>Dict.Robinsons</refSystem>
  </work>
  <work osisWork="strongMorph">
    <refSystem>Dict.strongMorph</refSystem>
  </work>
  <work osisWork="lemma.TR">
    <refSystem>Dict.TR</refSystem>
  </work>
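The prefixes in the KJV attributes (strong:, lemma.TR:, robinson:) are tied to reference systems by those work elements. The Python sketch below shows that linkage from a parser's point of view; it illustrates how the markup fits together and is not a description of how SWORD itself processes it. The header string is a cut-down copy of the declarations above, and the namespace handling is there because a real OSIS file declares the OSIS namespace.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the <work> declarations above (no OSIS namespace here,
# but work_map also accepts namespaced elements).
HEADER = """<header>
  <work osisWork="strong"><refSystem>Dict.Strongs</refSystem></work>
  <work osisWork="robinson"><refSystem>Dict.Robinsons</refSystem></work>
  <work osisWork="lemma.TR"><refSystem>Dict.TR</refSystem></work>
</header>"""


def _local(tag: str) -> str:
    """Strip any '{namespace}' part from an element tag."""
    return tag.rsplit("}", 1)[-1]


def work_map(header_xml: str) -> dict:
    """Map each osisWork name to the text of its refSystem child."""
    works = {}
    for el in ET.fromstring(header_xml).iter():
        if _local(el.tag) == "work":
            ref = next((c.text for c in el if _local(c.tag) == "refSystem"), None)
            works[el.get("osisWork")] = ref
    return works


def resolve(attr_value: str, works: dict) -> list:
    """Split space-separated 'prefix:value' tokens and attach each prefix's refSystem."""
    pairs = []
    for token in attr_value.split():
        prefix, _, value = token.partition(":")
        pairs.append((works.get(prefix), value))
    return pairs


w = ET.fromstring('<w src="1" lemma="strong:G976 lemma.TR:βιβλος" '
                  'morph="robinson:N-NSF">The book</w>')
works = work_map(HEADER)
print(resolve(w.get("lemma"), works))  # [('Dict.Strongs', 'G976'), ('Dict.TR', 'βιβλος')]
print(resolve(w.get("morph"), works))  # [('Dict.Robinsons', 'N-NSF')]
```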
DavidHaslam commented 6 years ago

CrossWire has just released OSHB module version 1.2, and I have just installed it. It obsoletes module OSMHB.

DavidHaslam commented 6 years ago

I have reported to the "modules team" at CrossWire a number of issues with the updated module.

I think these are all in their court. I trust that my actions will result in proper fixes asap.

DavidHaslam commented 6 years ago

The reported issues have not yet been fixed by CrossWire!