openscriptures / morphhb

Open Scriptures Hebrew Bible
https://hb.openscriptures.org

Change the lemma 3212 to 1980 #34

Open AndyHubert opened 6 years ago

AndyHubert commented 6 years ago

I recently corresponded with Jason DeRouchie of Bethlehem College and Seminary on this question. He wrote as follows:

Yes, scholars do not affirm a ילךְ verbal root any more. Instead, they recognize that הלךְ acts like an original I-ו verb in certain conjugations of the Qal. I write in my Hebrew grammar on p. 198:

The very common root הלךְ, “walk,” acts like an original I-ו verb in the Qal yiqtol, wayyiqtol, and imperative. For example, the Qal yiqtol 3ms and 3fs forms are יֵלֵךְ and תֵּלֵךְ, and the Qal wayyiqtol 3mp and 3fp are וַיֵּלְכוּ and וַתֵּלַכְנָה. (A similar situation occurs with הלךְ in all conjugations of the Hiphil––recall the form הוֹלִיךְ.)

Neither of the scholarly standard lexicons (HALOT and DCH) treats ילךְ as a root, so I would discourage its use in BibleArc. Feel free to follow up with questions. I can develop any of my comments further if it will serve you.

I can make this change in the parsing project so that the lemmas will be updated in the XML produced from it. However, I thought it would be good to bring it up for discussion first.

DavidTroidl commented 6 years ago

I understand that 3212 is equivalent to 1980. If you follow it through to the lexicon, you will find it references the same entry in Brown, Driver, Briggs. However, one of the goals of the project is to maintain compatibility with Strong's dictionary, for the benefit of existing applications based on it. There are so many small websites that use it, and so many people unfamiliar with anything beyond Strong's, that maintaining this support was deemed desirable.

AndyHubert commented 6 years ago

Help me understand how changing the lemma to Strong's #1980 where it is currently 3212 would make it incompatible with existing applications. Have there not been dozens of other lemma corrections made?

DavidTroidl commented 6 years ago

I certainly hope there haven't been dozens of changes in lemmas. The goal is to maintain fidelity to the original Strong numbers, as much as possible, to retain the advantages of a dictionary. The letters after the Strong numbers are meant to augment the numbers, to reference Brown, Driver, Briggs entries, and gain the advantages of a lexicon.
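As an illustration of the numbering scheme described above, here is a minimal sketch of splitting a lemma value into its Strong number and augment letter. It assumes lemma values look like `c/3212` or `1254 a` (optional slash-separated prefix letters, then the number, then an optional space and letter). `Lemma` and `parse_lemma` are hypothetical names, and real OSHB data may contain shapes this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional


class Lemma(NamedTuple):
    prefixes: tuple          # prefix letters, e.g. ("c",) for a conjunction
    strong: int              # the Strong number itself
    augment: Optional[str]   # trailing letter pointing at a BDB entry, if any


# Assumed shape: zero or more "x/" prefixes, digits, optional " x" letter.
_LEMMA_RE = re.compile(r"^((?:[a-z]/)*)(\d+)(?: ([a-z]))?$")


def parse_lemma(value: str) -> Lemma:
    """Split a lemma attribute value into prefixes, number and augment."""
    match = _LEMMA_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized lemma value: {value!r}")
    prefix_part, number, letter = match.groups()
    prefixes = tuple(p for p in prefix_part.split("/") if p)
    return Lemma(prefixes, int(number), letter)
```

With this split, an application that only knows Strong numbers can read `strong` and ignore `augment`, while one that wants the finer lexicon reference can use both.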


AndyHubert commented 6 years ago

There have been dozens of changes in lemmas. Here (https://github.com/openscriptures/morphhb-parsing/blob/master/morphhb-scripts/correctMainTablesOutput.txt) is the output of the script I recently ran to update the parsing database using the latest morphhb XML files. I didn't count them, but you can see that there are certainly dozens of lemma changes.

Back to the main discussion, however: I still do not understand why changing the Strong's number associated with a lemma to another valid Strong's number would harm fidelity to the original Strong's numbers or nullify the advantages of this dictionary. I am the developer of Biblearc.com and we certainly use Strong's. Once I import the morphhb with parsings, a change like the one I propose here would simply cause the system to bring up the Strong's info for הלך instead of ילך on the corrected words. Thus, I don't see how this would cause problems in the normal use cases. What am I not understanding?

DavidTroidl commented 6 years ago

I have consistently made corrections to the OSHB when I found the need in my own work with it, or when others have pointed them out to me.  I had no idea there were so many.  Corrections are not the same as changes, though.

The point of the dictionary is that it is not limited to the lexical form of a word.  I can look up the word "go" in my dictionary, but I can also look up "went".  The case in point, "yalak", is a major form of "halak", representing over 500 instances in the Hebrew Bible.  To be represented as a dictionary entry, it requires an ID, and 3212 fulfills that function.  This should pose no problem.  I can open up the OSHB and search for (3212|1980) and find 1555 hits, but I can still search for each one individually if I want.  Why should we destroy that information?
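The combined and individual searches mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `SAMPLE` markup is a made-up four-word fragment, not an excerpt from the OSHB files, and it omits the XML namespace that real files carry. `count_hits` is a hypothetical helper, and the lemma shape (optional `x/` prefixes, optional trailing letter) is assumed.

```python
import re
import xml.etree.ElementTree as ET

# Made-up fragment for illustration only; lemma values are assumptions.
SAMPLE = """<verse>
  <w lemma="3212">יֵלֵךְ</w>
  <w lemma="d/1980">הַהֹלֵךְ</w>
  <w lemma="776">אֶרֶץ</w>
  <w lemma="3212">תֵּלֵךְ</w>
</verse>"""


def count_hits(xml_text: str, numbers: str) -> int:
    """Count <w> elements whose lemma ends in one of the given numbers.

    `numbers` is a regex alternation such as "3212|1980".
    """
    # Anchor on start-of-string or "/" so that 1980 does not match 21980.
    regex = re.compile(rf"(?:^|/)(?:{numbers})(?: [a-z])?$")
    root = ET.fromstring(xml_text)
    return sum(1 for w in root.iter("w") if regex.search(w.get("lemma", "")))


combined = count_hits(SAMPLE, "3212|1980")
only_3212 = count_hits(SAMPLE, "3212")
only_1980 = count_hits(SAMPLE, "1980")
```

Because the two numbers stay distinct in the data, the combined count is simply the sum of the two individual counts, and either search remains available.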


AndyHubert commented 6 years ago

If there is a legitimate argument for two distinct words here, then we certainly should not destroy the information of the distinction. However, I do not see that this is the case. As you said above, "3212 is equivalent to 1980."

It is true that a person might search for (3212|1980) together to get all the results, but that presupposes that they understand that this single word is divided into two different lemmas (in contrast to how things work with every other word). To bring the analogy to English, it is as if searching "come" includes results with "came," searching "bring" includes results with "brought," but searching "go" does not include results with "went." This inconsistency seems to me to be much more of a "bug" than a "feature." When someone clicks on the word הַהֹלֵךְ in Gen 2:14, and chooses to search this lemma, they expect all the results.

One sentence in your last comment which I do not understand is, "The point of the dictionary is that it is not limited to the lexical form of a word." But it is. There are no Strong's entries for each conjugated form of each word, but only for each lemma.

DavidTroidl commented 6 years ago

We could write reams of argumentation over this one issue, and then have to repeat the process for a hundred others.  Your point about searching for "bring" and also finding "brought" brings out a point that is relevant here.  This only happens because of an intelligent search functionality that realizes the necessity of making the software conform to the data, rather than trying to force the data into a form to serve the software.  An intelligent Bible app could use the data in our Hebrew lexicon to offer BDB search or Strong search, or any other variation the designer deems appropriate, without compromising the existing data.

The point about the dictionary is that "go" may appear in a lexicon, but "went" would not.  The same applies to looking up "I", "me", "my" and "mine" in a dictionary, where a lexicon would only allow "I" as the first person singular pronoun.  Both could have value to a potential user of the app, and both can be represented, if the data is wide enough to accommodate them, and the app is smart enough to offer both options.
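One way an app could offer both kinds of search without altering the data is to expand the query at search time. The sketch below is an assumption-laden illustration, not how any existing app works: `LEXICAL_GROUPS`, `expand_query`, and `search` are hypothetical names, the single 3212-to-1980 mapping is the only one taken from this thread, and the word list is a tiny hand-made sample.

```python
# Hypothetical table mapping a Strong number to the lexical head it belongs to.
# Only the 3212 -> 1980 pairing comes from this discussion.
LEXICAL_GROUPS = {3212: 1980}


def expand_query(strong: int, mode: str = "strong") -> set:
    """Return the set of Strong numbers a search should match."""
    if mode == "strong":
        # Dictionary-style search: match exactly the number asked for.
        return {strong}
    if mode == "lexical":
        # Lexicon-style search: match every number sharing the same head.
        head = LEXICAL_GROUPS.get(strong, strong)
        members = {n for n, h in LEXICAL_GROUPS.items() if h == head}
        return members | {head}
    raise ValueError(f"unknown search mode: {mode!r}")


def search(words, strong: int, mode: str = "strong") -> list:
    """Filter (form, strong_number) pairs by the expanded query."""
    wanted = expand_query(strong, mode)
    return [form for form, number in words if number in wanted]


# Hand-made sample of (surface form, Strong number) pairs.
WORDS = [("יֵלֵךְ", 3212), ("הַהֹלֵךְ", 1980), ("אֶרֶץ", 776)]

strict = search(WORDS, 1980, mode="strong")
broad = search(WORDS, 1980, mode="lexical")
```

Here `strict` contains only the form tagged 1980, while `broad` also picks up the form tagged 3212; the stored numbers themselves are never rewritten.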
