translatable-exegetical-tools / Abbott-Smith

Abbott-Smith's Manual Greek Lexicon

Fixing book abbreviations in <ref> content for entries beginning with… #109

Closed cbearden closed 2 years ago

cbearden commented 2 years ago

… α; I identified them with a script that compares book abbreviations against the list on p. xii of the lexicon; in each case I verified the spelling of the abbreviation in the text. Many more fixes to come. The file validates. See issue #107.
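For readers who want to picture the kind of check described above, here is a minimal sketch in Python. It is not the script used for this pull request: the abbreviation subset `KNOWN_ABBREVS`, the regular expression, and the sample XML are all assumptions made for illustration, and the real list on p. xii is much longer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical subset of the lexicon's abbreviation list (assumption, not the full list).
KNOWN_ABBREVS = {"Mt", "Lk", "II Ti"}

# Assumption: the book abbreviation is everything before the first chapter number,
# optionally preceded by a Roman numeral such as "I" or "II".
BOOK_RE = re.compile(r"^\s*((?:I{1,3}\s+)?[A-Za-z]+)\.?\s+\d")

# Made-up sample markup; the verse numbers are illustrative only.
SAMPLE = (
    '<entry>'
    '<ref osisRef="Matt.9.10">Mt 9:10</ref> '
    '<ref osisRef="2Tim.1.7">II Tim 1:7</ref> '
    '<ref osisRef="Luke.24.30">Lu 24:30</ref> '
    '<ref osisRef="1John.3.8">3:8</ref>'
    '</entry>'
)


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name' if present."""
    return tag.rsplit("}", 1)[-1]


def unknown_abbrevs(xml_text):
    """Return (abbreviation, ref content) pairs not found in KNOWN_ABBREVS."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        if local_name(element.tag) != "ref":
            continue
        content = "".join(element.itertext())
        match = BOOK_RE.match(content)
        # Refs such as "3:8" carry no book abbreviation and are skipped.
        if match and match.group(1) not in KNOWN_ABBREVS:
            problems.append((match.group(1), content))
    return problems
```

Each hit from a check like this would still need to be verified by hand against the PDF, since a mismatch can be a transcription error or a spelling the printed text really has.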

destatez commented 2 years ago

Charles

I took a look at the diffs on this. Everything looks good except for line 4059, which is a ‘lengthy’ one. I have both old and new versions shown below. The problem is that when you expanded the ‘range’ in the old version into individual verse references, you left out 1 John 3:7.

Sorry about the format. My tablet and Chrome have a few issues when pasting in from spreadsheets.

Dave

Old: Jn 3:6-9

New:

3:6,89


cbearden commented 2 years ago

Hi Dave,

I made that change because the original text doesn't specify the range, but rather the particular verses 1 John 3:6,8,9 (see attached screenshot). In the Greek texts I have, verse 7 doesn't include a form of ἁμαρτάνω, so giving the individual verses instead of the range reflects both the dictionary and the Greek text. I suppose we could give

<ref osisRef="1John.3.6">3:6</ref>,<ref osisRef="1John.3.8-9">8,9</ref>

if we thought it was important to convert comma-separated sequences of verses into ranges. That hasn't been consistently done so far, but I do see it sometimes. My preference though is to stick closely to the form of the original text. I think the markup under ἁμαρτωλός on line 4109 is wrong, since the text has "Mt 9^10,11,12". Even if you want a range in the @osisRef, the content of the ref element should be what the original has. I would go so far as to say the verse numbers should all be marked up as superscript, with no colon between chapter and verse, but it's a bit late for that.
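To make the range question concrete, here is a small Python sketch that groups a comma-separated verse list into runs of consecutive verses. The function names are hypothetical and nothing like this is claimed to exist in the repository. It only illustrates why "3:6,8,9" cannot be collapsed into the single range 3:6-9, while "10,11,12" could be carried as a range in an attribute without touching the element content.

```python
def parse_verse_list(text):
    """Parse surface text such as '6,8,9' into a list of integers."""
    return [int(part) for part in text.split(",")]


def verse_runs(verses):
    """Group ascending verse numbers into (start, end) runs of consecutive verses."""
    runs = []
    for verse in verses:
        if runs and verse == runs[-1][1] + 1:
            # Extend the current run by one verse.
            runs[-1] = (runs[-1][0], verse)
        else:
            # Start a new run at this verse.
            runs.append((verse, verse))
    return runs


def is_single_range(verses):
    """True when the verses form one unbroken range."""
    return len(verse_runs(verses)) == 1
```

For the two cases discussed here, `verse_runs(parse_verse_list("6,8,9"))` gives two runs, (6, 6) and (8, 9), whereas `verse_runs(parse_verse_list("10,11,12"))` gives the single run (10, 12).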

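[image: hamartano] https://user-images.githubusercontent.com/427030/132922318-c65a7337-f412-463b-84a8-b68ead4650e5.jpeg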

destatez commented 2 years ago

Charles

Glad you went back to the source, and using its syntax with comma separation is the best solution. Glad you found this. In our Phase I work we missed the fact that the original did not include verse 7 and we had it as a range. Our lexicon file was created with that range and I will have to get an Issue opened to make sure it’s fixed in our repo.

Dave


cbearden commented 2 years ago

Dave,

I appreciate your understanding. It’s a large project, and a number of different people and groups have worked on it at various times. They have all had different priorities for the markup and different use cases. I imagine a case can be made for normalizing the text in some way to make it more useful in a particular context. I have my own preferences, but we’ve needed everybody’s contribution. I can remember when large swathes of the dictionary were just OCR full of errors and no markup. I’m grateful that we’ve gotten as far as we have!

Chuck

destatez commented 2 years ago

Charles

I sure am glad for you guys’ work on compliance with the A-S PDF, and also for your compliance checks against all of the standards used in the XML. Right now UnfoldingWord has the editors and the checker for en_UGL ignoring the references. I want to keep up with the mods you are making so that they get worked into the en_UGL. I may have to go back and review all of the TeT Issues to see whether equivalent changes are needed in our files. Are there any that stand out in your memory that resulted in change(s) to references, content and/or format? Todd Price had it right when he was emphasizing our Phase I activity to make sure that we would be working from a stable base for Phase II. You guys need that as well. I’m glad that in this sense we are all working towards the same goal.

Dave


jonathanrobie commented 2 years ago

Sounds like we are agreeing that we should not change the surface text for references.

Can we agree that any changes to the surface text should be restricted to notes, contained in markup? Of course, we have free rein with attributes.


destatez commented 2 years ago

Jonathan

I agree with your plan of attack. You guys are much more knowledgeable about all the underlying standards on this, so it’s your call on the best way to update the XML. We are no longer using the XML. All the en_UGL files were created from it quite a good while back. That’s why I want to do a little review of your Issues and resolutions to determine if/which ones need to manually get worked into our files. We have asked our editors to refer back to the PDF if they have questions on A-S content.

Dave


cbearden commented 2 years ago

> Sounds like we are agreeing that we should not change the surface text for references.

Let me restate my understanding of what you are saying here: We should not depart from the content of the original document for references. "Surface text" here refers to the Abbott-Smith original?

> Can we agree that any changes to the surface text should be restricted to notes, contained in markup? Of course, we have free rein with attributes.

If I got "surface text" right, then I'm on board.

jonathanrobie commented 2 years ago

Yes, "surface text" means the Abbott-Smith original.

Jonathan


destatez commented 2 years ago

Guys

I got to thinking about this after my last response. If "surface text" referred to the XML that we worked on, the answer would be NO! Since you have defined it as the PDF, the answer is YES!

Dave


cbearden commented 2 years ago

Excellent, Dave & Jonathan!

cbearden commented 2 years ago

Jonathan, there are a good many changes in the ref element content to come (then I'll turn to the @osisRef attribute values and to discrepancies between element content & attribute values). Do you want me to push them to this branch/pull request in batches, so that you aren't reviewing so many at once, or do you prefer to wait until I've made all the needed changes I've identified? I can go either way.

jonathanrobie commented 2 years ago

I would rather look at the whole shebang, unless you want me to look at a smaller batch that you have questions about.


cbearden commented 2 years ago

Okay, in that case hold off on this pull request. There are probably about 135 more fixes to book abbreviations in ref element content, that is, cases where the book abbreviation in the element doesn't match one of the abbreviations on p. xii of the dictionary. What I do is check the reference in the XML against the PDF and correct it if the XML doesn't match the PDF. There have been two cases where A-S misspells the abbreviation:

  • "II Tim" in ἀγαπάω|G25
  • "Lu" in ἄρτος|G740

and I left those refs unchanged and added a comment <!-- sic abbrev --> for any people working on the XML in the future. The attached file has the remaining cases where my script found an abbreviation that doesn't match the abbrevs list, and I'll check them against the PDF.

Fixing these abbreviations, and ensuring that all the @osisRef abbreviations match the OSIS ID standard (the next stage), will enable me to compare the two to look for discrepancies (the third stage). Does this make sense?
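As a rough illustration of that third stage, the sketch below compares the book named in each ref's content with the book in its @osisRef. It is only a sketch under stated assumptions: `ABBREV_TO_OSIS` is a hypothetical three-entry subset of a mapping from lexicon abbreviations to OSIS book IDs, the sample markup is made up, and chapter and verse are not compared at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset: lexicon abbreviation -> OSIS book ID (assumption).
ABBREV_TO_OSIS = {"Mt": "Matt", "Lk": "Luke", "II Ti": "2Tim"}

# Made-up sample; the second ref deliberately disagrees with its attribute.
SAMPLE = (
    '<entry>'
    '<ref osisRef="Matt.9.10">Mt 9:10</ref>, '
    '<ref osisRef="Mark.2.15">Lk 5:29</ref>, '
    '<ref osisRef="2Tim.1.7">II Ti 1:7</ref>'
    '</entry>'
)


def book_mismatches(xml_text):
    """Return (content, osisRef) pairs whose book names disagree."""
    root = ET.fromstring(xml_text)
    # Try longer abbreviations first so a short one cannot shadow a longer one.
    abbrevs = sorted(ABBREV_TO_OSIS, key=len, reverse=True)
    mismatches = []
    for ref in root.iter("ref"):
        content = "".join(ref.itertext()).strip()
        osis_ref = ref.get("osisRef", "")
        osis_book = osis_ref.split(".", 1)[0]
        for abbrev in abbrevs:
            if content.startswith(abbrev + " "):
                if ABBREV_TO_OSIS[abbrev] != osis_book:
                    mismatches.append((content, osis_ref))
                break
    return mismatches
```

A comparison like this only becomes reliable once both sides are normalized, which is why the abbreviation fixes and the OSIS ID check come first.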

mismatches.txt https://github.com/translatable-exegetical-tools/Abbott-Smith/files/7148571/mismatches.txt

jonathanrobie commented 2 years ago

Yes, it does make sense.
