destatez closed this issue 7 years ago
Hi David,
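This sounds good. I have a pair of scripts ('scripts/find_foreign.py' and 'scripts/fix_foreign.py') that I used to find segments of Greek & Hebrew text that weren't enclosed in the appropriate tags and to add the tags. Possibly they could be adapted to this purpose as well. I may not get to that immediately, so others may beat me to the punch with a different approach. I think there are a number of ways we could use scripts or XQuery to make the analysis and fixing of the markup faster. My immediate focus is on making the document valid TEI/OSIS again.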
All the best, Chuck
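For anyone who has not looked at the scripts, here is a rough sketch of the kind of check described above. It is not the contents of 'scripts/find_foreign.py'. The `foreign` tag name, the function name, and the assumption that element names carry no namespace prefix are all illustrative guesses.

```python
import re
import xml.etree.ElementTree as ET

# Runs of Greek, Greek Extended, or Hebrew characters, plus combining marks.
FOREIGN_RUN = re.compile(r"[\u0300-\u036F\u0370-\u03FF\u1F00-\u1FFF\u0590-\u05FF]+")


def find_untagged_foreign(xml_text, tag="foreign"):
    """Return Greek/Hebrew runs that are not inside a <tag> element.

    `tag` is an assumed element name; adjust it to the real markup.
    """
    root = ET.fromstring(xml_text)
    hits = []

    def walk(elem, inside):
        # Once we enter the wrapping element, everything below it is covered.
        inside = inside or elem.tag == tag
        if not inside and elem.text:
            hits.extend(FOREIGN_RUN.findall(elem.text))
        for child in elem:
            walk(child, inside)
            # A child's tail text belongs to the current element, not the child.
            if not inside and child.tail:
                hits.extend(FOREIGN_RUN.findall(child.tail))

    walk(root, False)
    return hits
```

Calling `find_untagged_foreign(xml_text)` on one entry returns the bare runs in document order, which could then be reviewed by hand before any automatic fix is applied.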
On Thu, Nov 24, 2016 at 2:18 PM, David Statezni notifications@github.com wrote:
We should identify below all of the grammar abbreviations that should have the grammar tagging around them, e.g. adv. for an adverb. It should be possible to develop a script that does a global replace (adding the tagging) for each instance that is not already tagged. The list of abbreviations can be extracted from the front matter.
Most of the current instances of tagging occur after the tag-pair and the tag-pair and before the first tag-pair, but there are also current instances that are a part of the contents of a tag-pair. A decision will need to be made, when developing and running this script, whether the replacements should be made only before the tag-pair or wherever they occur.
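The global replace proposed here could look something like the sketch below. It is only an illustration: the `pos` tag name, the short abbreviation list, and the function name are assumptions, and the real list would come from the dictionary's own list of abbreviations.

```python
import re

# Hypothetical subset; the full list would be extracted from the dictionary itself.
ABBREVIATIONS = ["adv.", "adj.", "prep.", "conj."]


def tag_abbreviations(text, abbreviations=ABBREVIATIONS, tag="pos"):
    """Wrap each untagged abbreviation in <tag>...</tag>; leave tagged ones alone."""
    # Longest first, so a longer abbreviation wins over a shorter one it contains.
    ordered = sorted(abbreviations, key=len, reverse=True)
    alternatives = "|".join(re.escape(a) for a in ordered)
    pattern = re.compile(
        # An element that is already tagged: matched only so it can be skipped.
        rf"(?P<tagged><{tag}\b[^>]*>.*?</{tag}>)"
        # A bare abbreviation that does not continue a longer word.
        rf"|(?<!\w)(?P<bare>{alternatives})",
        re.DOTALL,
    )

    def replace(match):
        if match.group("tagged"):
            return match.group("tagged")
        return f"<{tag}>{match.group('bare')}</{tag}>"

    return pattern.sub(replace, text)
```

Because already-tagged spans are matched and returned unchanged, running the function twice gives the same result as running it once. Note that this version replaces wherever the abbreviation occurs; limiting it to particular parts of an entry would need an XML-aware pass rather than a plain text substitution.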
Charles
That sounds like a plan. I have been using Perl to do all sorts of global replacements for the ULB, UDB, Notes, tW, etc. Either tool can do the job. My thought on this particular topic, and on Issue 59, was to wait until all manual editing is complete and then use the scripts to "catch" any that were missed by the editors.
Dave
Hi Dave,
Would it be good to have a channel for general communications about the project, so as not to overload the GitHub 'issues' feature with more general topics? I don't know any way to contact you other than responding to this issue.
There is a Google Group ("TExT: Abbott-Smith Project"), but the last posts in it were from me, about my efforts to tag Greek & Hebrew, about a year ago. For instance, I don't know anything about the work of manual review that is evidently going on (which is great news!). I'd like to get the XML file into valid shape, but I don't want to make life harder for those trying to merge my work with the results of their manual review. Also, I think we'll need to discuss some markup choices.
Would it make sense to use the Google Group for general coordination and discussion, or is there another, better channel?
All the best, Chuck
Charles
I just got connected to that Google Group. That sounds like a good means of communication. We really need to get Chapel and possibly Todd connected to it, since they are the leads. I cc'd them on this reply. I am just an editor and tool-guy.
Dave
Charles
It's taking some time to get approved for that Google Group, though I thought that I had received a message that I was. So, I can't answer you via a post against your latest topic. You can either wait until I get approved, or you can pass me your email address and I can send you a message about what the editors are doing. Your pick.
Dave
Dave, you've already been approved for the group using your Gmail address. I approved you almost immediately. Try sending an email to text-abbott-smith-project@googlegroups.com.
Hi Dave,
I was able to see your post to the group with the subject "Group Acceptance". Looks like you are able to post now. If you didn't get a copy of the reply in your email inbox, perhaps you just need to edit your email preference settings for the group.
I'm looking forward to hearing about what's going on with the dictionary. I see you're with Wycliffe, which is very cool.
All the best, Chuck
Re: par. 2 of the 1st post: Yes, I do think that the grammar abbreviations even in the Sense sections should be tagged. This might be a bit beyond the original scope of making a digital representation of A-S, so perhaps this should wait until Stage 2 and be considered part of the UGL. What I mean is that I see use for it where the grammar tags in UGL can be linked to UGG so that these grammatical concepts are explained in our Grammar. That is beyond the Stage 1 goal.
Just to clarify, as part of digitizing A-S, we do want the grammar abbreviations to have tagging around them. This is valid and needed for Stage 1. But linking those tags to UGG needs to wait until Stage 2.
I have run across an issue on this topic. I searched the XML for the POS "keywords" and found instances that are part of a description, as well as what I would call viable instances. I have attached some examples of the search output and need a little clarification on what should and shouldn't be tagged. The keywords that I used are listed below. The search finds any word that starts with the keyword, which is why I had to qualify some of them to keep other words out of the results.
adj, adv, article, conj, interj, num, part, prep, pron, subst, art. (and NOT article), super (and NOT superscript), noun (and NOT pron), verb (and NOT adv)
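For reference, a search along these lines might be sketched as follows. This is not the search that was actually run. The keyword list is copied from the comment above; the function name is made up, and matching the keyword anywhere in the word (rather than only at the start) is an assumption, chosen because exclusions such as "verb (and NOT adv)" only make sense that way.

```python
import re
from collections import defaultdict

# (keyword, excluded substring or None), copied from the list above.
KEYWORDS = [
    ("adj", None), ("adv", None), ("article", None), ("conj", None),
    ("interj", None), ("num", None), ("part", None), ("prep", None),
    ("pron", None), ("subst", None),
    ("art.", "article"), ("super", "superscript"),
    ("noun", "pron"), ("verb", "adv"),
]

# A word is a run of letters with an optional trailing period (for abbreviations).
WORD = re.compile(r"[A-Za-z]+\.?")


def find_keyword_hits(text, keywords=KEYWORDS):
    """Map each keyword to the words in `text` that contain it, minus exclusions."""
    hits = defaultdict(list)
    for word in WORD.findall(text):
        lowered = word.lower()
        for keyword, excluded in keywords:
            if keyword in lowered and not (excluded and excluded in lowered):
                hits[keyword].append(word)
    return dict(hits)
```

A search like this cannot tell a lexical label such as "adv." from running text such as "adverb", which is exactly the ambiguity raised here, so its output is best treated as a list of candidates for review.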
I think the examples in your txt file (verb, part and art) should not be tagged. It looks like ptcp. should be tagged since it is used in lexical entries rather than in 'running text'.
I am concerned about the current state of the pos tags in A-S. There are currently 53 different "values" that are tagged in the XML (see A_S_XML_pos_instance_text.txt). (I combined instances that were abbreviations, or variations of abbreviations, for those listed.) There are a total of 357 instances where these are tagged, with 29 of these being within the sense data (see A_S_pos_sense_Instances.txt). The remainder are within the orth data or etym data, which is where I would have expected them. My questions, as they relate to automating the tagging of the XML file, are:
Only tag what is in orth and etym data.
Updated the XML; only 13 changes were needed once the scope was reduced to orth & etym.
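The tallies discussed above (how many distinct values are tagged, and how many tagged instances sit inside sense data) could be reproduced with something like this sketch. The element names `pos` and `sense`, the function name, and the absence of namespaces are assumptions for illustration; this is not the tool that produced the attached text files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_pos(xml_text):
    """Return (Counter of tagged values, number of <pos> elements inside <sense>)."""
    root = ET.fromstring(xml_text)
    values = Counter()
    in_sense = 0

    def walk(elem, inside_sense):
        nonlocal in_sense
        inside_sense = inside_sense or elem.tag == "sense"
        if elem.tag == "pos":
            values[(elem.text or "").strip()] += 1
            if inside_sense:
                in_sense += 1
        for child in elem:
            walk(child, inside_sense)

    walk(root, False)
    return values, in_sense
```

`len(values)` gives the number of distinct tagged values and `sum(values.values())` the total number of instances, so the same pass can be rerun after each round of edits to see how the numbers move.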
We should identify below all of the grammar abbreviations that should have the grammar tagging around them, e.g. adv. for an adverb. It should be possible to develop a script that does a global replace (adding the tagging) for each instance that is not already tagged. The list of abbreviations can be extracted from section "I. GENERAL." at the beginning of the XML file.
Most of the current instances of tagging occur after the