translatable-exegetical-tools / Abbott-Smith

Abbott-Smith's Manual Greek Lexicon

Need an attribute with just the lemma #91

Open jonathanrobie opened 6 years ago

jonathanrobie commented 6 years ago

In the current format, there is no element or attribute that contains the lemma directly without additional text. That makes it difficult to match in queries without preprocessing.

Instead of:

<entry n="διαγινώσκω|G1231">

I would prefer something like:

<entry lemma="διαγινώσκω" strong="G1231">
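
To illustrate the difference for queries, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two sample strings and the helper names `find_old` and `find_new` are assumptions made for illustration; the real file is TEI and has namespaces and more structure than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry samples, one per format (not the real file layout).
OLD = '<body><entry n="διαγινώσκω|G1231"/><entry n="Ἀαρών|G2"/></body>'
NEW = ('<body><entry lemma="διαγινώσκω" strong="G1231"/>'
       '<entry lemma="Ἀαρών" strong="G2"/></body>')

def find_old(root, lemma):
    # n mixes the lemma and the Strong's number, so every value
    # has to be split before it can be compared.
    return [e for e in root.iter("entry")
            if e.get("n", "").split("|")[0] == lemma]

def find_new(root, lemma):
    # A dedicated attribute can be matched directly with an XPath predicate.
    return root.findall(f".//entry[@lemma='{lemma}']")

old_hits = find_old(ET.fromstring(OLD), "διαγινώσκω")
new_hits = find_new(ET.fromstring(NEW), "διαγινώσκω")
```

Both calls find the same entry, but only the second can be written as a plain attribute match.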

dowens76 commented 6 years ago

That makes sense to me.

You know more than I about XML schemata. Would we need to modify the schema to make that validate?

destatez commented 6 years ago

@jonathanrobie I reformatted the file under a new name, https://github.com/translatable-exegetical-tools/Abbott-Smith/blob/master/abbott-smith.tei_lemma.xml, and merged it into the repository. There were cases where the entry had no vertical bar, and cases where the Strong's number was undefined (G????). I have attached two files listing those instances. I am open to changing things if that will work better for you. Let me know. Missing_bar.txt Undefined_Strongs.txt
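
For readers following along, the reformatting described above might look roughly like the sketch below. This is not the script that was actually used; `split_n` and the sample entries are hypothetical, and the two placeholder lemmas (`αβγ`, `δεζ`) are made up to stand in for the missing-bar and `G????` cases.

```python
import xml.etree.ElementTree as ET

def split_n(entry):
    """Replace n="lemma|Gnnnn" with separate lemma and strong attributes.

    Returns "ok", "missing_bar", or "undefined_strongs" so that the
    odd cases can be collected into a report.
    """
    n = entry.attrib.pop("n", "")
    lemma, bar, strong = n.partition("|")
    entry.set("lemma", lemma)
    if not bar:
        # No vertical bar: there is no Strong's number to record.
        return "missing_bar"
    entry.set("strong", strong)
    return "undefined_strongs" if strong == "G????" else "ok"

# Hypothetical sample; only the first entry comes from the issue text.
SAMPLE = ('<body><entry n="διαγινώσκω|G1231"/>'
          '<entry n="αβγ"/>'
          '<entry n="δεζ|G????"/></body>')

root = ET.fromstring(SAMPLE)
report = [split_n(e) for e in root.iter("entry")]
```

Collecting the return values is one way to produce lists like the two attached files.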

jonathanrobie commented 6 years ago

Thanks - that's helpful, I hadn't gotten around to this.

Is there a need to have two different files in different formats? Do we still need the <entry n="Ἀαρών|G2"> format?

jonathanrobie commented 6 years ago

@dowens76

> Would we need to modify the schema to make that validate?

Is there a schema? I can't find a file with the extension .dtd, .xsd, .rnc, or .rng.

destatez commented 6 years ago

I do not believe that there is a schema file. I never found one, so I built my script based on the contents of the xml file. I used a different file since I wasn't sure whether we wanted to keep both formats. We will have to get Todd in the loop to move to one file in the new format, since he oversaw the effort to make the manual updates to reflect reality.

dowens76 commented 6 years ago

I have always validated against http://www.crosswire.org/OSIS/teiP5osis.2.5.0.xsd (see TEI@xsi:schemaLocation). I have it in my local files but Git is set to ignore it.

dowens76 commented 6 years ago

I cannot see any reason not to maintain only one file. It would make things much easier.

@destatez Thank you for working on this file. I think we probably should just make the changes directly in abbott-smith.tei.xml. But if you would feel more comfortable looping Todd in, that's okay with me. We started the project together, and it's a good idea to keep key parties updated as changes are made.

dowens76 commented 6 years ago

I went through the entries that did not have a bar (most did not have a Strong's number) and fixed those Strong's numbers that were G????.

destatez commented 6 years ago

I wish we had talked before you appended the letter "a" to these undefined Strongs. We, the Unlocked Greek Lexicon and Unlocked Greek New Testament teams, have adopted an approach by Alan Bunning called Strongs Plus (from the ugl docset: "The Strong’s Plus ID referenced above was initially developed by Alan Bunning, where he took the 4-digit Strong’s ID and appended a zero to create a 5-digit ID. This gave him extra IDs to be able to qualify different word forms than the standard Strong’s. We will be using this Strong’s Plus identification for this project."). When we created the ugl files, we made all the Strongs IDs Strongs Plus IDs. In the cases where a particular Greek word from A-S was not part of Alan's deliverable spreadsheet (which is the root for ugnt), I put G???? in the xml and, for ugl, assigned them a unique ID and lemma file above G99000, and put them in a class of IDs that the ugl team would review to determine the actual Strongs Plus ID and update the ugl lemma file accordingly. The work that you did for these can and will be used by our team to reassign these "undefined" Strongs IDs.

We need to talk about the Strongs Plus ID scheme. It may be better to rewrite the xml file using this convention so that we can all be on the same page. We can even update the delivered A-S xml with the Strongs IDs (in 5-digit form) for those that were undefined, which you have now defined. I can then re-run my script for the new format and have the best of both worlds.
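
As a rough illustration of the 5-digit form, here is a minimal sketch of the conversion as the quoted docset describes it (take the 4-digit ID and append a zero). The function name `to_strongs_plus` is hypothetical, and zero-padding shorter numbers such as G2 to four digits first is an assumption, not something stated above.

```python
import re

def to_strongs_plus(strong):
    """Convert a classic Strong's ID such as "G1231" to a 5-digit ID."""
    m = re.fullmatch(r"G(\d{1,4})", strong)
    if m is None:
        # Undefined IDs such as "G????" cannot be converted mechanically.
        raise ValueError(f"not a classic Greek Strong's ID: {strong!r}")
    # Assumption: pad to 4 digits, then append the extra 0.
    return f"G{int(m.group(1)):04d}0"

plus_id = to_strongs_plus("G1231")
```

The undefined entries still need a manual decision, which is why the sketch raises an error for them instead of guessing.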

jonathanrobie commented 6 years ago

We should have a separate issue for extending Strong's. I will open one.

toddlprice commented 6 years ago

I'm fine with whatever you think would work best on this @destatez. Thanks for your work.