<gloss> versus <emph>

destatez commented 7 years ago

I may have missed it, but a "command decision" should be made as to when to use <gloss> versus <emph>. There seem to be three classes of italicization in the A-S, with the second being the most straightforward use for <gloss>. An analysis function could be developed to find and report the instances of each class.

1. Within a derivation clause (example uses <gloss>): <seg type="derivation">(< <foreign xml:lang="grc">συγκυρέω</foreign>, <gloss>to happen</gloss>), </seg>
2. Within a sense clause, particularly when immediately following the opening <sense> tag: <sense><gloss>chance, coincidence</gloss>: <foreign xml:lang="grc">κατὰ σ.</foreign> (v. MM, xxiii), <ref osisRef="Luke.10.31">Lk 10:31</ref> (Hippocr., Eccl.).†</sense>
3. Within a sense clause, but within parentheses that identify either RV or AV as the source (first example uses <gloss>, second has neither): ... (RV,<gloss>exact wrongfully</gloss>;... or ...(AV, comforter;...
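
The analysis function suggested above could look something like the following. This is only a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The function name `classify_glosses`, the class labels, and the rule that a version gloss is one immediately preceded by `(RV,` or `(AV,` are my assumptions for illustration, not anything already in the project.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a "version" gloss is immediately preceded by "(RV," or "(AV,".
VERSION_PREFIX = re.compile(r"\((?:RV|AV),\s*$")


def local(tag):
    """Strip a namespace such as {http://www.tei-c.org/ns/1.0} from a tag."""
    return tag.rsplit("}", 1)[-1]


def classify_glosses(root):
    """Count <gloss> elements by the three classes described in this issue."""
    counts = Counter()
    for parent in root.iter():
        preceding = parent.text or ""  # text just before the first child
        for child in parent:
            if local(child.tag) == "gloss":
                if VERSION_PREFIX.search(preceding):
                    counts["version"] += 1
                elif local(parent.tag) == "seg" and parent.get("type") == "derivation":
                    counts["derivation"] += 1
                elif local(parent.tag) == "sense":
                    counts["sense"] += 1
                else:
                    counts["other"] += 1
            preceding = child.tail or ""  # text just before the next child
    return counts


# Hypothetical sample assembled from the three examples above.
SAMPLE = """<entry>
<seg type="derivation">(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>, <gloss>to happen</gloss>), </seg>
<sense><gloss>chance, coincidence</gloss>: (RV,<gloss>exact wrongfully</gloss>;)</sense>
</entry>"""

if __name__ == "__main__":
    print(classify_glosses(ET.fromstring(SAMPLE)))
```

Untagged italics such as the `(AV, comforter;` example cannot be found this way, since there is no element to match; those would need a comparison against the printed text.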

dowens76 commented 7 years ago

Are you talking about using <gloss> vs. plain italics? I think the code in your post did not come through.

The rule of thumb I use is that in any given entry, only glosses for that entry should be tagged as <gloss>. If it relates to etymology (as in #1 above), it should probably be in parentheses. The idea is that if someone wishes to extract all the glosses from a given entry, you don't want false positives.
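
To illustrate the extraction use case, here is a rough sketch in Python (standard library only) of pulling the glosses from one entry. The name `headword_glosses` and the choice to take only `<gloss>` elements that are direct children of `<sense>` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a namespace prefix in {uri}name form from a tag."""
    return tag.rsplit("}", 1)[-1]


def headword_glosses(entry):
    """Return the text of every <gloss> that is a direct child of a <sense>."""
    glosses = []
    for element in entry.iter():
        if local(element.tag) != "sense":
            continue
        for child in element:
            if local(child.tag) == "gloss":
                glosses.append("".join(child.itertext()).strip())
    return glosses


# Hypothetical, shortened version of the entry quoted in this issue.
SAMPLE = (
    '<entry><seg type="derivation">(&lt; <foreign xml:lang="grc">συγκυρέω</foreign>, '
    "<gloss>to happen</gloss>), </seg>"
    '<sense><gloss>chance, coincidence</gloss>: <foreign xml:lang="grc">κατὰ σ.</foreign></sense>'
    "</entry>"
)

if __name__ == "__main__":
    print(headword_glosses(ET.fromstring(SAMPLE)))
```

With the sample, the derivation gloss "to happen" is skipped because its parent is `<seg>`. A gloss tagged inside a `<sense>` for another reason would still be returned, which is the false positive the rule of thumb is meant to prevent.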

destatez commented 7 years ago

What I think I hear you saying is that the first instance above is really a gloss for the derived-from word, not the entry's word, so it should be merely italicized. Below is a rendering of that entry in which I have shown the first instance in italics only and the "real" gloss in bold italics, using Markdown syntax to show that within this issue. Looking ahead to cleanup efforts in this regard: once the first pass of manual edits is complete, can we be assured that every italicized word before the first <sense..> XML tag-pair should be merely italicized and not marked as a gloss?

What are your thoughts about the RV and AV entries? Do you consider those real glosses, or should they be only italicized?

* συγκυρία, -ας, ἡ (< συγκυρέω, *to happen*), [in Sm.: I Ki 6:9 (מִקְרֶה) ;] (more freq. in late writers, συγκύρησις, -ημα), ***chance, coincidence***: κατὰ σ. (v. MM, xxiii), Lk 10:31 (Hippocr., Eccl.).†

cbearden commented 7 years ago

The GitHub issues system permitted me to edit Dave's initial posting, so I added the backticks to make the <gloss> and <sense> tags show up (so Daniel, you aren't hallucinating).

I hope we can do justice both to the semantic structure of the original (e.g. differentiating between the definition of the headword and other italicized text, including Dave's #1 and #3 examples), and to its typeset rendering (ensuring that all italicized strings in the original can be rendered in italics by users of our markup).

I'm inclined to agree that we should treat the headword definitions differently than other glosses & definitions. If we're going to use the <gloss> element to mark up the headword definition, let's use it only for that, and find another way to mark up other meanings or definitions.

In case it's helpful, I have a list of parent elements of <gloss> and the count of occurrences of <gloss> with those parent elements:

| parent element | count |
|----------------|-------|
| sense          | 8339  |
| seg            | 132   |
| re             | 35    |
| form           | 27    |
| gloss          | 4     |
| etym           | 3     |
| gramGrp        | 1     |

Of the <seg> parents, 126 have the type derivation and 6 the type septuagint. I can give you a spreadsheet, or a report in another format, listing the <entry> elements and page numbers of all the occurrences of <gloss> that aren't children of <sense>, if that would be helpful.
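
For anyone who wants to reproduce a tally like this, the following is a minimal sketch in Python with the standard library. It is not the query that produced the numbers above; the function name `gloss_parent_counts` is hypothetical, and the sample document is made up for the demonstration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip a namespace prefix in {uri}name form from a tag."""
    return tag.rsplit("}", 1)[-1]


def gloss_parent_counts(root):
    """Tally <gloss> elements by parent element name, and <seg> parents by type."""
    parents = Counter()
    seg_types = Counter()
    for parent in root.iter():
        for child in parent:
            if local(child.tag) != "gloss":
                continue
            parents[local(parent.tag)] += 1
            if local(parent.tag) == "seg":
                seg_types[parent.get("type", "(none)")] += 1
    return parents, seg_types


# Made-up sample: one gloss under <seg type="derivation">, one under <sense>.
SAMPLE = (
    '<entry><seg type="derivation">(<gloss>to happen</gloss>), </seg>'
    "<sense><gloss>chance, coincidence</gloss></sense></entry>"
)

if __name__ == "__main__":
    parents, seg_types = gloss_parent_counts(ET.fromstring(SAMPLE))
    for name, count in parents.most_common():
        print(f"{name}\t{count}")
    print(dict(seg_types))
```

ElementTree has no parent pointer on elements, so the sketch walks every element and inspects its direct children instead of asking each `<gloss>` for its parent.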

dowens76 commented 7 years ago

@destatez I think RV and AV entries should be tagged as italics, not <gloss>. @cbearden Your list is helpful. It seems that we should go through the <gloss> elements under anything but <sense> to determine the suitability of that markup. Probably anything under seg, re, form, etym, or gramGrp is not right. But the numbers are small, so it would be worth checking those manually.

cbearden commented 7 years ago

I created a gist that lists most of the <gloss> elements enclosed by parentheses, together with the page number and the entry headword in a comment immediately preceding. The results are in a series of <result> elements in a nonce namespace, to differentiate them from the TEI elements, which lack a prefix. Is this report helpful? Is there a way to make it more useful?

The limitation of the output is that it won't retrieve elements that are preceded by both an open and close parenthesis, or followed by both an open and close parenthesis. There's probably a way to modify the XQuery to handle that.
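
One possible way around that limitation, sketched in Python instead of XQuery, is to keep a running count of open parentheses while walking a parent's text and children. This is a different technique from the gist's query, and the names `paren_delta` and `glosses_in_parentheses` are hypothetical. It assumes the enclosing parentheses sit in the same parent element as the `<gloss>`.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a namespace prefix in {uri}name form from a tag."""
    return tag.rsplit("}", 1)[-1]


def paren_delta(text):
    """Net change in open-parenthesis depth contributed by a text fragment."""
    text = text or ""
    return text.count("(") - text.count(")")


def glosses_in_parentheses(root):
    """Return the text of each <gloss> that sits inside an open parenthesis."""
    found = []
    for parent in root.iter():
        depth = paren_delta(parent.text)
        for child in parent:
            if local(child.tag) == "gloss" and depth > 0:
                found.append("".join(child.itertext()).strip())
            # Account for parentheses inside the child and in the text after it.
            depth += paren_delta("".join(child.itertext()))
            depth += paren_delta(child.tail)
    return found


# Made-up sample: the second gloss follows a closed "(v. MM)" and an open "(RV,".
SAMPLE = (
    "<sense><gloss>chance</gloss>: (v. MM) "
    "(RV,<gloss>exact wrongfully</gloss>; AV, comforter)</sense>"
)

if __name__ == "__main__":
    print(glosses_in_parentheses(ET.fromstring(SAMPLE)))
```

Because the depth is counted and not pattern-matched, a gloss preceded by both a closed pair and an unclosed open parenthesis is still reported. A parenthesis opened in an ancestor element is not seen, so this sketch has its own blind spot.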

UPDATE: I added two more files to the gist, one that shows elements that aren't themselves <sense> but that contain one <gloss>, and another that shows elements that contain more than one <gloss>.

I can also create tabular/CSV output.
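
As a rough idea of what the CSV form could look like, here is a small Python sketch using the standard `csv` module. The column names, the function names `non_sense_gloss_rows` and `to_csv`, and the choice to count only direct `<gloss>` children are all assumptions for illustration; page numbers and headwords are left out because I am not assuming how they are encoded.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a namespace prefix in {uri}name form from a tag."""
    return tag.rsplit("}", 1)[-1]


def non_sense_gloss_rows(root):
    """List elements other than <sense> that have <gloss> children."""
    rows = []
    for parent in root.iter():
        if local(parent.tag) == "sense":
            continue
        glosses = [
            "".join(child.itertext()).strip()
            for child in parent
            if local(child.tag) == "gloss"
        ]
        if glosses:
            rows.append((local(parent.tag), len(glosses), " | ".join(glosses)))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["parent", "gloss_count", "glosses"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample: only the <seg> row should be reported.
SAMPLE = (
    '<entry><seg type="derivation">(<gloss>to happen</gloss>), </seg>'
    "<sense><gloss>chance, coincidence</gloss></sense></entry>"
)

if __name__ == "__main__":
    print(to_csv(non_sense_gloss_rows(ET.fromstring(SAMPLE))))
```

Filtering the rows on `gloss_count` would split the report into the "one gloss" and "more than one gloss" groups.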

destatez commented 7 years ago

I guess I let this fall through the cracks, even though I opened the issue, since I did not look at the gist. Is this something I should pursue before the "baseline"?

destatez commented 7 years ago

These lists are enormous and will take some time to review. For the first two gists, each instance will have to be evaluated on its own (though categories may be established to determine correctness). The third gist is, in part, my doing: I chose, for the most part, to separate a multiple-entry gloss into its individual pieces. This will work fine as we move to Stage 2, where each gloss will be a separate item in the new files. I say we close this for Stage 1 and let the editors in Stage 2 add or remove glosses as they review the A-S data against other sources.

destatez commented 7 years ago

Let's let the Stage 2 editors clean up the glosses.

translatable-exegetical-tools / Abbott-Smith

<gloss> versus <emph> #68