scottkleinman / aeme

AEME Development Repo

Named entity markup, 1 name multiple entities #50

Closed by scottkleinman 8 years ago

scottkleinman commented 9 years ago

Consider the following (slightly edited) lines:

    <l>[Seinte Anne] hadde euerech [housebonde] aftur oþur ; for heo was iwedded þrie</l>
    <l>Bi euerech of heom ane douȝter heo hadde ; and euerech hieȝte <name type="person" 
         ref="???">Marie</name></l>

How would we handle the @ref, given that the name here refers to three separate entities?

Side question: What was Anne thinking?
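One way to handle the @ref in the example above, offered as a sketch rather than the project's settled practice: in TEI P5, @ref (from att.canonical) accepts one or more whitespace-separated pointers, so a single <name> can point at all three daughters at once. The ids #bioMarie0001 through #bioMarie0003 are hypothetical, and each would still need its own declaration in the file.

    <!-- hypothetical ids, one per daughter; each must be declared elsewhere in the file -->
    <l>Bi euerech of heom ane douȝter heo hadde ; and euerech hieȝte <name type="person"
         ref="#bioMarie0001 #bioMarie0002 #bioMarie0003">Marie</name></l>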

dorothyk98 commented 9 years ago

Bwahahaha.

mmwwah commented 8 years ago

I'm reanimating this issue because I have a related one, and because I have a suggestion about the above.

Suggestion: How about declaring something like #bioMaryDaughterofAnne? The text doesn't name the three fathers, does it? If it doesn't, we have no way of distinguishing among the three daughters, and any later references to them in the text (if there are any) may repeat that ambiguity. Another possibility would be #bioMaryDaughterofAnne01, ...02, and ...03. [I'm not addressing the probable need for nesting bio-entities.]
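The first suggestion could be sketched with TEI's <personGrp>, which describes a group of people as a single entity, so one id covers all three daughters when the text gives no way to tell them apart. Choosing <personGrp> over <person> and placing it in a <listPerson> are assumptions here, not something the thread settled on; the @size value simply records that the group has three members.

    <listPerson>
      <!-- one group entity standing for the three daughters named Mary -->
      <personGrp xml:id="bioMaryDaughterofAnne" size="3">
        <persName>Marie</persName>
      </personGrp>
    </listPerson>

    <!-- in the text, the pointer is the id with "#" prepended -->
    <name type="person" ref="#bioMaryDaughterofAnne">Marie</name>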

Related issue: The Laud 108 named-entities sample that I looked at (I've forgotten where) declares #bioAugustine0001 as referring to EITHER Augustine of Hippo or Augustine of Canterbury. Surely not? I need Augie myself for Junius 1, so I have declared Aug0001 to refer to Hippo only. If I need Canterbury (and I don't think I will), I plan to call him 0002. Is that okay with everyone?

mmwwah commented 8 years ago

I found my notes: the blended Augustine declaration was in flyleaf-folio-iii-recto.xml.

#bioAugustine001: Saint Augustine of Hippo or Canterbury (Auestac is read as Austyn)

mmwwah commented 8 years ago

And to follow up: The Guidelines say that bio declarations use the first two letters of the surname (the example is #bioGO001). Does that apply only to us, the editors, while the subjects of our study (Aug, Orm, St. Anne, etc.) get a full name?

scottkleinman commented 8 years ago

At the time the Guidelines were written, I was thinking primarily of editors' names when I specified the first two letters (just following a practice started by Sharon). I relaxed this for names within the texts but kept the "001" suffix, for no particularly good reason. For places, I reduced it to "01". Generally, I have made the rest of the id the full name (e.g. Augustine0001), sometimes with extra material, as in AugustineOfCanterbury0001, if I felt it was necessary. In the case of that particular name, though, I just used Augustine0001.

The @xml:id has to be unique within the file, not across the whole corpus. Technically, each id should be defined in the <teiHeader> so that you know who it refers to, but I have just done that in comments at the top of the file (see Edmund as an example).

When all the files are combined, these ids can be consolidated and adjusted so that Augustine0001 always refers to Augustine of Canterbury and Augustine0002 to Augustine of Hippo (or vice versa). We'll then have to do a second consolidation of Laud Misc. 108 with Junius 1 to begin building a system of ids for the corpus as a whole. Future manuscript editions can then use the ids assigned for the corpus, which will simplify the process.
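A minimal sketch of defining the ids in the <teiHeader> instead of in comments, assuming a <listPerson> inside <sourceDesc> (one of several places TEI permits it; the project may prefer another). The numbering follows mmwwah's assignment of 0001 to Hippo and would change if the corpus-wide consolidation swaps the two; the name form "Austyn" is only illustrative. Note that the @xml:id value is written without "#", while a pointer in @ref adds it.

    <teiHeader>
      <fileDesc>
        <!-- titleStmt and publicationStmt omitted for brevity -->
        <sourceDesc>
          <listPerson>
            <person xml:id="bioAugustine0001">
              <persName>Augustine of Hippo</persName>
            </person>
            <person xml:id="bioAugustine0002">
              <persName>Augustine of Canterbury</persName>
            </person>
          </listPerson>
        </sourceDesc>
      </fileDesc>
    </teiHeader>

    <!-- elsewhere in the text: the pointer prepends "#" to the declared id -->
    <name type="person" ref="#bioAugustine0001">Austyn</name>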

I believe the text does tell us who the fathers of the different Marys are, and I think I used that information in the end.

mmwwah commented 8 years ago

Okay, that's cool. Closing.