jjmccollum / teiphy

A Python package for converting TEI XML collations to NEXUS, BEAST 2.7 XML, and other formats
MIT License

Anticipate encoding of ambiguous readings using `witDetail` #1

Closed jjmccollum closed 2 years ago

jjmccollum commented 2 years ago

Where one or more witnesses have a gap or a nonsense reading that could be disambiguated as more than one substantive reading, this situation should be encoded in a TEI-friendly way. The TEI Guidelines (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/TC.html#TCAPWD) describe a witDetail element parallel to lem and rdg elements that would be suitable for this purpose: the element includes a wit attribute (for one or more witnesses described by its detail) and a target attribute (which can point to one or more readings that might disambiguate it). For example:

<app xml:id="B10K1V3U22-26">
    <lem><w>ο</w><w>ευλογησας</w><w>ημας</w></lem>
    <rdg xml:id="B10K1V3U22-26R1" wit="P46 01C1 02 03 06 010 012 018 020 025 044 056 075S 0142 0150 0151 0278 0319 1 6 18 33 35 38 61 69 81 88 93 102 104 177 181 203 218 263 296 322 326 330 337 363 365 383 398 424 436 442 451 459 462 467 506 606 629 636 665 915 1069 1108 1115 1127 1175 1240 1241 1245 1311 1319 1398 1505 1509 1573 1611 1617 1718 1729 1739 1751 1836 1837 1838 1851 1860 1877 1881 1886 1893 1908 1910 1912 1918 1939 1959C 1962 1963 1985 1987 1991 1996 1999 2004 2005 2008 2011 2012 2127 2138 2180 2243 2344 2352 2400 2464 2492 2495 2516 2523 2544 2576 2805 2865S L156 L169 L587 L809 L1159 L1178 L1188 L2058 syrh AthanasiusOfAlexandria CyrilOfJerusalem RP SBL TH TR Tisch Treg WH"><w>ο</w><w>ευλογησας</w><w>ημας</w></rdg>
    <rdg xml:id="B10K1V3U22-26R1-v1" type="reconstructed" wit="94"><w>ο</w><w>ευ<unclear>λ</unclear>ογησας</w><w>ημας</w></rdg>
    <rdg xml:id="B10K1V3U22-26R1-f1" type="defective" wit=""><w>ο</w><w>ευλογησης</w><w>ημας</w></rdg>
    <rdg xml:id="B10K1V3U22-26R1-f1-v1" type="reconstructed" wit="1959*"><w>ο</w><w>ευλογησ<unclear>η</unclear>ς</w><w>ημας</w></rdg>
    <rdg xml:id="B10K1V3U22-26R2" wit="664 1490 1678 1831 1840 L1440 L2010"><w>ο</w><w>ευλογησας</w><w>υμας</w></rdg>
    <rdg xml:id="B10K1V3U22-26R2-f1" type="defective" wit="1721"><w>ο</w><w>ευλογησας</w><w>υμιας</w></rdg>
    <rdg xml:id="B10K1V3U22-26R3" wit="01*"><w>ο</w><w>ευλογησας</w></rdg>
    <witDetail type="ambiguous" target="#B10K1V3U22-26R1 #B10K1V3U22-26R2" wit="256"><w>ο</w><w>ευλογη<gap unit="word" extent="part" reason="lacuna"/></w><w><gap unit="word" extent="part" reason="lacuna"/>μας</w></witDetail>
</app>

Strictly speaking, the pointers in the wit and target attributes should refer to unique elements; within the XML collation document, that means xml:id values prefixed by the # character. In practice, we may treat any pointer that does not start with the # prefix as an n value (of a witness, or of a reading within the same app element). (This is especially convenient for New Testament textual critics, who use Gregory-Aland numbers to refer to manuscripts; the XML specification prohibits xml:id values that begin with a digit.) So the following should also be supported (even if it is not strictly valid TEI):

<app xml:id="B10K1V3U22-26">
    <lem><w>ο</w><w>ευλογησας</w><w>ημας</w></lem>
    <rdg n="1" wit="P46 01C1 02 03 06 010 012 018 020 025 044 056 075S 0142 0150 0151 0278 0319 1 6 18 33 35 38 61 69 81 88 93 102 104 177 181 203 218 263 296 322 326 330 337 363 365 383 398 424 436 442 451 459 462 467 506 606 629 636 665 915 1069 1108 1115 1127 1175 1240 1241 1245 1311 1319 1398 1505 1509 1573 1611 1617 1718 1729 1739 1751 1836 1837 1838 1851 1860 1877 1881 1886 1893 1908 1910 1912 1918 1939 1959C 1962 1963 1985 1987 1991 1996 1999 2004 2005 2008 2011 2012 2127 2138 2180 2243 2344 2352 2400 2464 2492 2495 2516 2523 2544 2576 2805 2865S L156 L169 L587 L809 L1159 L1178 L1188 L2058 syrh AthanasiusOfAlexandria CyrilOfJerusalem RP SBL TH TR Tisch Treg WH"><w>ο</w><w>ευλογησας</w><w>ημας</w></rdg>
    <rdg n="1-v1" type="reconstructed" wit="94"><w>ο</w><w>ευ<unclear>λ</unclear>ογησας</w><w>ημας</w></rdg>
    <rdg n="1-f1" type="defective" wit=""><w>ο</w><w>ευλογησης</w><w>ημας</w></rdg>
    <rdg n="1-f1-v1" type="reconstructed" wit="1959*"><w>ο</w><w>ευλογησ<unclear>η</unclear>ς</w><w>ημας</w></rdg>
    <rdg n="2" wit="664 1490 1678 1831 1840 L1440 L2010"><w>ο</w><w>ευλογησας</w><w>υμας</w></rdg>
    <rdg n="2-f1" type="defective" wit="1721"><w>ο</w><w>ευλογησας</w><w>υμιας</w></rdg>
    <rdg n="3" wit="01*"><w>ο</w><w>ευλογησας</w></rdg>
    <witDetail type="ambiguous" target="1 2" wit="256"><w>ο</w><w>ευλογη<gap unit="word" extent="part" reason="lacuna"/></w><w><gap unit="word" extent="part" reason="lacuna"/>μας</w></witDetail>
</app>
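
To make the pointer convention above concrete, here is a minimal sketch of how a target attribute could be resolved under both forms. It is not the converter's actual code: the function name resolve_targets and the toy witness sigla are hypothetical, and the elements are assumed to carry no TEI namespace, as in the snippets above.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree assigns to the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_targets(app, wit_detail):
    """Return the lem/rdg children of `app` that `wit_detail` points to.

    Hypothetical helper: a pointer starting with "#" is matched against
    xml:id; any other pointer is matched against the n attribute.
    """
    resolved = []
    for pointer in wit_detail.get("target", "").split():
        for reading in app:
            if reading.tag not in ("lem", "rdg"):
                continue
            if pointer.startswith("#"):
                # Strict TEI pointer: compare with xml:id, minus the "#".
                matched = reading.get(XML_ID) == pointer[1:]
            else:
                # Relaxed pointer: compare with the n attribute.
                matched = reading.get("n") == pointer
            if matched:
                resolved.append(reading)
                break
    return resolved


# Toy app elements (assumed data, much smaller than the examples above).
strict = ET.fromstring(
    '<app>'
    '<rdg xml:id="R1" wit="A"/>'
    '<rdg xml:id="R2" wit="B"/>'
    '<witDetail type="ambiguous" target="#R1 #R2" wit="C"/>'
    '</app>'
)
relaxed = ET.fromstring(
    '<app>'
    '<rdg n="1" wit="A"/>'
    '<rdg n="2" wit="B"/>'
    '<witDetail type="ambiguous" target="1 2" wit="C"/>'
    '</app>'
)

for app in (strict, relaxed):
    targets = resolve_targets(app, app.find("witDetail"))
    print([rdg.get("wit") for rdg in targets])
```

Both loops print the same pair of readings, so downstream code would not need to care which pointer style the collation uses. A real TEI file would normally declare the TEI namespace, in which case the tag comparisons would need the namespaced names.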

jjmccollum commented 2 years ago

In addition, per the TEI Guidelines: "without a target attribute, [witDetail] refers to the closest preceding lem or rdg." So if a witDetail element has no target attribute, its target should default to the closest preceding lem or rdg element within the same app.
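
The default rule could be sketched as follows. This is only an illustration under the same assumptions as the examples in this thread (no TEI namespace on the elements); the function name default_target and the toy data are hypothetical, not taken from the converter.

```python
import xml.etree.ElementTree as ET


def default_target(app, wit_detail):
    """Return the closest lem or rdg preceding `wit_detail` in `app`.

    Hypothetical helper: returns None if no lem or rdg precedes the
    witDetail, or if the witDetail is not a child of `app`.
    """
    closest = None
    for child in app:
        if child is wit_detail:
            return closest
        if child.tag in ("lem", "rdg"):
            # Remember the most recent reading seen so far.
            closest = child
    return None


# Toy app element (assumed data): the witDetail has no target attribute.
app = ET.fromstring(
    '<app>'
    '<lem n="0"/>'
    '<rdg n="1" wit="A"/>'
    '<rdg n="2" wit="B"/>'
    '<witDetail wit="C"/>'
    '<rdg n="3" wit="D"/>'
    '</app>'
)

wit_detail = app.find("witDetail")
if wit_detail.get("target") is None:
    # Falls back to the reading immediately before the witDetail (n="2").
    print(default_target(app, wit_detail).get("n"))
```

Walking the children in document order keeps the fallback independent of whether readings are identified by xml:id or by n.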

jjmccollum commented 2 years ago

All right, I have this feature implemented in tei_collation_converter.py, and it is working as expected with witDetail elements in the example XML file. I will now close this issue.