jjmccollum / teiphy

A Python package for converting TEI XML collations to NEXUS, BEAST 2.7 XML, and other formats
MIT License

Support filling corrector text based on previous witness's text #2

Closed: jjmccollum closed this issue 2 years ago

jjmccollum commented 2 years ago

Depending on the application, we may want to treat correctors (e.g., GA 424C) as fragmentary witnesses whose text is defined only where it appears and is "lacunose" otherwise, or we may want to treat them as fuller witnesses that assume their base witnesses' texts (or, for later correctors, the previous correctors' texts) where they do not explicitly have their own readings.

Ideally, the user would choose between these two behaviors with a command-line argument. On the TEI XML end, the rule could be encoded with additional witness entries in the listWit. These entries would carry a special type (e.g., "corrector") and be placed immediately after their respective parents, as follows:

<witness n="424"/>
<witness type="corrector" n="424C1"/>
<witness type="corrector" n="424C2"/>

Then, when we generate the collation matrix, we could optionally match the corrector witness's reading to the previous witness's reading whenever the corrector witness has no reading(s) of its own.
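As a rough illustration of that rule, here is a minimal Python sketch. It is not teiphy's actual fill_corrector_lacunae implementation. It assumes a standalone listWit in the TEI namespace, treats the entry listed immediately before each corrector as that corrector's "previous witness", and uses a simplified matrix in which each cell holds one reading ID or None (no reading). The names previous_witness_map, fill_from_previous_witness, LIST_WIT, and MATRIX are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Assumption: the listWit uses the standard TEI namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

LIST_WIT = """<listWit xmlns="http://www.tei-c.org/ns/1.0">
  <witness n="424"/>
  <witness type="corrector" n="424C1"/>
  <witness type="corrector" n="424C2"/>
</listWit>"""


def previous_witness_map(list_wit_xml: str) -> dict[str, str]:
    """Map each corrector siglum to the witness listed immediately before it.

    The dict preserves document order, so earlier correctors come first.
    """
    root = ET.fromstring(list_wit_xml)
    previous: dict[str, str] = {}
    last_siglum = None
    for wit in root.iter(TEI_NS + "witness"):
        siglum = wit.get("n")
        if wit.get("type") == "corrector" and last_siglum is not None:
            previous[siglum] = last_siglum
        last_siglum = siglum
    return previous


def fill_from_previous_witness(matrix, previous):
    """Return a copy of matrix where each corrector's empty cells (None)
    take the previous witness's reading in the same variation unit.

    Correctors are processed in document order, so 424C2 inherits
    the already-filled text of 424C1 rather than its raw text.
    """
    filled = {siglum: list(row) for siglum, row in matrix.items()}
    for corrector, base in previous.items():
        if corrector not in filled or base not in filled:
            continue  # skip witnesses absent from this collation
        filled[corrector] = [
            own if own is not None else inherited
            for own, inherited in zip(filled[corrector], filled[base])
        ]
    return filled


# Hypothetical readings for four variation units.
MATRIX = {
    "424": ["a", "b", "a", "c"],
    "424C1": [None, "c", None, None],
    "424C2": [None, None, "b", None],
}

if __name__ == "__main__":
    prev = previous_witness_map(LIST_WIT)
    for siglum, row in fill_from_previous_witness(MATRIX, prev).items():
        print(siglum, row)
```

Processing correctors in listed order is what lets a later corrector pick up an earlier corrector's changes. If the base witness is itself lacunose in a unit, the corrector's cell stays None, so the "fragmentary" behavior is preserved there.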

jjmccollum commented 2 years ago

All right, this feature is now covered by the fill_corrector_lacunae member of the collation class and by the --fill-correctors flag in the convert_tei.py executable script. It appears to be working on the example collation so far, so I'll consider this issue closed.