jjmccollum / open-cbgm

Fast, compact, open-source, TEI-compliant C++ implementation of the Coherence-Based Genealogical Method
MIT License

Placement of <fs/> and <graph/> in input not valid TEI #1

Closed by jjmccollum 4 years ago

jjmccollum commented 4 years ago

Presently, populate_db.cpp and the variation_unit class assume that the <graph/> element containing the local stemma for a variation unit is a child of the <app/> element representing that variation unit. According to the TEI guidelines for the <app/> element (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-app.html), an <app/> may contain only a very limited set of child elements (e.g., notes, lemma or variant readings, or witnesses), so this hierarchy is invalid. The same is true of the <fs/> subelement containing a variation unit's connectivity feature.

This can be resolved easily by requiring that the <graph/> and <fs/> elements be moved to a position parallel to the corresponding <app/> elements and that they have id or n attributes matching those of the corresponding <app/>. As long as these elements can be matched this way, where the <graph/> and <fs/> elements are placed in the XML tree is immaterial; they could appear directly after the <app/>, if this is convenient, or they could be relegated to a position past the end of the text.
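
The matching described above can be sketched in a few lines. This is only an illustration in Python using the standard library's xml.etree.ElementTree, not the library's C++ code; the sample XML, the function names key_of and match_units, and the "connectivity" feature layout are all assumptions made for the example, and a real TEI file would also carry a namespace that is omitted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <fs/> and <graph/> sit beside each <app/> and are
# linked to it by a shared n value. Namespaces are omitted for brevity.
SAMPLE = """
<ab>
  <app n="u1"><rdg n="a" wit="A B"/><rdg n="b" wit="C"/></app>
  <fs n="u1"><f name="connectivity"><numeric value="5"/></f></fs>
  <graph n="u1" type="directed"><node n="a"/><node n="b"/><arc from="a" to="b"/></graph>
  <app n="u2"><rdg n="a" wit="A"/><rdg n="b" wit="B C"/></app>
  <graph n="u2" type="directed"><node n="a"/><node n="b"/><arc from="b" to="a"/></graph>
</ab>
"""


def key_of(elem):
    """Return the n attribute if present, otherwise the id attribute."""
    return elem.get("n") or elem.get("id")


def match_units(root):
    """Pair every <app/> with the <graph/> and <fs/> that share its key.

    The whole tree is searched, so the <graph/> and <fs/> elements may sit
    right after the <app/> or anywhere else in the document.
    """
    graphs = {key_of(g): g for g in root.iter("graph")}
    feature_sets = {key_of(f): f for f in root.iter("fs")}
    units = {}
    for app in root.iter("app"):
        key = key_of(app)
        units[key] = {
            "app": app,
            "graph": graphs.get(key),      # None if no local stemma was found
            "fs": feature_sets.get(key),   # None if no feature set was found
        }
    return units


units = match_units(ET.fromstring(SAMPLE))
```

Because the lookup is by key rather than by position, a missing <graph/> or <fs/> shows up as None and can be reported per variation unit, which is where this layout is more error-prone than nesting.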

This change in implementation would require code changes involving the XPath searches used in populate_db.cpp, modifications to the variation_unit and local_stemma constructors, and updates to the example XML files to reflect the new expected structure. Since this will make the library incompatible with earlier inputs, it should be handled on a dev branch and incorporated into the next major release; the code with the present input structure, while not strictly TEI-compliant, functions as expected.

jjmccollum commented 4 years ago

Alternatively, much of the current structure can be maintained in a TEI-compliant manner if the <fs/> and <graph/> elements are placed under a <note/> element under the <app/>. This may be the simplest and least error-prone approach moving forward.
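
As a rough sketch of why this keeps the structure simple: with the <fs/> and <graph/> wrapped in a <note/>, each variation unit can still be read from its own <app/> subtree, and only the search paths gain one extra step. The example below is Python with the standard library's xml.etree.ElementTree rather than the project's C++ code; the sample XML, the function name read_unit, and the "connectivity" feature layout are assumptions for illustration, and TEI namespaces are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical variation unit: <fs/> and <graph/> live under a <note/>
# child of the <app/>, so no cross-referencing by id or n is needed.
SAMPLE = """
<app n="u1">
  <rdg n="a" wit="A B"/>
  <rdg n="b" wit="C"/>
  <note>
    <fs><f name="connectivity"><numeric value="5"/></f></fs>
    <graph type="directed"><node n="a"/><node n="b"/><arc from="a" to="b"/></graph>
  </note>
</app>
"""


def read_unit(app):
    """Collect readings, connectivity, and local stemma edges from one <app/>."""
    connectivity = None
    numeric = app.find("note/fs/f[@name='connectivity']/numeric")
    if numeric is not None:
        connectivity = int(numeric.get("value"))

    edges = []
    graph = app.find("note/graph")
    if graph is not None:
        edges = [(arc.get("from"), arc.get("to")) for arc in graph.findall("arc")]

    return {
        "id": app.get("n"),
        "readings": [rdg.get("n") for rdg in app.findall("rdg")],
        "connectivity": connectivity,
        "edges": edges,
    }


unit = read_unit(ET.fromstring(SAMPLE))
```

Compared with the sibling layout, the only change to the lookups is the note/ prefix in the paths, which is consistent with the comment's point that much of the current structure can be maintained.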

jjmccollum commented 4 years ago

This change has been made on the dev branch; it will be merged into master at the next major release.