jjmccollum / open-cbgm

Fast, compact, open-source, TEI-compliant C++ implementation of the Coherence-Based Genealogical Method
MIT License
29 stars 1 forks source link

Implement support for alternate rdg IDs and witDetail elements in variation_unit.cpp #8

Closed jjmccollum closed 2 years ago

jjmccollum commented 2 years ago

To ensure that TEI XML inputs to teiphy can also be used with open-cbgm, the parsing of rdg elements in variation_unit.cpp should be made more flexible and extended to witDetail elements, as well. This should require only a few minor changes to the loop through the readings in the constructor:

for (xml_node rdg : xml.children("rdg")) {
    //Get the reading's ID and its text:
    string rdg_id = rdg.attribute("n").value();
    string rdg_text = rdg.text() ? rdg.text().get() : "";

should be changed to

for (xml_node rdg : xml.children()) {
    //If this element is not a <rdg/> or <witDetail/> element, then skip it:
    if (rdg.name() != "rdg" && rdg.name() != "witDetail") {
        continue;
    }
    //Otherwise, get the reading's ID and its text:
    string rdg_id = rdg.attribute("xml:id") ? rdg.attribute("xml:id").value() : (rdg.attribute("id") ? rdg.attribute("id").value() : (rdg.attribute("n") ? rdg.attribute("n").value() : ""));
    string rdg_text = rdg.text() ? rdg.text().get() : "";

To ensure matches between the children of the app element and the nodes in the local stemma, both rdg and witDetail elements will still be expected to have IDs in the form of @xml:id, @id, or @n attributes.

jjmccollum commented 2 years ago

The changed proposed here is pretty much correct, except you apparently need to use strcmp with the output of rdg.name():

if (strcmp(rdg.name(), "rdg") != 0 && strcmp(rdg.name(), "witDetail") != 0) {
    continue;
}

This is now passing unit tests on dev, so I'll consider this issue resolved.