Closed jjmccollum closed 2 years ago
Here's the relevant portion of code in `textual_flow::coherence_in_attestations_to_dot`:
```cpp
//Then add a secondary set of vertices for the primary ancestors of these witnesses with a different reading,
//along with edges indicating changes in reading:
unordered_set<string> secondary_set = unordered_set<string>();
unordered_set<string> processed_destinations = unordered_set<string>();
for (textual_flow_edge e : edges) {
    //Get the endpoints' IDs:
    string ancestor_id = e.ancestor;
    string descendant_id = e.descendant;
    //If the descendant is not in the primary vertex set, or the ancestor is, then skip:
    if (primary_set.find(descendant_id) == primary_set.end() || primary_set.find(ancestor_id) != primary_set.end()) {
        continue;
    }
    //Otherwise, we have an ancestor with a reading other than the specified one;
    //skip it if we've added it already:
    if (secondary_set.find(ancestor_id) != secondary_set.end()) {
        continue;
    }
    //If the destination already has an edge drawn to it, then skip any secondary ancestors:
    if (processed_destinations.find(descendant_id) != processed_destinations.end()) {
        continue;
    }
    //Otherwise, add the destination node's ID to the processed set:
    processed_destinations.insert(descendant_id);
    //If it's new, then add a vertex for it:
    int ancestor_ind = id_to_index.at(ancestor_id);
    textual_flow_vertex v = indexed_vertices[ancestor_ind];
    string ancestor_rdg = v.rdg;
    out << "\t\t" << ancestor_ind;
    out << " [label=\"" << ancestor_id << " (" << ancestor_rdg << ")\", color=blue, style=dashed]";
    out << ";\n";
    secondary_set.insert(ancestor_id);
}
```
For witnesses like 104 and 459 (i.e., the highest-priority witnesses to the reading they have here), the constructor for the `textual_flow` class should add one textual flow ancestry edge for each closest potential ancestor with a distinct reading within the connectivity limit. So for both of them, the edge from 049 (reading a) should be added first, and 1827 (reading c) should be added second. In theory, then, the loop above should encounter the edge (049, 104) before it encounters (1827, 104), and the edge (049, 459) before it encounters (1827, 459). I'll need to add some checks to see if something else is in fact happening.
Okay, I think I've resolved the issue. I wasn't marking the descendant nodes as processed when their ancestors with a different reading had already been added to the secondary set. The following updated code ensures that superfluous ancestors are not added:
```cpp
//Initially filter out any vertices that do not have the given reading:
list<textual_flow_vertex> coherence_in_attestations_vertices = list<textual_flow_vertex>(vertices);
coherence_in_attestations_vertices.remove_if([&](const textual_flow_vertex & v) {
    return v.rdg != rdg;
});
//Add all of the graph edges ending at one of the remaining vertices, except for secondary graph edges for changes:
unordered_set<string> witnesses_with_rdg = unordered_set<string>();
for (textual_flow_vertex v : coherence_in_attestations_vertices) {
    witnesses_with_rdg.insert(v.id);
}
list<textual_flow_edge> edges_with_descendants_with_rdg = list<textual_flow_edge>(edges);
edges_with_descendants_with_rdg.remove_if([&](const textual_flow_edge & e) {
    return witnesses_with_rdg.find(e.descendant) == witnesses_with_rdg.end();
});
//Then add the secondary graph edges, taking only the first one for each descendant:
unordered_set<string> processed_descendants = unordered_set<string>();
list<textual_flow_edge> distinct_descendant_edges = list<textual_flow_edge>();
for (textual_flow_edge e : edges_with_descendants_with_rdg) {
    if (processed_descendants.find(e.descendant) != processed_descendants.end()) {
        continue;
    }
    processed_descendants.insert(e.descendant);
    distinct_descendant_edges.push_back(e);
}
list<textual_flow_edge> coherence_in_attestations_edges = list<textual_flow_edge>(distinct_descendant_edges);
//Add any ancestors on these edges that do not have the given reading:
unordered_set<string> ancestors_without_rdg = unordered_set<string>();
for (textual_flow_edge e : distinct_descendant_edges) {
    if (witnesses_with_rdg.find(e.ancestor) == witnesses_with_rdg.end()) {
        ancestors_without_rdg.insert(e.ancestor);
    }
}
for (textual_flow_vertex v : vertices) {
    if (ancestors_without_rdg.find(v.id) != ancestors_without_rdg.end()) {
        coherence_in_attestations_vertices.push_back(v);
    }
}
```
These changes are reflected in the latest push.
While preparing my 2022 Greek Paul Project Webinar talk on the CBGM, I noticed that some coherence in attestations diagrams feature superfluous textual flow ancestors outside of the attestation of the specified reading:
To reproduce this issue:

1. Copy the `3_john_collation_complete.xml` file from the `open-cbgm/examples` directory into your active directory.
2. `populate_db.exe -t 100 -z ambiguous -Z lac 3_john_collation_complete.xml cache.db`
3. `print_textual_flow.exe --attestations cache.db B25K1V5U4`
4. `dot -Tpdf -o attestations\B25K1V5U4Rb-coherence-attestations.pdf attestations\B25K1V5U4Rb-coherence-attestations.dot`
I suspect this has something to do with the way that the `textual_flow` class stores multiple textual flow ancestors with different readings for the same witness. This is intended only to be used in the coherence in variant readings diagram (in which all changes of reading within the connectivity limit are highlighted), but I must have failed to add a check to ensure that this doesn't happen in the `textual_flow::coherence_in_attestations_to_dot` method. Invoking … and … reveals that the superfluous ancestor 1827 is indeed within the connectivity limit for both 104 and 459.
Naturally, this problem only arises in variation units with more than one variant reading. This issue can also be observed in the following units: