yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License
122 stars 41 forks source link

matUtils extract results in Segmentation Fault #245

Closed kvargha closed 2 years ago

kvargha commented 2 years ago

Here are the files that are leading to a segmentation fault

Command that causes error: matUtils extract -i new_tree.pb -s samples.txt -Y 5 -j output.json

Causation Explanation From @AngieHinrichs : It's line 1532 of mutation_annotated_tree.cpp (in get_subtree): if (curr->clade_annotations[k] != "") { Every node's clade_annotations array is expected to have the same size as the root node's clade_annotations array, with all items initialized to "" if they don't have an actual annotation on that node. But sometimes some of them end up missing instead of "".

jmcbroome commented 2 years ago

The simple solution is just to iterate to the size of the annotations vector of the node being copied, instead of the number of annotations on the root. Since the new node having its information copied from the current node already is initialized with a vector of "" equal to the root length for its annotations, it will just not copy anything and leave it at the default ["",""] if the current node is somehow missing annotations, or is short an annotation.

I'll note that the implementation may break again if somehow a descendant node has MORE annotations than the root. This is because nodes are initialized with a fixed number of empty annotations, instead of an empty vector, and we never add to or subtract from the vector (really, the vector is not even really the right data structure, just an array would serve for having a static number of annotations assigned to every node). Trying to set the third element of the an annotation vector when the root only has two and two were assigned to the new node is problematic. However, this doesn't seem to be an issue that's come up, and the simple fix resolves the case where the node has less annotations than the root.

I've added this to my fork. I've attached the output JSON that results from my fixed implementation. It'll be included on my next PR.

output.json.gz

kvargha commented 2 years ago

Thank you for looking into this @jmcbroome. I really appreciate it!