Uniqueness of IDs not necessary

matsengrp / gctree

GCtree: phylogenetic inference of genotype-collapsed trees

GNU General Public License v3.0

When IDs are used as abundances, this was a genuine bug: two nodes with the same abundance would cause a crash. The same reasoning applies to ordinary IDs: since uniqueness is determined by the actual sequence (the dictionary key), sequence names do not need to be unique. This is useful if, for example, I want the labels in the tree to show the amino acid CDR3, which will be the same for many nodes.
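To illustrate the idea of keying on the sequence rather than on the name, here is a minimal sketch. It is not the gctree code: `collapse_by_sequence` and the toy records are hypothetical, and the only point is that repeated names are harmless when the sequence is the dictionary key.

```python
from collections import defaultdict


def collapse_by_sequence(records):
    """Group (name, sequence) pairs by sequence.

    Hypothetical helper for illustration only. The sequence is the
    dictionary key, so two records may share the same name.
    """
    genotypes = defaultdict(list)
    for name, seq in records:
        genotypes[seq].append(name)
    return dict(genotypes)


# Toy data: the label "CARDY" (e.g. an amino acid CDR3) appears twice,
# attached to two different nucleotide sequences.
records = [("CARDY", "ACGT"), ("CARDY", "ACGA"), ("CARDW", "ACGT")]
genotypes = collapse_by_sequence(records)
```

Here the two distinct sequences yield two genotypes, even though the label `CARDY` is attached to both of them.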

One additional improvement would be to parse a header like ">seqname abundance=5", which would give the best of both worlds: meaningful sequence IDs together with their abundances.
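A possible shape for that header parsing, as a rough sketch only: `parse_header`, the regular expression, and the default abundance of 1 are all assumptions of mine, not something this PR implements.

```python
import re

# Assumed header convention: ">seqname abundance=5"
_ABUNDANCE = re.compile(r"\babundance=(\d+)\b")


def parse_header(header, default_abundance=1):
    """Split a FASTA-style header into (name, abundance).

    Hypothetical helper. If no "abundance=N" field is present, the
    abundance falls back to default_abundance (an assumed default).
    """
    text = header.lstrip(">").strip()
    match = _ABUNDANCE.search(text)
    abundance = int(match.group(1)) if match else default_abundance
    # The name is taken to be the first whitespace-delimited token.
    name = text.split()[0] if text else ""
    return name, abundance
```

With this convention the name is free to repeat across records, while the abundance travels in its own field instead of being overloaded onto the ID.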

NB: this also includes a small fix in phylip_parse, where the script would always crash because the output file needs to be opened in binary mode.

Uniqueness of IDs not necessary #60