lmaurits / phyltr

Unix filters for manipulating and analysing (samples of) phylogenetic trees represented in the Newick format
GNU General Public License v3.0
3 stars 4 forks source link

phyltr cat does not read NEXUS translate block correctly #20

Closed xrotwang closed 5 years ago

xrotwang commented 5 years ago

Just noticed this:

$ phyltr cat dplace-data/phylogenies/hruschka_et_al2015/posterior.trees | head -n1
(((SJG:1459.51,((ALT:1117.7,(HAK:645.174,SHR:645.174)1:472.524)1:124.182,((TOF:209.877,TUV:209.877)1:571.449,(DOLG:218.945,JAK:218.945)1:562.381)1:460.554)1:217.634)1:401.68,(((((SAL:402.523,TRM:402.523)1:723.17,(KHAL:954.16,(AZB:776.895,(GAGX:361.098,TRK:361.098)1:415.798)1:177.265)1:171.533)1:259.542,KRMX:1385.24)1:190.817,(UZB:716.856,UIG:716.856)1:859.196)1:69.633,(((KRG:628.961,(NOGX:438.589,(KAZ:159.022,KLPX:159.022)1:279.567)1:190.372)1:186.088,(BAS:378.152,TAT:378.152)1:436.897)1:186.247,(QUM_:453.077,BLKX:453.077)1:548.219)1:644.39)1:215.508)1:533.183,CHV:2394.38)1:0;

Note that 25 is translated to QUM_, not QUM. This seems to happen, because QUM is the last element in the translate list and terminated with ; not ,. Inserting an additional element fixes the problem:

$ git diff -U0 posterior.trees
diff --git a/phylogenies/hruschka_et_al2015/posterior.trees b/phylogenies/hruschka_et_al2015/posterior.trees
index 1eb473e..4eca81f 100644
--- a/phylogenies/hruschka_et_al2015/posterior.trees
+++ b/phylogenies/hruschka_et_al2015/posterior.trees
@@ -29 +29,2 @@ begin trees;
-               25 QUM;
+               25 QUM,
+               26 ZZZ;

results in

(((SJG:1459.51,((ALT:1117.7,(HAK:645.174,SHR:645.174)1:472.524)1:124.182,((TOF:209.877,TUV:209.877)1:571.449,(DOLG:218.945,JAK:218.945)1:562.381)1:460.554)1:217.634)1:401.68,(((((SAL:402.523,TRM:402.523)1:723.17,(KHAL:954.16,(AZB:776.895,(GAGX:361.098,TRK:361.098)1:415.798)1:177.265)1:171.533)1:259.542,KRMX:1385.24)1:190.817,(UZB:716.856,UIG:716.856)1:859.196)1:69.633,(((KRG:628.961,(NOGX:438.589,(KAZ:159.022,KLPX:159.022)1:279.567)1:190.372)1:186.088,(BAS:378.152,TAT:378.152)1:436.897)1:186.247,(QUM:453.077,BLKX:453.077)1:548.219)1:644.39)1:215.508)1:533.183,CHV:2394.38)1:0;
xrotwang commented 5 years ago

Ah, it's a classic: In https://github.com/lmaurits/phyltr/blob/39bc8962ab2d8df5646536d76a15c7828f3649bd/src/phyltr/plumbing/sources.py#L125-L126 you first check for the last non-whitespace character in line, and then remove the last character unconditionally, i.e. without calling .strip() first. Easy fix.

lmaurits commented 5 years ago

Thanks for catching this! Should be fixed now in develop?

xrotwang commented 5 years ago

All good now. Btw. what about doing away with the develop branch? Would make for a simpler development model, I think.

lmaurits commented 5 years ago

Yeah, I am pretty open to this. A lot of my projects (e.g. BEASTling too) still have this overly-complicated branch structure I cargo-culted from that popular git blog post, which I've really only found to be a pain, over the years.

I would be very happy if a phyltr 1.0.0 could finally emerge as part of the upcoming D-PLACE hackathon in Jena, and part of that could be simplifying the repo structure.

xrotwang commented 5 years ago

Yes, a nice, well-tested 1.0 with simple maintenance/development setup would be a good side-effect! One desideratum I'm pondering, is better python API usability - but from first inspection it seems as if the stream handling (including sys.* access) is baked into the core Command class - so may not be easily replaced with other generators of trees.

xrotwang commented 5 years ago

Maybe we (i.e. you :) ) should add a 1.0 milestone, to get things organized.