Closed wsdewitt closed 7 years ago
Note that https://pythonhosted.org/DendroPy/programs/sumtrees.html is one way in which we could collapse equally-parsimonious trees.
@matsen I'm currently choosing the "less thorough" and "rearrange on one best tree" options in dnapars (options S), but I can look at sumtrees after prototyping with this. Do you think it's weird to combine ML for pruning (fasttree) with a parsimony tree?
The whole thing's a little nuts, but no, I don't think that in the grand scheme of things this is an inherent problem. Do you know how dnapars scales with # sequences?
Not sure, but I could play with that. My first thought was that it would be worse than dnaml. Since it's searching multifurcating trees, the space is bigger?
I currently have a branch for this issue that is generating parsimony trees instead of ML trees. I have it up at http://stoat:5000. The svg trees look fine, but ascii not so much. I think this is because @cwarth's heavy wizardry for getting the ascii nodes to align with the sequence alignment assumes we have a binary tree. @metasoarous, are you familiar with that part of the code? It's not obvious to me that it will be easy to make the alignment work with multifurcating trees like these. Odd-furcations seem especially problematic, since the parent and one child will be aligned vertically.
This seems like it will require nontrivial effort.
@lauranoges, do you value the tree+alignment as it stands, or as it would be if we were to make it work for the parsimony tree?
An alternative I have been thinking about is something by which you could click on the tips of the tree (perhaps just check boxes next to the taxon names) to select a subset of the tips, and then download a FASTA file with all of those sequences plus the ancestral sequences on paths from the root to these tips. This would require some javascript to figure out which ancestral sequences are needed then pull those sequences out and throw them into a special download link.
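To make the selection idea above concrete, here is a minimal sketch of just the path-collection logic, written in Python rather than the javascript the page would actually need. Everything in it is an assumption for illustration: the `parent` mapping, the `seqs` dictionary, and the function names `nodes_on_root_paths` and `to_fasta` are hypothetical and not part of the existing code.

```python
from typing import Dict, List, Optional


def nodes_on_root_paths(parent: Dict[str, Optional[str]],
                        selected_tips: List[str]) -> List[str]:
    """Return the selected tips plus every ancestor on their root-to-tip paths.

    `parent` maps each node name to its parent's name (None for the root).
    Each node is reported once, root-first within each newly walked path.
    """
    seen = set()
    ordered = []
    for tip in selected_tips:
        path = []
        node = tip
        # Walk upward until we hit the root or a node already collected.
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = parent.get(node)
        ordered.extend(reversed(path))
    return ordered


def to_fasta(names: List[str], seqs: Dict[str, str]) -> str:
    """Format the named sequences as FASTA text."""
    return "".join(f">{name}\n{seqs[name]}\n" for name in names)


# Hypothetical toy tree: root -> (n1 -> (t1, t2), t3)
parent = {"root": None, "n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}
seqs = {"root": "ACGT", "n1": "ACGA", "t1": "ACCA", "t2": "TCGA", "t3": "ACGG"}

picked = nodes_on_root_paths(parent, ["t1", "t3"])
fasta_text = to_fasta(picked, seqs)
```

Here the download would contain root, n1, t1 and t3 but not t2, since t2 is not on a path to any selected tip. The same walk-up-the-parents loop should port directly to javascript for the special download link.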
@lauranoges, @metasoarous, and I discussed a possible alternative to parsimony: collapsing the zero-length branches in dnaml trees, which would remove repeated internal nodes and give us multifurcating trees. @matsen, do you have opinions on this?
Are there zero-length branch lengths in the dnaml trees?
I have a feeling the tree branch lengths in question are nonzero, so we may have to compare sequence identity to determine whether branches should be collapsed.
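For discussion, here is a rough sketch of what collapsing could look like under either criterion (branch length at or below a tolerance, or child sequence identical to the parent's). It uses a toy `Node` class rather than any real tree library, and the names `Node`, `collapse`, `tol`, and `use_identity` are all hypothetical; it does not read dnaml output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    seq: str
    length: float = 0.0  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, tol: float = 0.0, use_identity: bool = False) -> Node:
    """Remove redundant internal nodes below `node`, in place.

    An internal child is redundant if its sequence equals its parent's
    (use_identity=True) or its branch length is <= tol (use_identity=False).
    Its children are promoted to the parent, which can yield multifurcations.
    Tips are never removed.
    """
    new_children = []
    for child in node.children:
        collapse(child, tol, use_identity)  # handle the subtree first
        if use_identity:
            redundant = child.seq == node.seq
        else:
            redundant = child.length <= tol
        if child.children and redundant:
            for grandchild in child.children:
                grandchild.length += child.length  # preserve path length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


# Toy example: internal node i1 repeats the root sequence on a zero-length branch.
t1 = Node("t1", "ACCA", 1.0)
t2 = Node("t2", "TCGA", 1.0)
t3 = Node("t3", "ACGG", 1.0)
i1 = Node("i1", "ACGT", 0.0, [t1, t2])
root = Node("root", "ACGT", 0.0, [i1, t3])

collapse(root, use_identity=True)
child_names = [c.name for c in root.children]
```

In the toy example the root ends up with three children (t1, t2, t3), i.e. a trifurcation. Whether identity or a length tolerance is the better test depends on what the dnaml branch lengths actually look like.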
For reference @matsen, we were wondering about this because a question came up about whether the ancestral reconstructions from dnaml might be more trustworthy than those from parsimony. Bri is looking at the tips of the trees, so for her the x axis thing is more important. But Megan is looking at the internal nodes and reconstructing those ancestral states, so for her having good reconstructions takes higher priority.
Yes, e.g. /home/matsengrp/working/csmall/cft/output/QB850.430-Vk/Hs-LN4-5RACE-IgK-100k/run-viterbi-best-plus-1/outfile
@WSDeWitt @matsen I stand corrected! That certainly makes things easier.
The models we're using for likelihood are not B-cell specific, so I wouldn't expect it to be terribly different. Hopefully team motif will come save us eventually.
Re accuracy, @krdav and I are planning a simulation study to answer exactly these sorts of questions, so stay tuned.
@lauranoges I realized that part of the problem we have here with the existing SVG trees is that the SVG rendering utility inserts horizontal space in the tree so that it can place the node labels to the right of the nodes without overlapping much with the branches.
The correct way to read the horizontal distance here is to look at the horizontal length of the lines leading from the "fork" to the blue dot. When interpreted this way, you can see that most of the branch lengths below are very small. You could imagine that if you took out the node labels, the most extreme of these tips wouldn't stick out nearly as much as they do with the labels in place.
I think we're still planning on doing parsimony (at least until the motif team comes to the rescue) to make things easier to quickly grok visually. But in the meantime, being aware of the cause of this issue should be helpful (I hope).
Yes, we noticed this too and I'm glad you so elegantly put it into words. Thank you @metasoarous
@WSDeWitt I understand this is running on 5000; is it ready to merge to master, or does it need some more work?
The parsimony analysis seems to work ok, but the ASCII alignment had been borked due to multifurcation. Maybe it's ok now that you have clickable SVG! I should probably pull from master in the parsimony branch and test this.
That sounds perfect :-)
closed by PR #136
We'd like to add the option to use PHYLIP's parsimony program dnapars instead of the maximum-likelihood program dnaml, because this allows for non-bifurcating trees, where we can better associate distance along the tree with the amount of mutation. We'll need to parse outfile differently, particularly the ancestral state parts, but code from @matsengrp/bcell's GCtree can be reused.