phyloacc / PhyloAcc

PhyloAcc: software to detect changes in the conservation of a genomic region
GNU General Public License v3.0

Trees for individual CNEs #55

gaurav-agavekar closed 5 months ago

gaurav-agavekar commented 7 months ago

Hi,

I'm running PhyloAcc on a set of CNEs from ant genomes. The pipeline has been rather easy to execute, so thanks for the great interface and a user-friendly README.

I had one question: how/where do I get trees for individual accelerated CNEs? As far as I can tell PhyloAcc doesn't generate/store these in the results. Do you have any recommendations for how to generate them? I would like to visualize the accelerations.

Also, a quick note: PhyloAcc doesn't warn or auto-correct if ancestral nodes are not named with tree_doctor; it just fails (but works fine when the .mod file is first processed with tree_doctor).

thanks, Gaurav

gwct commented 7 months ago

Hi Gaurav, Thanks! I'm glad you've found it easy to use!

If you just want to generate gene trees for particular elements, I would recommend IQ-Tree and then visualizing them in R with something like ggtree.
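As a rough illustration of the IQ-TREE step, here is a minimal Python sketch that builds and runs the command for one element. It assumes IQ-TREE 2 is installed as `iqtree2` and that each element's alignment sits in its own file; the function names and file names are hypothetical, and the model option is only one reasonable choice.

```python
import shutil
import subprocess


def iqtree_command(alignment, prefix, binary="iqtree2"):
    """Build an IQ-TREE 2 command line for a single alignment file."""
    # -s: input alignment, -m MFP: ModelFinder followed by tree inference,
    # --prefix: prefix for all output files.
    return [binary, "-s", str(alignment), "-m", "MFP", "--prefix", str(prefix)]


def run_iqtree(alignment, prefix, binary="iqtree2"):
    """Run IQ-TREE and return the path where it writes the ML tree."""
    # ASSUMPTION: the binary is on PATH under this name.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    subprocess.run(iqtree_command(alignment, prefix, binary), check=True)
    return f"{prefix}.treefile"
```

The resulting `.treefile` is a Newick file that can then be read into R for plotting with ggtree.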

If you're asking specifically about gene trees generated by PhyloAcc, I'm not sure the gene trees were ever meant to be part of the official output for PhyloAcc, and they've never been tested for accuracy. I'm also not exactly sure how the branch lengths for these trees are calculated. Maybe @xyz111131 or @HanY-H could weigh in?

That said, they are available, just not conveniently: they are stored in the elem_Z files for each batch. For each locus, first look up which model (M0, M1, or M2) has the best marginal likelihood. Then, in your phyloacc directory (the one you specified when running phyloacc.py), go to the path: phyloacc-job-files/phyloacc-output/[batch]-phyloacc-[gt/st]-out/[batch]_[model]_elem_Z.txt. In that file, look for the line with the ID for that locus within the current batch (NOT the overall ID); the gene tree is the last column of that row.
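The lookup above can be sketched in Python roughly as follows. This is not part of PhyloAcc, and the function names are hypothetical. The path pattern and the "tree is the last column" rule come from the description above. The tab delimiter and the position of the batch-local ID (first column) are assumptions that should be checked against your own files.

```python
from pathlib import Path


def best_model(marginal_likelihoods):
    """Return the model name with the highest marginal (log-)likelihood."""
    return max(marginal_likelihoods, key=marginal_likelihoods.get)


def elem_z_path(phyloacc_dir, batch, model, run_type):
    """Build the elem_Z path from the pattern described above.

    run_type is "gt" or "st", matching the [gt/st] part of the pattern.
    """
    return (Path(phyloacc_dir) / "phyloacc-job-files" / "phyloacc-output"
            / f"{batch}-phyloacc-{run_type}-out"
            / f"{batch}_{model}_elem_Z.txt")


def gene_tree_for_locus(elem_z_file, batch_locus_id):
    """Return the last column of the row matching batch_locus_id, or None.

    ASSUMPTION: columns are tab-separated and the batch-local locus ID is
    the first column. Neither is confirmed in this thread.
    """
    with open(elem_z_file) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields and fields[0] == str(batch_locus_id):
                return fields[-1]
    return None
```

For example, with made-up likelihoods, `best_model({"M0": -120.5, "M1": -100.2, "M2": -101.0})` picks `"M1"`, which then goes into `elem_z_path(...)` as the `model` argument.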

And thanks for reporting this behavior about the un-labeled trees. I think we're aware of this and I'm working on several bug fixes that will get released soon. I'll tag this issue when we fix that one.

gwct commented 5 months ago

Checks for node labels in the trees have been added in v2.3.1.
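For anyone on an earlier version, a quick pre-flight check along these lines might catch unlabeled trees before a run. This is a hypothetical sketch, not the check that was added to PhyloAcc. It assumes the tree is stored on a `TREE:` line in the .mod file and that the Newick string has no quoted labels or whitespace after closing parentheses.

```python
import re


def tree_from_mod(mod_text):
    """Return the Newick string from the 'TREE:' line of a .mod file, or None."""
    for line in mod_text.splitlines():
        if line.startswith("TREE:"):
            return line[len("TREE:"):].strip()
    return None


def unlabeled_internal_nodes(newick):
    """Count closing parentheses that are not followed by a node label."""
    # A label would start right after ")". If the next character is
    # ":", ",", ")" or ";" (or the string ends), the node has no name.
    # The root is counted too if it is unlabeled.
    return len(re.findall(r"\)(?=[:,);]|$)", newick))
```

A nonzero count suggests running the .mod file through tree_doctor first, as described earlier in the thread.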

gaurav-agavekar commented 4 months ago

@gwct sorry, this fell through the cracks. I was referring to the example individual CNE trees shown in the Sackton et al. 2019 paper with accelerated clades in red. I found a way to visualize those, so I forgot to reply here, but thanks anyway for your help.