simonhmartin / twisst

Topology weighting by iterative sampling of sub-trees
GNU General Public License v3.0

Viewing Rooted Topologies in R Code #44

Open jbernst opened 11 months ago

jbernst commented 11 months ago

Hi Simon,

I am currently using Twisst (which seems like a fantastic program!) to look at the weighted topologies of trees across chromosomes in my study group. Including an outgroup, there are ~6-8 groups (depends on how you view it, but there is definitely hybridization) and 3 ingroup species. For simplicity and for the sake of just getting the code to work, I am running Twisst with 4 groups: Outgroup, C, V, O. I successfully ran Twisst on one chromosome, which included a set of 961 bootstrapped gene trees from RAxML-NG.

Upon running this in the R code for visualization, I noticed that the output trees are unrooted. I know this is the expected output from Twisst, and the publication mentions that the topologies in the weights.tsv.gz file are unrooted but are drawn rooted for the sake of visualization in the figure. Attached below are our data for C, V, O, and Outgroup, showing unrooted topologies.

[Image: Screenshot 2023-09-29 at 3 42 04 PM]

I noticed, though, that the downloadable weights.tsv.gz on GitHub seems to have rooted trees for looking at the weights. How did you get rooted trees from your analysis? We are pretty certain we know the 'true' topology of these organisms, and it would be helpful to know if there is a way to show at least the outgroup as an outgroup in our graphs.

[Image: Screenshot 2023-09-29 at 3 42 15 PM]

I am also wondering whether the interpretation is the same. If you look at the most heavily weighted topology in the image I attached from our own run, can I still interpret it as the most abundant topology being (Outgroup,(S,(V,O))), even though it is shown as an unrooted tree?

I also have one other, more basic question about how Twisst works. The image below shows a distribution of weights for the three topologies when I have gene trees with 4 groups. How does Twisst calculate multiple numbers for a single position (which would be a gene tree)? I ran Twisst on a single gene tree with 4 groups, expecting a weight distribution of 1,0,0, since a given gene tree has only one topology. Instead, I saw numbers indicating multiple topologies for that single gene tree. Since Twisst works on subtrees, I think I am misunderstanding how the algorithm works and how to interpret the results. How do we get 2-3 numbers as weights for each gene tree provided?

[Image: Screenshot 2023-09-29 at 3 50 04 PM]

If it helps, here is the code we ran:

python twisst.py -t iqtree.genetrees.tre.gz -w output.weights.csv.gz \
    --outputTopos output.topologies.trees \
    --method complete \
    -g S \
    -g V \
    -g O \
    -g Outgroup \
    --groupsFile group-file.txt

Here is the groupFile we made:

266_og  Outgroup
261_og  Outgroup
263_r   S
262_r   S
267_r   S
269_r   S
263_r   S
267_r   S
268_r   S
266_r   S
268_r   V
268_o   V
268_o   V
261_o   V
265_o   V
266_o   V
266_w   O
265_w   O
263_w   O
265_w   O
261_w   O
260_w   O
267_w   O
263_w   O
261_w   O
269_w   O
266_w   O
266_w   O
267_w   O
264_w   O
263_w   O
264_w   O
262_w   O
266_w   O

Thank you so much! This program is looking really useful for this project, and I am looking forward to understanding it better!

simonhmartin commented 10 months ago

Hi Justin, sorry for the delay. If you include --outgroup Outgroup in your twisst command, it should work. Note you still need to include -g Outgroup.

simonhmartin commented 10 months ago

Just to add, the outgroup can have any name. For example -g A -g B -g C -g D --outgroup D.
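
Putting the two replies together, the command from the original post with the outgroup option added could be assembled roughly as below. This is only a sketch: the helper name build_twisst_command is hypothetical, the file names are the ones from the original post, and the script only builds and prints the argument list rather than running twisst.

```python
import shlex


def build_twisst_command(groups, outgroup=None):
    """Hypothetical helper: assemble the twisst.py call from the original post."""
    cmd = [
        "python", "twisst.py",
        "-t", "iqtree.genetrees.tre.gz",
        "-w", "output.weights.csv.gz",
        "--outputTopos", "output.topologies.trees",
        "--method", "complete",
    ]
    # Every group, including the outgroup, is still passed with -g.
    for group in groups:
        cmd += ["-g", group]
    if outgroup is not None:
        if outgroup not in groups:
            raise ValueError("the outgroup must also be listed with -g")
        # The name given to --outgroup must match one of the -g names.
        cmd += ["--outgroup", outgroup]
    cmd += ["--groupsFile", "group-file.txt"]
    return cmd


cmd = build_twisst_command(["S", "V", "O", "Outgroup"], outgroup="Outgroup")
print(shlex.join(cmd))
```

The check inside the helper mirrors the note above that -g Outgroup is still required when --outgroup Outgroup is given.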

simonhmartin commented 5 months ago

I see I never responded to this question "How do we get 2-3 numbers as weights for each gene tree provided?". Do you mean how can the weights be between 0 and 1 for each topology for a single gene tree? If so, you will need to read the original paper to see how the weighting works.
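
For readers who land here with the same question, the idea can be illustrated with a toy example. The sketch below is not twisst's code; it is a brute-force illustration under the assumption that a weight is the fraction of sub-trees, each built from one sample per group, that match a given topology. The tree, the tip names, and all function names are made up for the example.

```python
from collections import Counter
from itertools import product

# Toy gene tree as nested tuples (hypothetical tip names, not real data).
tree = (((("v1", "o1"), "s1"), ("s2", "v2")), "out1")
groups = {"S": ["s1", "s2"], "V": ["v1", "v2"], "O": ["o1"], "Outgroup": ["out1"]}


def clades(node, out):
    """Append the leaf set under every internal node to out; return this node's leaves."""
    if isinstance(node, str):
        return frozenset([node])
    leaves = frozenset().union(*(clades(child, out) for child in node))
    out.append(leaves)
    return leaves


def quartet_topology(clade_list, tip_by_group):
    """Return the unrooted four-tip topology as a frozenset of two group pairs."""
    group_of = {tip: group for group, tip in tip_by_group.items()}
    chosen = set(group_of)
    for clade in clade_list:
        inside = clade & chosen
        # A clade holding exactly two of the four tips defines the 2|2 split.
        if len(inside) == 2:
            pair = frozenset(group_of[t] for t in inside)
            other = frozenset(group_of[t] for t in chosen - inside)
            return frozenset([pair, other])
    return None  # unresolved (e.g. a polytomy)


def topology_weights(tree, groups):
    """Fraction of one-sample-per-group sub-trees that match each topology."""
    clade_list = []
    clades(tree, clade_list)
    names = list(groups)
    counts = Counter()
    for combo in product(*(groups[g] for g in names)):
        counts[quartet_topology(clade_list, dict(zip(names, combo)))] += 1
    total = sum(counts.values())
    return {topo: n / total for topo, n in counts.items()}


weights = topology_weights(tree, groups)
for topo, weight in weights.items():
    print(sorted(sorted(pair) for pair in topo), weight)
```

In this toy tree there are four ways to pick one sample per group. Two of them give ((V,O),(S,Outgroup)), one gives ((S,O),(V,Outgroup)), and one gives ((S,V),(O,Outgroup)), so a single gene tree yields weights of 0.5, 0.25, and 0.25 rather than 1, 0, 0. A gene tree only gets a weight of 1 for one topology when every combination of samples agrees.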