kfuku52 / csubst

Molecular convergence detection
BSD 3-Clause "New" or "Revised" License

interpreting results of csubst #44

Closed. yimingweng closed this issue 1 year ago.

yimingweng commented 1 year ago

Dear Fukushima-san,

Thank you for the great tool. While learning to run this analysis on my data, I was not sure how to interpret the results correctly.

Any help will be greatly appreciated! Thank you. YiMing

kfuku52 commented 1 year ago

Thank you for attempting to use CSUBST. It appears that the trees were not attached as expected. Could you please share them with us?

yimingweng commented 1 year ago

Oh, my mistake. Here is the species tree. Please note that the species "Mleu" belongs to a different family than "Msex", "Bmor", and "Aips", but they share the same phenotype. (attachment: species_tree.tre)

And here is the gene tree. This gene has a known function related to the phenotype, and in its tree the four species sharing that phenotype cluster together. (attachment: gene_tree.tre)

kfuku52 commented 1 year ago

Convergence between the Mleu and Bmor/Msex/Aips lineages was not detected because they are immediate sister groups in the gene tree, even though these genes do not actually share that evolutionary history. Their apparent grouping may be an artifact of branch attraction, a consequence of sequence similarity produced by convergence. Molecular convergence cannot be detected between immediate sisters, so the gene tree topology has to be corrected for a proper analysis. Could you consider using, as input for CSUBST, a tree in which Mleu and Bmor/Msex/Aips are positioned distantly from each other, as in the species tree?

Also, you should include an outgroup, because the stem branch of one of your target lineages (Bmor/Msex/Aips) sits in the sub-root position, where ancestral sequences are difficult to estimate.
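To make the "immediate sisters" point concrete, here is a minimal Python sketch (not part of CSUBST) that parses a Newick string and checks whether two sets of tips are the two child clades of the same node. One plausible intuition for why such pairs are uninformative: when both sisters share a derived state, ancestral reconstruction tends to place that change once on their shared stem branch rather than twice in parallel, leaving nothing to count as convergence. The parser is a simplified assumption (no quoted labels or comments), and the two toy topologies are made up for illustration; they are not the trees attached in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> set[str]:
        """Tip names under this node (the node itself if it is a tip)."""
        if not self.children:
            return {self.name}
        tips: set[str] = set()
        for child in self.children:
            tips |= child.leaves()
        return tips


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string; branch lengths are read but discarded."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # skip "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                node.children.append(parse_node())
            pos += 1  # skip ")"
        # The label and optional ":length" run until the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        node.name = text[start:pos].split(":")[0].strip()
        return node

    return parse_node()


def iter_nodes(node: Node) -> Iterator[Node]:
    """Pre-order traversal over every node in the tree."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def are_immediate_sisters(tree: Node, clade_a: Iterable[str], clade_b: Iterable[str]) -> bool:
    """True if one node has a child spanning exactly clade_a and another spanning exactly clade_b."""
    a, b = set(clade_a), set(clade_b)
    for node in iter_nodes(tree):
        child_tip_sets = [child.leaves() for child in node.children]
        if a in child_tip_sets and b in child_tip_sets:
            return True
    return False


if __name__ == "__main__":
    # Made-up toy topologies, NOT the trees attached in this thread.
    gene_like = "((Mleu:0.2,(Bmor:0.1,(Msex:0.1,Aips:0.1):0.05):0.1):0.1,(Dple:0.3,Hmel:0.3):0.1);"
    species_like = "((Mleu,(Dple,Hmel)),(Bmor,(Msex,Aips)));"
    focal = {"Bmor", "Msex", "Aips"}
    print(are_immediate_sisters(parse_newick(gene_like), {"Mleu"}, focal))     # True
    print(are_immediate_sisters(parse_newick(species_like), {"Mleu"}, focal))  # False
```

The same check could be run on a reconciled gene tree to see whether the focal lineages are still immediate sisters before spending time on a CSUBST run.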

yimingweng commented 1 year ago

Dear Fukushima-san,

Thank you so much for the quick and very helpful explanation. I thought the input tree for CSUBST had to be the gene tree of the focal gene. Can I instead use the species tree file as input to run CSUBST (with the outgroup defined)? I also wonder whether I can still run this analysis if the phenotype shared among Mleu/Bmor/Msex/Aips is actually an ancestral state. Regarding long-branch attraction, I am interested in running csubst site to see whether convergent sites can be detected. Thank you.

YiMing

kfuku52 commented 1 year ago

Gene trees are ideal as input, but in your particular case the gene tree apparently does not work. You could use a species tree instead, or improve the gene tree topology by reconciling it with the species tree using, e.g., GeneRax. If the focal phenotype is ancestral to the entire tree, running CSUBST may not be very meaningful.

yimingweng commented 1 year ago

Dear Fukushima-san,

Thank you very much for the suggestions. I added two outgroup species and used the rooted species tree to reconcile the gene tree with GeneRax. However, the topology around the focal species does not change: Mleu, the species of interest, is still sister to Aips, and the entire clade shares the same phenotype even though these species are not closely related in the species tree. Of course, in this case I cannot use CSUBST to test this gene for convergence between Mleu and Aips, but another thought occurred to me (I hope this question is relevant to other users, so that it may be somewhat helpful). The gene tree is here: gene_tree_rotted

The species tree is here: sp_tree_rooted

One possibility I was considering is that Mleu and the other species with the same phenotype group together in this gene cluster because they have retained the ancestral state (i.e., evolutionary conservatism: they are closer to the root and have shorter branch lengths), whereas the species in the other clade (e.g., Ccro, Pmal, Tsyl, Dple, Hmel, Lcor, Lphl, Cnem, Cvir) have accumulated more derived mutations since their common ancestor, possibly due to positive selection. If this is the case, is there any way to test this idea, for example by testing for positive selection on the clade with more derived mutations using ωC?

kfuku52 commented 1 year ago

If you aim to test for positive selection without specifically focusing on convergence, both HyPhy and PAML are suitable tools. If you're looking to test for convergence between two attracted lineages, using the species tree as an input for CSUBST remains a viable last resort.
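For readers wondering why ωC is not a drop-in test for positive selection on one clade, here is a minimal sketch of the ratio as I understand it from the CSUBST method: ωC = dNC / dSC, where dNC is the observed number of nonsynonymous convergent substitutions divided by the number expected under the substitution model (OCN / ECN), and dSC is the synonymous counterpart (OCS / ECS). The expected counts are simply assumed as inputs here; CSUBST derives them from its own model-based calculation, which this sketch does not reproduce, and the counts in the example are hypothetical. Because the quantity is defined for a combination of branches, it measures excess nonsynonymous convergence among those branches rather than selection along a single lineage.

```python
def observed_over_expected(observed: float, expected: float) -> float:
    """Rate of convergence relative to the model-based expectation."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def omega_c(oc_n: float, ec_n: float, oc_s: float, ec_s: float) -> float:
    """omega_C = dNC / dSC, with dNC = OCN / ECN and dSC = OCS / ECS.

    oc_n, oc_s: observed nonsynonymous / synonymous convergent substitutions
    ec_n, ec_s: counts expected under the substitution model (assumed given)
    """
    dnc = observed_over_expected(oc_n, ec_n)
    dsc = observed_over_expected(oc_s, ec_s)
    if dsc == 0:
        raise ValueError("no synonymous convergence observed; omega_C is undefined here")
    return dnc / dsc


if __name__ == "__main__":
    # Hypothetical counts for one branch pair, for illustration only.
    print(omega_c(oc_n=6, ec_n=1.5, oc_s=2, ec_s=2.0))  # 4.0
```

Using the synonymous rate as the denominator is what makes ωC comparable to a dN/dS-style ratio: it corrects for the background level of convergence that arises without selection, but it still answers a question about convergence, which is why HyPhy or PAML branch-based tests fit the per-clade selection question better.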