kfuku52 / csubst

Molecular convergence detection
BSD 3-Clause "New" or "Revised" License

How to find which gene is selected #45

Status: Closed (YecLab closed this issue 1 year ago)

YecLab commented 1 year ago

Dear Fukushima-san, many thanks for your help. I have obtained csubst analyze results for my orthogroups, and I have the following questions:

  1. Why are some of my omegaCany2spe values so large? Some exceed 10,000 and some are shown as "Inf". Are these results reliable? Do I need to remove these rows from the results?
  2. Does the Node number shown in the iqtree file correspond to "dist_node_num" in the csubst_cb_k file? I have marked several foreground genes; how can I tell which of them were selected?
  3. I am wondering about the stability of the results. Will I get the same convergently selected genes if I rerun the same code on the same input data? (I ask because I found that some tools do not return the same positively selected genes when I repeat the same code with the same input file; only 65% of the genes overlapped across three replicates of the branch-site model.)

Thanks again for your help!

kfuku52 commented 1 year ago

  1. Why are some of my omegaCany2spe values so large? Some exceed 10,000 and some are shown as "Inf". Are these results reliable? Do I need to remove these rows from the results?

Please refer to the discussion at https://github.com/kfuku52/csubst/issues/34.

  2. Does the Node number shown in the iqtree file correspond to "dist_node_num" in the csubst_cb_k file? I have marked several foreground genes; how can I tell which of them were selected?

dist_node_num is the number of nodes separating the two (or more) branches. I guess you are looking for branch_id, which is visualized in csubst_branch_category.pdf. In the attached example (csubst_branch_category.pdf), branch_id 23 and 51 were specified as foreground.
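Once the foreground branch_id values are known, the matching rows can be pulled out of the csubst_cb_k table. The following is only a minimal sketch: the column names branch_id_1 and branch_id_2, and every value in the toy table, are assumptions made for illustration and should be checked against the header of your own file.

```python
import csv
import io

# Hypothetical excerpt of a csubst_cb_2.tsv table.
# Column names and all values are assumed for illustration only.
SAMPLE_TSV = (
    "branch_id_1\tbranch_id_2\tdist_node_num\tomegaCany2spe\n"
    "23\t51\t9\t5.2\n"
    "1\t7\t1\t0.8\n"
    "51\t23\t9\t3.4\n"
    "23\t40\t5\t1.1\n"
)

def rows_for_branches(tsv_text, foreground_ids):
    """Return the rows whose branch_id_* columns are exactly the given IDs."""
    wanted = set(foreground_ids)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        # Collect every branch_id_* column so the order of the IDs does not matter.
        ids = {int(float(v)) for k, v in row.items() if k.startswith("branch_id_")}
        if ids == wanted:
            hits.append(row)
    return hits

hits = rows_for_branches(SAMPLE_TSV, [23, 51])
for row in hits:
    print(row["branch_id_1"], row["branch_id_2"], row["omegaCany2spe"])
```

For a real file, the text can be read with open(path).read() and passed to the same function.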

  3. I am wondering about the stability of the results. Will I get the same convergently selected genes if I rerun the same code on the same input data? (I ask because I found that some tools do not return the same positively selected genes when I repeat the same code with the same input file; only 65% of the genes overlapped across three replicates of the branch-site model.)

Yes, as long as you use the same versions of CSUBST and of the packages it uses internally, such as numpy.
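One way to make this checkable is to record the installed versions next to each run. The sketch below uses only the Python standard library; the distribution names "csubst" and "numpy" are assumptions about how the packages are registered in your environment, and the helper name package_versions is hypothetical.

```python
from importlib import metadata

def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed under this name in the current environment.
            versions[name] = None
    return versions

# Assumed distribution names; extend the list with other dependencies as needed.
versions = package_versions(["csubst", "numpy"])
for name, version in versions.items():
    print(name, version or "not installed")
```

Saving this output with the results makes it possible to confirm later that two runs used the same software versions.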

kfuku52 commented 1 year ago

branch_id is also indicated in csubst_tree.nwk. In this example, the branch leading to Astyanax_mexicanus_ENSAMXG00000007127 has a branch_id of 1.

(((Astyanax_mexicanus_ENSAMXG00000007127|1:0.208414,Danio_rerio_ENSDARG00000054191|7:0.305092)Node5|8:0.122719,(Gadus_morhua_ENSGMOG00000013283|9:0.389794,(Oreochromis_niloticus_ENSONIG00000017337|31:0.222438,Oryzias_latipes_ENSORLG00000014063|36:0.208828)Node7|37:0.113801)Node6|38:0.18523)Node4|39:0.198503,((((((((((Bos_taurus_ENSBTAG00000000894|2:0.014336,Ovis_aries_ENSOARG00000018803|41:0.0165996)Node16|42:0.0555491,Sus_scrofa_ENSSSCG00000012440|52:0.0752466)Node15|53:0.0119832,Canis_lupus_ENSCAFG00000017270|5:0.0970313)Node14|54:0.0273059,Chinchilla_lanigera_ENSCLAG00000000146|6:0.134263)Node13|55:0.00361894,(Oryctolagus_cuniculus_ENSOCUG00000014726|33:7.0745e-06,Oryctolagus_cuniculus_ENSOCUG00000025100|34:0.0986973)Node17|35:0.105074)Node12|56:0.00335731,(Callithrix_jacchus_ENSCJAG00000021082|3:0.042176,(Homo_sapiens_ENSG00000102144|12:0.0113642,Macaca_mulatta_ENSMMUG00000013725|17:0.0514254)Node19|18:0.00817766)Node18|19:0.0388127)Node11|57:0.0189132,(Mus_musculus_ENSMUSG00000062070|30:0.0377222,Rattus_norvegicus_ENSRNOG00000058249|47:0.0581334)Node20|48:0.110788)Node10|58:0.0778962,(((Callithrix_jacchus_ENSCJAG00000022394|4:0.0672672,(Homo_sapiens_ENSG00000170950|13:0.0353076,Macaca_mulatta_ENSMMUG00000011998|14:0.0362134)Node24|15:0.018907)Node23|16:0.0719251,((Mus_musculus_ENSMUSG00000031233|29:0.141063,Rattus_norvegicus_ENSRNOG00000013600|43:0.100332)Node26|44:0.278494,Oryctolagus_cuniculus_ENSOCUG00000005270|32:0.21085)Node25|45:0.0207272)Node22|46:0.0460494,(Ovis_aries_ENSOARG00000008736|40:0.134335,Sus_scrofa_ENSSSCG00000001738|49:0.166712)Node27|50:0.07244)Node21|51:0.241451)Node9|59:0.107233,((Monodelphis_domestica_ENSMODG00000004055|20:0.0263943,((Monodelphis_domestica_ENSMODG00000022821|21:0.00495363,Monodelphis_domestica_ENSMODG00000025017|24:0.00594229)Node31|25:0.0395574,Monodelphis_domestica_ENSMODG00000023304|22:0.12849)Node30|26:0.0107613)Node29|27:0.17098,Monodelphis_domestica_ENSMODG00000023385|23:0.491768)Node28|28:0.0845341)Node8|60:0.160317,(Xenopus_tropicalis_ENSXETG00000007447|61:0.588113,(Gallus_gallus_ENSGALG00000007936|10:0.386238,Anolis_carolinensis_ENSACAG00000003255|0:0.41102)Node1|11:0.0774267)Node2|62:0.085329)Node3|63:0.0964284);
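
In the tree above, each label has the form name|branch_id:branch_length, so the branch_id of every leaf and internal node can be read off with a small script. This is a rough sketch that relies only on that visible label pattern and uses a regular expression rather than a full Newick parser; the shortened tree string is one clade cut from the tree above, and the function name branch_ids is hypothetical.

```python
import re

# One clade cut from the tree above, wrapped so that it is valid Newick on its own.
NEWICK_EXCERPT = (
    "((Astyanax_mexicanus_ENSAMXG00000007127|1:0.208414,"
    "Danio_rerio_ENSDARG00000054191|7:0.305092)Node5|8:0.122719);"
)

# A label, then "|", then the integer branch_id, then ":" before the branch length.
LABEL_ID = re.compile(r"([^(),:|;]+)\|(\d+):")

def branch_ids(newick):
    """Map each leaf or internal node label to its branch_id."""
    return {name: int(bid) for name, bid in LABEL_ID.findall(newick)}

ids = branch_ids(NEWICK_EXCERPT)
for name, bid in sorted(ids.items(), key=lambda item: item[1]):
    print(bid, name)
```

The same function can be applied to the full contents of csubst_tree.nwk, provided all labels follow the pattern shown here.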
YecLab commented 1 year ago

I got it, many thanks!!!