lfp-a closed this issue 9 months ago.
Below is my species tree:
(Calceolaria_pinifolia,((Titanotrichum_oldhamii_042,Gesneria_cuneifolia_074),(Corallodiscus_lanuginosus_014,Rhynchoglossum_obliquum_123)));
Hi, you forgot to provide the CDS files as arguments to wgd ksd. In addition, the --spair option should be given like this: "Corallodiscus_lanuginosus_014.cds;Titanotrichum_oldhamii_042.cds", using the CDS file names instead of absolute paths.
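Because the --spair values must be built from CDS file names rather than paths, a small Python sketch can assemble them. spair_arg and focus_spairs are hypothetical helpers, not part of wgd; the "A;B" format and the extra self pair (focus;focus) are copied from the commands used in this thread.

```python
import os


def spair_arg(cds_a, cds_b):
    """Join two CDS files into one --spair value, keeping only the file names."""
    return f"{os.path.basename(cds_a)};{os.path.basename(cds_b)}"


def focus_spairs(focus_cds, all_cds):
    """All --spair values for one focus species: focus vs. every other
    species, plus focus vs. itself (as in the commands in this thread)."""
    focus_name = os.path.basename(focus_cds)
    others = [c for c in all_cds if os.path.basename(c) != focus_name]
    pairs = [spair_arg(focus_cds, other) for other in others]
    pairs.append(spair_arg(focus_cds, focus_cds))
    return pairs


cds = ["./cds/Corallodiscus_lanuginosus_014.cds", "./cds/Titanotrichum_oldhamii_042.cds"]
print(" ".join(f'--spair "{p}"' for p in focus_spairs(cds[0], cds)))
# --spair "Corallodiscus_lanuginosus_014.cds;Titanotrichum_oldhamii_042.cds" --spair "Corallodiscus_lanuginosus_014.cds;Corallodiscus_lanuginosus_014.cds"
```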
Thank you, but now I get this error:
14:08:49 INFO Analysing family GF00003078 core.py:2873
14:08:50 INFO Analysing family GF00003079 core.py:2873
14:08:50 INFO Analysing family GF00003080 core.py:2873
14:08:50 INFO Analysing family GF00003081 core.py:2873
14:08:52 INFO Analysing family GF00003082 core.py:2873
14:09:05 INFO Saving to wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv cli.py:493
14:09:06 INFO Making plots cli.py:495
INFO Implementing node-averaged Ks analysis viz.py:511
14:09:09 INFO Recalculating the weights per species pair viz.py:567
INFO Plotting kde curve over histogram viz.py:569
INFO The corrected mode of species pair Corallodiscus_lanuginosus_014.cds__Titanotrichum_oldhamii_042.cds is 0.91 viz.py:636
14:09:10 INFO The mode of species pair Corallodiscus_lanuginosus_014.cds__Titanotrichum_oldhamii_042.cds is 0.534 viz.py:641
INFO The corrected mode of species pair Corallodiscus_lanuginosus_014.cds__Gesneria_cuneifolia_074.cds is 0.92 viz.py:636
INFO The mode of species pair Corallodiscus_lanuginosus_014.cds__Gesneria_cuneifolia_074.cds is 0.587 viz.py:641
INFO The corrected mode of species pair Corallodiscus_lanuginosus_014.cds__Rhynchoglossum_obliquum_123.cds is 0.81 viz.py:636
INFO The mode of species pair Corallodiscus_lanuginosus_014.cds__Rhynchoglossum_obliquum_123.cds is 1.045 viz.py:641
14:09:11 INFO The mode of species pair Calceolaria_pinifolia.cds__Corallodiscus_lanuginosus_014.cds is 1.073 viz.py:641
Traceback (most recent call last):
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/bin/wgd", line 8, in <module>
    sys.exit(cli())
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/cli.py", line 464, in ksd
    _ksd(**kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/cli.py", line 506, in _ksd
    multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde,plotapgm>
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 683, in multi_sp_plot
    if not (user_ylim[0]) is None: ax.set_ylim(user_ylim[0],user_ylim[1])
TypeError: 'NoneType' object is not subscriptable
Here is the command I ran:
nohup wgd ksd wgd_globalmrbh/global_MRBH.tsv *cds -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Corallodiscus_lanuginosus_014.cds;Titanotrichum_oldhamii_042.cds" --spair "Corallodiscus_lanuginosus_014.cds;Gesneria_cuneifolia_074.cds" --spair "Corallodiscus_lanuginosus_014.cds;Rhynchoglossum_obliquum_123.cds" --spair "Corallodiscus_lanuginosus_014.cds;Calceolaria_pinifolia.cds" --spair "Corallodiscus_lanuginosus_014.cds;Corallodiscus_lanuginosus_014.cds" --plotkde &
Here is my species tree:
(Calceolaria_pinifolia.cds,((Titanotrichum_oldhamii_042.cds,Gesneria_cuneifolia_074.cds),(Corallodiscus_lanuginosus_014.cds,Rhynchoglossum_obliquum_123.cds)));
Hi, this is a small bug in wgd ksd. I have fixed it now. You can reinstall the latest version from here. Let me know if you still get the error. Thanks!
Thanks, but when I refreshed the page I found that the latest version is 2.0.22, submitted last week. Is this it?
yes
I installed the new version and got the following error:
14:50:25 INFO Analysing family GF00003082 core.py:2873
14:50:38 INFO Saving to wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv cli.py:493
14:50:39 INFO Making plots cli.py:495
INFO Implementing node-averaged Ks analysis viz.py:511
14:50:46 INFO Recalculating the weights per species pair viz.py:567
INFO Plotting kde curve over histogram viz.py:569
INFO The corrected mode of species pair Corallodiscus_lanuginosus_014.cds__Titanotrichum_oldhamii_042.cds is 0.91 viz.py:636
14:50:47 INFO The mode of species pair Corallodiscus_lanuginosus_014.cds__Titanotrichum_oldhamii_042.cds is 0.534 viz.py:641
INFO The corrected mode of species pair Corallodiscus_lanuginosus_014.cds__Gesneria_cuneifolia_074.cds is 0.92 viz.py:636
INFO The mode of species pair Corallodiscus_lanuginosus_014.cds__Gesneria_cuneifolia_074.cds is 0.587 viz.py:641
INFO The corrected mode of species pair Corallodiscus_lanuginosus_014.cds__Rhynchoglossum_obliquum_123.cds is 0.81 viz.py:636
INFO The mode of species pair Corallodiscus_lanuginosus_014.cds__Rhynchoglossum_obliquum_123.cds is 1.045 viz.py:641
INFO The mode of species pair Calceolaria_pinifolia.cds__Corallodiscus_lanuginosus_014.cds is 1.073 viz.py:641
Traceback (most recent call last):
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/cli.py", line 464, in ksd
    _ksd(**kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/cli.py", line 506, in _ksd
    multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde>
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 683, in multi_sp_plot
    if not (user_ylim[0]) is None: ax.set_ylim(user_ylim[0],user_ylim[1])
TypeError: 'NoneType' object is not subscriptable
Hi, you need to reinstall the version from this GitHub repository.
In viz.py, I changed lines 683 and 684 of the script from

if not (user_ylim[0]) is None: ax.set_ylim(user_ylim[0],user_ylim[1])

to the following, and it now works properly:

if user_ylim is not None: ax.set_ylim(user_ylim[0], user_ylim[1])
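The original check indexes user_ylim before testing it, so when no y-limits are given (user_ylim is None) the subscript itself raises the TypeError seen in the traceback. A minimal sketch reproducing this; FakeAxes is a hypothetical stand-in for a matplotlib Axes that only records set_ylim calls.

```python
class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes; records set_ylim calls only."""

    def __init__(self):
        self.ylim = None

    def set_ylim(self, bottom, top):
        self.ylim = (bottom, top)


def apply_ylim_original(ax, user_ylim):
    # Original check: subscripts user_ylim first, so None raises TypeError
    if not (user_ylim[0]) is None:
        ax.set_ylim(user_ylim[0], user_ylim[1])


def apply_ylim_patched(ax, user_ylim):
    # Patched check: test the container itself before indexing it
    if user_ylim is not None:
        ax.set_ylim(user_ylim[0], user_ylim[1])


ax = FakeAxes()
try:
    apply_ylim_original(ax, None)
except TypeError as err:
    print("original:", err)  # original: 'NoneType' object is not subscriptable

apply_ylim_patched(ax, None)        # no-op: default y-limits are kept
apply_ylim_patched(ax, (0, 2.5))
print(ax.ylim)                      # (0, 2.5)
```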
However, when I used a different set of species pairs for the calculation, it reported the following error:
20:51:01 INFO Analysing family GF00003082 core.py:2873
20:51:13 INFO Saving to wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv cli.py:493
20:51:14 INFO Making plots cli.py:495
INFO Implementing node-averaged Ks analysis viz.py:511
Traceback (most recent call last):
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/cli.py", line 464, in ksd
    _ksd(**kwargs)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/cli.py", line 506, in _ksd
    multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde,plotapgmm=plotapgmm,plotelmm=plotelmm,components=components,na=True)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 527, in multi_sp_plot
    df_perspair,allspair,paralog_pair,corrected_ks_spair,Outgroup_spnames = getspair_ks(spair,df,reweight,onlyrootout,sptree=sptree,na=na,spgenemap=spgenemap)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 72, in getspair_ks
    if sptree != None and len(paralog_pair) !=0 : corrected_ks_spair,Outgroup_spnames = correctks(df,sptree,paralog_pair[0],reweight,onlyrootout,na=na)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 191, in correctks
    else: all_spairs,spairs,Trios,Trios_dict = gettrios_overall(focusp,Ingroup_spnames,Outgroup_spnames,Ingroup_clade)
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 116, in gettrios_overall
    mrca = Ingroup_clade.common_ancestor({"name": sister}, {"name": focusp})
  File "/data/wanglab/liufangpu/miniconda3/envs/wgd2/lib/python3.8/site-packages/Bio/Phylo/BaseTree.py", line 471, in common_ancestor
    for level in zip(*paths):
TypeError: 'NoneType' object is not iterable
Here is the command I ran:
nohup wgd ksd wgd_globalmrbh/global_MRBH.tsv *cds -n 16 --extraparanomeks wgd_ksd/Calceolaria_pinifolia.cds.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Calceolaria_pinifolia.cds;Titanotrichum_oldhamii_042.cds" --spair "Calceolaria_pinifolia.cds;Gesneria_cuneifolia_074.cds" --spair "Calceolaria_pinifolia.cds;Rhynchoglossum_obliquum_123.cds" --spair "Calceolaria_pinifolia.cds;Corallodiscus_lanuginosus_014.cds" --spair "Calceolaria_pinifolia.cds;Calceolaria_pinifolia.cds" --plotkde &
Could you show me the content of your input speciestree.nw?
Here is my species tree:
(Calceolaria_pinifolia.cds,((Titanotrichum_oldhamii_042.cds,Gesneria_cuneifolia_074.cds),(Corallodiscus_lanuginosus_014.cds,Rhynchoglossum_obliquum_123.cds)));
Hi, the tree is clearly the problem. The focus species Calceolaria_pinifolia.cds has no available outgroup. Currently your tree has the shape (Calceolaria_pinifolia.cds,Other_species); but it should be (Outgroup,(Calceolaria_pinifolia.cds,Other_species));.
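To check a tree before running wgd ksd, a minimal pure-Python sketch can list which species sit on the other side of the root from the clade holding the focus species and its sister(s). This is our reading of the maintainer's explanation, not wgd's own logic (wgd parses the tree with Bio.Phylo, per the traceback); parse_newick, leaves and outgroups_for are hypothetical helpers that only read topology and ignore branch lengths and internal labels.

```python
def parse_newick(newick):
    """Parse a Newick string into nested lists of leaf names (topology only)."""
    s = newick.strip().rstrip(";")
    pos = 0

    def skip_annotation():
        # Drop internal-node labels and ":branch_length" suffixes
        nonlocal pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1

    def parse():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [parse()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse())
            pos += 1                      # consume ")"
            skip_annotation()
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos].strip()
        skip_annotation()
        return name

    return parse()


def leaves(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def outgroups_for(newick, focus):
    """Species outside the root-side clade that holds `focus` and its sister(s)."""
    root = parse_newick(newick)
    for child in root:
        clade = leaves(child)
        if focus in clade:
            if len(clade) == 1:
                return []  # focus alone at the root: no species lies outside focus + sister
            return [sp for other in root if other is not child for sp in leaves(other)]
    raise ValueError(f"{focus!r} not found in tree")


tree = ("(Calceolaria_pinifolia.cds,((Titanotrichum_oldhamii_042.cds,Gesneria_cuneifolia_074.cds),"
        "(Corallodiscus_lanuginosus_014.cds,Rhynchoglossum_obliquum_123.cds)));")
print(outgroups_for(tree, "Calceolaria_pinifolia.cds"))          # []
print(outgroups_for(tree, "Corallodiscus_lanuginosus_014.cds"))  # ['Calceolaria_pinifolia.cds']
```

An empty list for the focus species signals the situation the maintainer describes: the tree needs an extra species placed outside the focus and its relatives.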
Thank you very much for your help. Now I have another problem: in each of the two branches, the species share a Ks peak. If I want to determine whether the peaks in the two branches reflect the same WGD or two independent WGDs, what strategy should I adopt?
The tree and the expected WGDs are shown below.
------------------ Original message ------------------ From: "heche-psb/wgd"; Sent: Monday, September 11, 2023, 5:24 PM; Subject: Re: [heche-psb/wgd] ksd global erro (Issue #9)
You can first compare the WGD age with the corrected divergence age in Ks.
Thank you
Should I estimate the WGD age in each of the two branches separately and check whether they differ?
The WGD age estimates may differ simply because of variation in evolutionary rate. To compare evolutionary rates, the orthologous Ks distributions are sufficient. But if you want to make sure your WGD dating is robust to different dating strategies and to rate variation, you can also do the comparative analysis.
ok
But there is something in the example files that I don't understand, shown in the image below.
Does this graph indicate that Aquilegia coerulea, Protea cynaroides and Vitis vinifera did not share WGD events?
If the corrected mode of Aquilegia_coerulea__Protea_cynaroides, or of Aquilegia_coerulea__Vitis_vinifera, is near 1.2, does that indicate a WGD that Aquilegia_coerulea shares with Protea_cynaroides or Vitis_vinifera?
The principle is that if the WGD age of A is clearly older than the corrected mode with B, the WGD of A is shared with B. Conversely, if the WGD age of A is clearly younger than the corrected mode with B, the WGD of A is unique to A and not shared with B.
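The decision rule above can be written as a small Python function. This is a minimal sketch: wgd_relation is a hypothetical helper, and the margin parameter is an assumption standing in for "clearly" older or younger; the thread gives no numeric threshold.

```python
def wgd_relation(wgd_ks, corrected_mode, margin=0.1):
    """Classify A's WGD relative to the A-B divergence.

    wgd_ks:         Ks peak of A's WGD (from A's paranome).
    corrected_mode: rate-corrected Ks mode of the A-B orthologs.
    margin:         hypothetical tolerance for "clearly" older/younger.
    """
    if wgd_ks > corrected_mode + margin:
        return "shared with B"   # WGD predates the A-B split
    if wgd_ks < corrected_mode - margin:
        return "unique to A"     # WGD postdates the A-B split
    return "ambiguous"           # too close to call from Ks alone


# Illustrative values only, not results from this thread
print(wgd_relation(0.8, 0.6))   # shared with B
print(wgd_relation(0.8, 1.1))   # unique to A
print(wgd_relation(0.8, 0.85))  # ambiguous
```

The "ambiguous" branch corresponds to cases where the WGD and divergence ages are close in Ks, where different focus species can give different answers.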
If the original mode of A and B is greater than the Ks peak of A, but the corrected mode is less than the peak of A, should the corrected mode be used?
We should use the corrected mode to interpret the relative order of WGD event and divergence event. The original mode is biased by rate variation and thus unreliable.
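One common way to correct an ortholog Ks mode for rate variation uses a trio of focus species A, sister B and outgroup O (the traceback above mentions trios, which suggests this kind of scheme). The sketch below assumes a ksrates-style relative-rate correction: the Ks on A's branch since the split is K_A = (K_AB + K_AO - K_BO) / 2, and the corrected divergence on A's paralog scale is 2 * K_A. We have not verified that wgd v2 uses exactly this formula; the numbers are hypothetical.

```python
def corrected_divergence_ks(ks_ab, ks_ao, ks_bo):
    """Rate-corrected A-B divergence Ks on the focus species' (A's) scale.

    Assumes a relative-rate trio with outgroup O:
        K_A = (K_AB + K_AO - K_BO) / 2   # Ks on A's branch since the A-B split
    Doubling K_A puts the split on the same scale as A's paralog Ks.
    """
    k_a = (ks_ab + ks_ao - ks_bo) / 2
    return 2 * k_a


# Hypothetical modes: B evolves faster than A (K_BO > K_AO),
# so the raw A-B mode of 1.0 overestimates the split on A's scale.
print(round(corrected_divergence_ks(1.0, 1.5, 1.7), 3))  # 0.8
```

When A and B evolve at the same rate (K_AO equal to K_BO), the correction leaves the raw mode unchanged, which is why the original mode is only unreliable under rate variation.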
ok
If a corrected mode is very close to the Ks peak of a single species, as in the figure below, how should this generally be interpreted?
The WGD age is around 0.8, older than the divergence with Gesneria but younger than the divergence with Calceolaria, suggesting a polyploidization event shared with Gesneria but not with Calceolaria. The conclusion holds both before and after rate correction, so I think the result is clear. Do you have results from anchor Ks? They would help verify this.
I only have transcriptome data but no genome data. Can I do anchor Ks?
Clearly not, since the gene order information is not available. But the mixed Ks distribution you generated already supports the conclusion.
ok
One problem is that all species in the Gesneria branch have a peak around Ks 0.65 (range 0.55-0.75), while all species in the Whytockia branch have a peak around Ks 1.0 (range 0.8-1.2). Gesneria and Whytockia are sister groups. Since our analysis indicates that Gesneria and Whytockia share a single WGD, why do the Ks peaks of this shared event differ so markedly between the two branches?
I don't think it is a problem. It reflects rate variation across lineages. For instance, the gamma event is shared by the core eudicots, but the Ks peak of this polyploidization event differs across many lineages. If the data are transcriptomes rather than genomes, the result will of course also be affected by missing and incomplete genes.
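A quick worked example of why a shared event can show different peaks: if both lineages have had the same time t since the WGD and Ks grows roughly linearly (paralog Ks of about 2 * r * t, ignoring saturation), then t and the factor 2 cancel and the ratio of the peaks equals the ratio of the synonymous rates. implied_rate_ratio is a hypothetical helper; the peaks are the approximate values quoted in this thread.

```python
def implied_rate_ratio(peak_ks_ref, peak_ks_other):
    """Synonymous rate of the other lineage relative to the reference lineage.

    Assumes both peaks come from the same WGD (same age t) and that
    Ks ~= 2 * r * t with no saturation, so t and the factor 2 cancel.
    """
    if peak_ks_ref <= 0:
        raise ValueError("reference Ks peak must be positive")
    return peak_ks_other / peak_ks_ref


# Approximate peaks from this thread: Gesneria branch ~0.65, Whytockia branch ~1.0
print(f"{implied_rate_ratio(0.65, 1.0):.2f}")  # 1.54
```

Under these assumptions, the Whytockia branch would need a synonymous rate only about 1.5 times that of the Gesneria branch to produce both peaks from one event.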
ok
Thank you very much for your help
Now I have a new problem. If I use Gesneria as the focus species, the result seems different: it shows Gesneria, Whytockia and Calceolaria all undergoing the WGD at the same time. What is going on here?
If the Ks method cannot resolve your problem cleanly, you should turn to phylogenomic methods. That different focus species give different results means the Ks age difference between the WGD and the divergence events is small.
ok
We've also had some problems with the phylogenomic method, so we'd like to see if the ks method can solve them.
I would also like to ask: if there is a hybridization event, will it lead to a larger Ks peak value?
Thank you
Yes, the hybridization event will render the peak Ks value higher.
I have a question: is there any literature supporting your statement that "the hybridization event will render the peak Ks value higher"? We have not found any relevant research. Could you point me to it?
Thank you very much for your help
You may refer to the Figure 1 of this paper: https://doi.org/10.1093/sysbio/syx044
Thank you very much. I'll go and learn about it
May I ask a question about the wgd v2 Ks peak plots: which result files should we use, the node-averaged or the node-weighted ones? Sometimes the two results differ significantly; sometimes the node-averaged result matches the expected result, and sometimes the weighted one does.
I suggest you first refer to our book chapter for more details about the differences. In short, both the node-averaged and node-weighted methods of de-redundancy are reasonable. The node-averaged method reduces the total number of Ks values, while the node-weighted method retains the raw number of Ks values but associates each with a weight based on its phylogenetic depth. Both aim to yield a single Ks value (or weights summing to 1) per gene duplication event. Users can decide which one to adopt. In my opinion, both are valid, although, as you said, one method may sometimes perform better at uncovering peaks.
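The contrast between the two de-redundancy schemes can be sketched in Python. This is a simplified illustration, not wgd's implementation: each Ks value is tagged with a hypothetical duplication-node ID, node_averaged collapses each node to its mean, and node_weighted keeps every value with weight 1/n per node. The maintainer describes wgd's actual weights as depending on phylogenetic depth, which this sketch does not model; it only shows the "one value, or weights summing to 1, per duplication event" idea.

```python
from collections import defaultdict


def node_averaged(pairs):
    """pairs: iterable of (node_id, ks). Returns one mean Ks per duplication node."""
    by_node = defaultdict(list)
    for node, ks in pairs:
        by_node[node].append(ks)
    return {node: sum(v) / len(v) for node, v in by_node.items()}


def node_weighted(pairs):
    """Keep every Ks value; give each 1/n of its node's weight so each node sums to 1."""
    pairs = list(pairs)
    counts = defaultdict(int)
    for node, _ in pairs:
        counts[node] += 1
    return [(ks, 1.0 / counts[node]) for node, ks in pairs]


# Hypothetical data: node n1 yields 3 paralog pairs, node n2 yields 1
pairs = [("n1", 0.80), ("n1", 0.90), ("n1", 1.00), ("n2", 0.20)]
print({n: round(v, 3) for n, v in node_averaged(pairs).items()})  # {'n1': 0.9, 'n2': 0.2}
print(round(sum(w for _, w in node_weighted(pairs)), 6))           # 2.0
```

The averaged version feeds 2 values into the histogram; the weighted version feeds 4 values whose weights still total one per duplication node.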
Sure, I greatly appreciate your assistance.
When I used ksd for a multi-species comparison, the following error occurred. Here are my command and the error:
wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Corallodiscus_lanuginosus_014.cds.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "./cds/Corallodiscus_lanuginosus_014.cds;./cds/Titanotrichum_oldhamii_042.cds" --spair "./cds/Corallodiscus_lanuginosus_014.cds;./cds/Gesneria_cuneifolia_074.cds" --spair "./cds/Corallodiscus_lanuginosus_014.cds;./cds/Rhynchoglossum_obliquum_123.cds" --spair "./cds/Corallodiscus_lanuginosus_014.cds;./cds/Calceolaria_pinifolia.cds" --spair "./cds/Corallodiscus_lanuginosus_014.cds;./cds/Corallodiscus_lanuginosus_014.cds" --plotkde
16:28:54 INFO This is wgd v2.0.22 cli.py:32
16:28:55 ERROR Please provide at least one sequence file
May I ask how to solve it?
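Following the maintainer's earlier advice in this thread (pass the CDS files as positional arguments to wgd ksd, and name them in --spair by file name rather than path), a hypothetical pre-flight check can flag both issues before launching a run. check_ksd_inputs is not part of wgd; it is a minimal sketch of that advice.

```python
import os


def check_ksd_inputs(cds_files, spairs):
    """Hypothetical pre-flight check for wgd ksd arguments.

    cds_files: the positional CDS files that will be passed to wgd ksd.
    spairs:    the --spair values, each "A.cds;B.cds".
    Returns a list of human-readable problems (empty if none found).
    """
    problems = []
    if not cds_files:
        problems.append("no CDS files given as positional arguments")
    names = {os.path.basename(p) for p in cds_files}
    for pair in spairs:
        for sp in pair.split(";"):
            if sp != os.path.basename(sp):
                problems.append(f"use the file name, not a path: {sp}")
            elif sp not in names:
                problems.append(f"not among the CDS files: {sp}")
    return problems


# The command above passes no CDS files and uses paths inside --spair
for problem in check_ksd_inputs([], ["./cds/Corallodiscus_lanuginosus_014.cds;./cds/Gesneria_cuneifolia_074.cds"]):
    print(problem)
```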