Closed manoharbisht1998 closed 10 months ago
Hi, thanks for your interest in wgd v2! Could I know the full command you used for the program wgd ksd
? Besides, if you have already the Ks data, I recommend using wgd viz
to achieve the rate correction. First, you can use the orthogroup file (global MRBH) and the cds sequences of all the species to calculate the orthologous Ks values. With the orthologous Ks distribution and your previous paralogous Ks distribution, you need to provide the species tree and the species pair you want to plot to wgd viz
to achieve the rate correction.
Thanks for the reply, so e to conduct the rate correction I used the following command : nohup wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Species_interest.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Species_interest;Oryza_sativa" --spair "Species_interest;Sorghum_bicolor" --spair "Species_interest;Musa_acuminata" --spair "Species_interest;Species_interest" --plotkde --nthreads 90 Species_interest Oryza_sativa Sorghum_bicolor Musa_acuminata &
About wgd viz commands are these the command which you are talking about wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde
wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_viz_Compare_rate --spair "Acorus_americanus;Protea_cynaroides" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Vitis_vinifera;Acorus_americanus" --gsmap gene_species.map --plotkde Thanks for the suggestion I will try that and I will get back to you
Hi, there seems to be a small bug in this function. I just fixed it and you can install the latest commit here and try. Yes, the command for wgd viz
and wgd ksd
are correct.
what is gsmap parameter in wgd viz?
That's the gene-species map file, which should be automately produced from wgd ksd
. Or you can also prepare it manually as shown in README.
so yeah, I rerun this command nohup wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Species_interest.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Species_interest;Oryza_sativa" --spair "Species_interest;Sorghum_bicolor" --spair "Species_interest;Musa_acuminata" --spair "Species_interest;Species_interest" --plotkde --nthreads 90 Species_interest Oryza_sativa Sorghum_bicolor Musa_acuminata &
and still i am getting the error as above
kindly help
Normally, the error message, "ValueError: dataset input should have multiple elements," is raised when the input dataset passed to the KDE function contains only one element. The KDE function requires a dataset with more than one data point to estimate the density, which means that the Ks values were either not properly calculated or only one value. Could you please share me with the output directory (including the output) from the failed wgd ksd
?
Hi so I run wgd viz wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde
but getting the following error
if "weightoutlierexcluded" not in ksdf.columns: weight_inc = get_outlierincluded(ksdf) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/utils.py", line 48, in get_outlierincluded weight_inc = 1/df.groupby(['family', 'node'])['dS'].transform('count') File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/pandas/core/frame.py", line 7641, in groupby dropna=dropna, File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 897, in init dropna=self.dropna, File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 862, in get_grouper raise KeyError(gpr) KeyError: 'family'
Yes, it was a bug that I think now is fixed. If you removed the --reweight
option, I think it will work fine. Or you can install the latest commit here and try again which should be working now. Feel free to comment any bugs. Thanks!
I installed the latest commit and the same error is coming, further, I also removed the --reweight option still the error is popping up! kindly help
May I ask the datafile global_MRBH.tsv.ks.tsv
and extra paranome Ks datafile you used for this command? It seems the error comes from the format issue of Ks datafile.
sure here it is files.zip
I can't reproduce your error. But I think I might know which part went wrong. Could you provide me with your full command and remaining data? So I might be able to reproduce your error. Thanks a lot!
Thanks for checking But the thing is data is bit confidential, I hope you understand, but in case you have any thoughts what is going wrong... Please let me know, I will try at my end.
Thanks again
i am using this command
wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde
Thanks again
I understand your worry. But you can just provide me with some re-labeled data, for instance, you can change your species name with XXX, and the sequence name also with XXX_. That way your data is still safe and I can try to help you debug. Thanks for your understanding.
Were you able to fix this issue?
wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde
I am getting these errors:
Traceback (most recent call last):
File "/home/.conda/envs/wgd2/bin/wgd", line 10, in
Thanks,
Hi, I have successfully run the code to find the result but i have some doubts regarding the results (https://github.com/heche-psb/wgd/issues/14#issue-1919851607). Kindly help!
Thanks
First of all Thanks a lot for the tool and the support provided with the previous issue. I want to do the rate correction using wgd ksd wgd_globalmrbh command but I am encountering the follwoing error; sys.exit(cli()) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 829, in call return self.main(args, kwargs) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 610, in invoke return callback(args, kwargs) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/cli.py", line 451, in ksd _ksd(kwargs) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/cli.py", line 490, in _ksd multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde,plotapgmm=plotapgmm,plotelmm=plotelmm,components=components,na=True) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 499, in multi_sp_plot df_perspair,allspair,paralog_pair,corrected_ks_spair,Outgroup_spnames = getspair_ks(spair,df,spgenemap,reweight,onlyrootout,sptree=sptree) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 63, in getspair_ks if sptree != None and len(paralog_pair) !=0 : corrected_ks_spair,Outgroup_spnames = correctks(df,sptree,paralog_pair[0],reweight,onlyrootout) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 170, in correctks ks_spair = getspairks(all_spairs,df,reweight,method='mode') File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 129, in getspairks kde = stats.gaussian_kde(y,weights=w,bw_method=0.1) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/scipy/stats/kde.py", line 193, in init raise ValueError("
dataset
input should have multiple elements.") ValueError:dataset
input should have multiple elements.I have no idea what went wrong! Please help.
Thank you