heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0
21 stars 0 forks source link

ERROR: Rate correction using wgd ksd wgd_globalmrbh #4

Closed manoharbisht1998 closed 10 months ago

manoharbisht1998 commented 11 months ago

First of all Thanks a lot for the tool and the support provided with the previous issue. I want to do the rate correction using wgd ksd wgd_globalmrbh command but I am encountering the follwoing error; sys.exit(cli()) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 829, in call return self.main(args, kwargs) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/click/core.py", line 610, in invoke return callback(args, kwargs) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/cli.py", line 451, in ksd _ksd(kwargs) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/cli.py", line 490, in _ksd multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde,plotapgmm=plotapgmm,plotelmm=plotelmm,components=components,na=True) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 499, in multi_sp_plot df_perspair,allspair,paralog_pair,corrected_ks_spair,Outgroup_spnames = getspair_ks(spair,df,spgenemap,reweight,onlyrootout,sptree=sptree) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 63, in getspair_ks if sptree != None and len(paralog_pair) !=0 : corrected_ks_spair,Outgroup_spnames = correctks(df,sptree,paralog_pair[0],reweight,onlyrootout) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 170, in correctks ks_spair = getspairks(all_spairs,df,reweight,method='mode') File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/viz.py", line 129, in getspairks kde = stats.gaussian_kde(y,weights=w,bw_method=0.1) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/scipy/stats/kde.py", line 193, in init raise ValueError("dataset input should have multiple elements.") ValueError: dataset input should have multiple elements.

I have no idea what went wrong! Please help.

Thank you

heche-psb commented 11 months ago

Hi, thanks for your interest in wgd v2! Could I know the full command you used for the program wgd ksd? Besides, if you have already the Ks data, I recommend using wgd viz to achieve the rate correction. First, you can use the orthogroup file (global MRBH) and the cds sequences of all the species to calculate the orthologous Ks values. With the orthologous Ks distribution and your previous paralogous Ks distribution, you need to provide the species tree and the species pair you want to plot to wgd viz to achieve the rate correction.

manoharbisht1998 commented 11 months ago

Thanks for the reply, so e to conduct the rate correction I used the following command : nohup wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Species_interest.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Species_interest;Oryza_sativa" --spair "Species_interest;Sorghum_bicolor" --spair "Species_interest;Musa_acuminata" --spair "Species_interest;Species_interest" --plotkde --nthreads 90 Species_interest Oryza_sativa Sorghum_bicolor Musa_acuminata &

About wgd viz commands are these the command which you are talking about wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde

wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_viz_Compare_rate --spair "Acorus_americanus;Protea_cynaroides" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Vitis_vinifera;Acorus_americanus" --gsmap gene_species.map --plotkde Thanks for the suggestion I will try that and I will get back to you

heche-psb commented 11 months ago

Hi, there seems to be a small bug in this function. I just fixed it and you can install the latest commit here and try. Yes, the command for wgd viz and wgd ksd are correct.

manoharbisht1998 commented 11 months ago

what is gsmap parameter in wgd viz?

heche-psb commented 11 months ago

That's the gene-species map file, which should be automately produced from wgd ksd. Or you can also prepare it manually as shown in README.

manoharbisht1998 commented 11 months ago

so yeah, I rerun this command nohup wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Species_interest.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Species_interest;Oryza_sativa" --spair "Species_interest;Sorghum_bicolor" --spair "Species_interest;Musa_acuminata" --spair "Species_interest;Species_interest" --plotkde --nthreads 90 Species_interest Oryza_sativa Sorghum_bicolor Musa_acuminata &

and still i am getting the error as above

kindly help

heche-psb commented 11 months ago

Normally, the error message, "ValueError: dataset input should have multiple elements," is raised when the input dataset passed to the KDE function contains only one element. The KDE function requires a dataset with more than one data point to estimate the density, which means that the Ks values were either not properly calculated or only one value. Could you please share me with the output directory (including the output) from the failed wgd ksd?

manoharbisht1998 commented 11 months ago

Hi so I run wgd viz wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde

but getting the following error

if "weightoutlierexcluded" not in ksdf.columns: weight_inc = get_outlierincluded(ksdf) File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/wgd/utils.py", line 48, in get_outlierincluded weight_inc = 1/df.groupby(['family', 'node'])['dS'].transform('count') File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/pandas/core/frame.py", line 7641, in groupby dropna=dropna, File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 897, in init dropna=self.dropna, File "/home/genomics7/SOFTWARES/wgd/ENV/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 862, in get_grouper raise KeyError(gpr) KeyError: 'family'

heche-psb commented 11 months ago

Yes, it was a bug that I think now is fixed. If you removed the --reweight option, I think it will work fine. Or you can install the latest commit here and try again which should be working now. Feel free to comment any bugs. Thanks!

manoharbisht1998 commented 11 months ago

I installed the latest commit and the same error is coming, further, I also removed the --reweight option still the error is popping up! kindly help

heche-psb commented 11 months ago

May I ask the datafile global_MRBH.tsv.ks.tsv and extra paranome Ks datafile you used for this command? It seems the error comes from the format issue of Ks datafile.

manoharbisht1998 commented 11 months ago

sure here it is files.zip

heche-psb commented 11 months ago

I can't reproduce your error. But I think I might know which part went wrong. Could you provide me with your full command and remaining data? So I might be able to reproduce your error. Thanks a lot!

manoharbisht1998 commented 11 months ago

Thanks for checking But the thing is data is bit confidential, I hope you understand, but in case you have any thoughts what is going wrong... Please let me know, I will try at my end.

Thanks again

i am using this command

wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde

Thanks again

heche-psb commented 11 months ago

I understand your worry. But you can just provide me with some re-labeled data, for instance, you can change your species name with XXX, and the sequence name also with XXX_. That way your data is still safe and I can try to help you debug. Thanks for your understanding.

Bio1nform commented 11 months ago

Hi manoharbisht1998

Were you able to fix this issue?

wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap gene_species.map --plotkde

I am getting these errors: Traceback (most recent call last): File "/home/.conda/envs/wgd2/bin/wgd", line 10, in sys.exit(cli()) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 829, in call return self.main(args, kwargs) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/click/core.py", line 610, in invoke return callback(args, kwargs) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/cli.py", line 449, in ksd _ksd(kwargs) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/cli.py", line 487, in _ksd if len(spair)!= 0: multi_sp_plot(df,spair,spgenemap,outdir,onlyrootout,title=prefix,ylabel=ylabel,ksd=True,reweight=reweight,sptree=speciestree,extraparanomeks=extraparanomeks, ap = anchorpoints,plotkde=plotkde,plotapgmm=plotapgmm,plotelmm=plotelmm,components=components) File "/home/.conda/envs/wgd2/lib/python3.8/site-packages/wgd/viz.py", line 537, in multi_sp_plot ax.quiver(mode,maxim*scale, corrected_ks_spair[pair]-mode, 0, angles='xy', scale_units='xy', scale=1,color=cs[i],width=0.005,headwidth=2,headlength=2,headaxislength=2) NameError: name 'scale' is not defined

Thanks,

manoharbisht1998 commented 9 months ago

Hi, I have successfully run the code to find the result but i have some doubts regarding the results (https://github.com/heche-psb/wgd/issues/14#issue-1919851607). Kindly help!

Thanks