heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

Error running wgd #6

Closed Bio1nform closed 9 months ago

Bio1nform commented 11 months ago

Hi, this is a great tool. I have used version 1 and am now working with version 2. I managed to install it with conda, but I am getting the following errors:

wgd -h
Usage: wgd [OPTIONS] COMMAND [ARGS]...

  wgd v2 - Copyright (C) 2023-2024 Hengchi Chen
  Contact: heche@psb.vib-ugent.be

Options:
  -v, --verbosity [info|debug]  Verbosity level, default = info.
  -h, --help                    Show this message and exit.

Commands:
  dmd    All-vs-all diamond blastp + MCL clustering.
  focus  Multiply species RBH or c-score defined orthologous family's gene...
  ksd    Paranome and one-to-one ortholog Ks distribution inference...
  mix    Mixture modeling of Ks distributions.
  peak   Infer peak and CI of Ks distribution.
  syn    Co-linearity and anchor inference using I-ADHoRe.
  viz    Visualization of Ks distribution or synteny

wgd dmd
09:04:59 INFO This is wgd v1.2 cli.py:32
Traceback (most recent call last):
  File "/home/.conda/envs/WGD/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 113, in dmd
    _dmd(**kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 116, in _dmd
    from wgd.core import SequenceData, read_MultiRBH_gene_families,mrbh,ortho_infer,genes2fams,endt,segmentsaps,bsog
ModuleNotFoundError: No module named 'wgd.core'

wgd viz
09:05:19 INFO This is wgd v1.2 cli.py:32
Traceback (most recent call last):
  File "/home/.conda/envs/WGD/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 533, in viz
    _viz(**kwargs)
  File "/home/.conda/envs/WGD/lib/python3.6/site-packages/cli.py", line 536, in _viz
    from wgd.viz import elmm_plot, apply_filters, multi_sp_plot, default_plot,all_dotplots,filter_by_minlength,dotplotunitgene,dotplotingene,filter_mingenumber
ImportError: cannot import name 'elmm_plot'

Any help would be great. Thanks

heche-psb commented 11 months ago

Hi, thanks for the interest in wgd v2! Could you please try with python=3.6/3.7/3.8 and see if you would meet the same error? Besides, using pip install wgd is also an option.
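The log above reports "This is wgd v1.2" and loads click from /home/.local rather than from the conda environment, so it may help to check where Python actually finds each package. The following is only a diagnostic sketch: the helper names `module_origin` and `is_user_site` are made up for illustration, and the "/.local/" test is a rough heuristic, not something wgd provides.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_user_site(path):
    """Heuristic: True when the path points into a per-user ~/.local install."""
    return path is not None and "/.local/" in path


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("wgd", "click"):
        origin = module_origin(name)
        note = " (per-user install, may shadow the environment)" if is_user_site(origin) else ""
        print(f"{name}: {origin}{note}")
```

Run it with the same interpreter that the conda environment uses. If a package resolves to a per-user path, an older copy may be shadowing the one installed in the environment.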

Bio1nform commented 11 months ago

hi heche-psb,

Thanks, I managed to install it with Python 3.8, and it is working now. I am confused about some steps in the example: 1) wgd dmd Aquilegia_coerulea (Is Aquilegia_coerulea a folder or a FASTA file? Including multiple FASTA files did work for me.) 2) I am not able to get the families for the downstream analysis. Which output of wgd dmd is the families file?

Thank you.

heche-psb commented 11 months ago

Hi, Aquilegia_coerulea is the name of the CDS sequence file. If you provide only one CDS file, wgd dmd calculates the whole paranome. With more than one CDS file, it calculates global MRBH, local MRBH, or just pairwise RBH, depending on the other options you set. The family output of wgd dmd Aquilegia_coerulea is Aquilegia_coerulea.tsv in the default output folder wgd_dmd; it contains the paralogous families. The workflow can be glanced at here.
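To make the file naming concrete, here is a small sketch of the two paranome steps. The helper names `family_file` and `paranome_commands` are hypothetical, and the naming rule (input file name plus ".tsv" inside wgd_dmd) is only inferred from the examples in this thread.

```python
from pathlib import Path


def family_file(cds, outdir="wgd_dmd"):
    """Path of the family table that `wgd dmd <cds>` is described as writing."""
    return (Path(outdir) / (Path(cds).name + ".tsv")).as_posix()


def paranome_commands(cds):
    """Argument lists for the two steps: family inference, then Ks estimation."""
    return [
        ["wgd", "dmd", cds],
        ["wgd", "ksd", family_file(cds), cds],
    ]


if __name__ == "__main__":
    for command in paranome_commands("Aquilegia_coerulea"):
        print(" ".join(command))
```

The argument lists can be passed to `subprocess.run` one after the other once the first step has finished.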

Bio1nform commented 11 months ago

Hi, thanks for the information. So I can use the MRBH as a family for the next steps. I ran wgd dmd Aquilegia_coerulea and got Aquilegia_coerulea.tsv.

I am getting an error with: wgd ksd wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea

myjob.5830965_1.error.txt

heche-psb commented 11 months ago

I think it might be due to the version of PAML. Are you using PAML v4.9j? That version should work fine.

Bio1nform commented 11 months ago

I am using PAML v4.9j and still get the same error.

heche-psb commented 11 months ago

Could you please provide me with your input file? I will check whether I get the same error.

Bio1nform commented 11 months ago

Here is the input fasta file. Aquilegia_coerulea.zip

heche-psb commented 11 months ago

I tried on your input file with command "wgd ksd wgd_dmd/Aquilegia_coerulea.txt.tsv Aquilegia_coerulea.txt" and it works. Maybe you could try it again? Thanks a lot.

Bio1nform commented 11 months ago

Hi, I am getting an error running this too.

wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --plotkde --nthreads 90 Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera

Help Please, Thanks

myjob.6072966_1.error.txt

Bio1nform commented 11 months ago

Hi, I am stuck in here, any help would be really appreciated.

Kindly help Please. Thank you

heche-psb commented 10 months ago

Hi, please install the latest version here in the GitHub repository. The error message shows that you're using an older version, which may have bugs that I have since fixed.

Bio1nform commented 10 months ago

Hi Thanks for the information.

I installed wgd==2.0.20 with Python 3.6.

wgd dmd --globalmrbh Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera -o wgd_globalmrbh

Now I get these errors:
  File "/home/.conda/envs/wgdV2/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/wgdV2/lib/python3.6/site-packages/cli.py", line 453, in ksd
    _ksd(**kwargs)
  File "/home/.conda/envs/wgdV2/lib/python3.6/site-packages/cli.py", line 457, in _ksd
    from wgd.core import get_gene_families, SequenceData, KsDistributionBuilder
ModuleNotFoundError: No module named 'wgd.core'

Thank you

Bio1nform commented 10 months ago

With python3.8:

wgd dmd --globalmrbh Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera -o wgd_globalmrbh

This works fine. The rest does not work:

wgd ksd wgd_globalmrbh_CHECK/global_MRBH.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks1 --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --plotkde Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera

Thank you.

myjob.error.txt

heche-psb commented 10 months ago

The error occurred at this step: kde = stats.gaussian_kde(y,weights=w,bw_method=0.1), which raised ValueError: array must not contain infs or NaNs. The problem lies somewhere between the Ks file, the species pairs and the species tree. I guess it is due to some incorrect input. Could you please share all your input files and the full command you used? Thanks!
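As a quick way to see whether the values reaching that step contain NaN or infinite entries, one could filter them beforehand. This is a minimal diagnostic sketch in plain Python, not the fix applied in wgd; `drop_non_finite` is a hypothetical helper, and `y` and `w` stand for the value and weight arrays named in the failing call.

```python
import math


def drop_non_finite(values, weights):
    """Keep only (value, weight) pairs in which both numbers are finite."""
    kept = [(v, w) for v, w in zip(values, weights)
            if math.isfinite(v) and math.isfinite(w)]
    kept_values = [v for v, _ in kept]
    kept_weights = [w for _, w in kept]
    # The third item reports how many pairs were discarded.
    return kept_values, kept_weights, len(values) - len(kept)


if __name__ == "__main__":
    y = [0.5, float("nan"), 1.2, float("inf")]
    w = [1.0, 0.5, float("nan"), 1.0]
    print(drop_non_finite(y, w))
```

A non-zero count of dropped pairs would point to missing Ks values or weights in the input rather than to the density estimation itself.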

Bio1nform commented 10 months ago

Hi, I ran the following steps. Here are the sequence files: Sequences1.tar.gz Sequences2.tar.gz

1) wgd ksd wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea

wgd_ksd.zip

2) wgd dmd --globalmrbh Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera -o wgd_globalmrbh

wgd_globalmrbh.zip

3) wgd ksd wgd_globalmrbh/global_MRBH.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -o wgd_globalmrbh_ks1 --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --plotkde Aquilegia_coerulea Protea_cynaroides Acorus_americanus Vitis_vinifera

I get the error message.

Thanks

Bio1nform commented 10 months ago

Hi,

Were you able to take a look into it?

Thanks,

heche-psb commented 10 months ago

Hi, yes, there was a small bug concerning the node-averaged Ks processing. I just fixed it and pushed it as v2.0.21. Please try again and let me know if the same error occurs again. Thanks a lot!

Bio1nform commented 10 months ago

Hi, I am still getting an error. job.error.txt

heche-psb commented 10 months ago

Hi, I think you're using wgd ksd to do the rate correction. Could you use wgd ksd only to infer Ks, and use wgd viz to do the rate correction?
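Both the wgd ksd and the wgd viz commands in this thread pass the species pairs as repeated --spair "focal;other" options, followed by a focal-versus-focal pair. A small sketch for generating those arguments when moving them from one command to the other is shown below; `spair_args` is a hypothetical helper, not part of wgd.

```python
def spair_args(focal, others):
    """Build repeated --spair arguments: focal vs. each other species, then focal vs. itself."""
    args = []
    for other in list(others) + [focal]:
        args += ["--spair", f"{focal};{other}"]
    return args


if __name__ == "__main__":
    others = ["Protea_cynaroides", "Vitis_vinifera", "Acorus_americanus"]
    print(spair_args("Aquilegia_coerulea", others))
```

When the list is handed to `subprocess.run`, no shell quoting of the semicolon is needed because each element is passed as a single argument.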

Bio1nform commented 10 months ago

Thanks, it seems to work. I will let you know if there is any issue with this.

However, I think wgd dmd --globalmrbh cannot handle sequences larger than 4 kb. Is there any way I can increase the size limit? I am not sure what the reason is, though.

I get the following error.

error1.txt

heche-psb commented 10 months ago

Could you please share the sequence files that you used? It seems to be a format problem in the input CDS files.

Bio1nform commented 10 months ago

I figured out the error. It was caused by duplicated FASTA IDs. Is there any way to fix this issue?
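For anyone hitting the same problem, duplicated IDs can be found before running wgd dmd with a few lines of Python. This is a generic sketch, not a wgd feature; the function names are made up, and the ID is assumed to be the first whitespace-delimited token of each header line.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the ID (first whitespace-delimited token) of each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            yield header.split()[0] if header else ""


def duplicated_ids(lines):
    """Return IDs occurring more than once, mapped to their counts."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    example = [">a first copy", "ATG", ">b", "ATG", ">a second copy", "TTT"]
    print(duplicated_ids(example))
```

For a real file, pass the open file handle instead of the example list, for instance `duplicated_ids(open("Aquilegia_coerulea"))`.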

However, there is a new error. wgd ksd wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea -o wgd_ksd gives me the following error. error2.txt

heche-psb commented 10 months ago

The problem occurred in the family GF00000004. Could you please share the CDS sequences of GF00000004 only?

Bio1nform commented 10 months ago

Here you go. GF00000004 AQUCO_02000225v1_22779, AQUCO_00700295v1_9186, AQUCO_02000226v1_22780, AQUCO_44500001v1_40958, AQUCO_02000219v1_22773, AQUCO_00201007v1_3153, AQUCO_02200206v1_24148, AQUCO_04400082v1_33100, AQUCO_01000140v1_12868, AQUCO_01000578v1_13622, AQUCO_00700296v1_9187, AQUCO_00200174v1_1785, AQUCO_00700480v1_9470, AQUCO_01000288v1_13135, AQUCO_00200153v1_1743, AQUCO_00900240v1_11191, AQUCO_00200176v1_1787, AQUCO_01500005v1_18737, AQUCO_04700045v1_33731, AQUCO_01700324v1_20762, AQUCO_00700388v1_9339, AQUCO_01700327v1_20765, AQUCO_00700386v1_9337, AQUCO_02600114v1_25914, AQUCO_00500170v1_7139, AQUCO_00900126v1_11012, AQUCO_00200172v1_1783, AQUCO_03500197v1_30294, AQUCO_03900098v1_31723, AQUCO_02200275v1_24279, AQUCO_00300143v1_4243, AQUCO_03000303v1_28394, AQUCO_00200152v1_1742, AQUCO_00200155v1_1745, AQUCO_00400487v1_6293, AQUCO_29600001v1_40929, AQUCO_00300388v1_4628, AQUCO_00300391v1_4632, AQUCO_29600002v1_40930, AQUCO_04700046v1_33732, AQUCO_00200173v1_1784, AQUCO_00300142v1_4242, AQUCO_00200311v1_2025, AQUCO_11400002v1_40231, AQUCO_00200175v1_1786, AQUCO_04900044v1_33977, AQUCO_02300181v1_24692, AQUCO_00500129v1_7067, AQUCO_02300178v1_24689, AQUCO_01600057v1_19597, AQUCO_02300171v1_24678, AQUCO_02300176v1_24687, AQUCO_02300176v1_24686, AQUCO_02600115v1_25915, AQUCO_02300180v1_24691, AQUCO_00500071v1_6981, AQUCO_02800270v1_27588, AQUCO_00300386v1_4626, AQUCO_01600365v1_20054, AQUCO_00300389v1_4629, AQUCO_02500191v1_25430, AQUCO_00500071v1_6982, AQUCO_02900102v1_27824, AQUCO_02400108v1_24976, AQUCO_00100892v1_1485, AQUCO_01700027v1_20240, AQUCO_03700288v1_31162, AQUCO_07500004v1_37851, AQUCO_00500071v1_6983, AQUCO_00100478v1_811, AQUCO_00700290v1_9176, AQUCO_00300387v1_4627, AQUCO_07500004v1_37852, AQUCO_03700288v1_31161, AQUCO_00700293v1_9182, AQUCO_02300179v1_24690, AQUCO_00500071v1_6984, AQUCO_02300173v1_24683

GF00000004.txt

heche-psb commented 10 months ago

I can actually run this family through successfully with command wgd ksd famback.tsv GF00000004.cds. Were you using PAML v4.9j and did you add other parameters?

famback tsv ksd

Bio1nform commented 10 months ago

I am using PAML 4.9. I tried wgd ksd GF0000000.tsv GF00000004 -o TEST and I get the following error.

(WGDV2_38) geno@farm:~/WGD/Aquilegia$ which codeml
/home/software/GENOMETOOLS/PAML/paml4.9j/bin/codeml
(WGDV2_38) geno@farm:~/WGD/Aquilegia$ wgd ksd GF0000000.tsv GF00000004 -o TEST
09:40:34 INFO This is wgd v2.0.21 cli.py:32
09:40:36 INFO tmpdir = wgdtmp_f2922255-bc2a-445e-826b-0fe0ce138647 cli.py:483
Traceback (most recent call last):
  File "/home/.conda/envs/WGDV2_38/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/cli.py", line 464, in ksd
    _ksd(**kwargs)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/cli.py", line 490, in _ksd
    ksdb.get_distribution()
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/wgd/core.py", line 3030, in get_distribution
    df = pd.concat([pd.read_csv(x.out, index_col=0)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 347, in concat
    op = _Concatenator(
  File "/home/.conda/envs/WGDV2_38/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

heche-psb commented 10 months ago

I used another dataset and reproduced this error. It is caused by a codeml error: "166 columns are converted into ??? because of stop codons. 21 out of 21 sequences do not have any resolved nucleotides. Giving up." The sequences are not strict CDS and contain many in-frame stop codons, so although the stripped alignment length is not zero, codeml returns no result. I just pushed a commit with a fix, so it should be solved now.
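To check sequences for this condition before running wgd ksd, something like the following could be used. It is a sketch under the assumption of the standard genetic code (stop codons TAA, TAG, TGA) and is unrelated to how wgd or codeml detect stops internally; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, an assumption


def inframe_stops(seq):
    """Return 0-based codon indices of stop codons located before the final codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_like_strict_cds(seq):
    """True when the sequence is non-empty, a multiple of three long, and has no internal stop."""
    return len(seq) > 0 and len(seq) % 3 == 0 and not inframe_stops(seq)


if __name__ == "__main__":
    print(inframe_stops("ATGTAAAAATGA"), looks_like_strict_cds("ATGTAAAAATGA"))
```

Sequences flagged this way can be inspected or removed from the CDS file before the family is analysed.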

Bio1nform commented 10 months ago

Did you push it to a new version? I re-installed wgd2==2.0.21 and I am still getting the same error.

(wgdV2_38) geno@farm:~/WGD/Aquilegia$ wgd ksd test.tsv GF00000004 -o TEST gives me the same error.

Adding --cds gives me the following error:

(wgdV2_38) geno@farm:~/WGD/Aquilegia$ wgd ksd --cds test.tsv GF00000004 -o TEST
07:50:05 INFO This is wgd v2.0.21 cli.py:32
07:50:06 WARNING Translation error (First codon 'AAG' is not a start codon) in seq AQUCO_00100478v1_811 core.py:282
WARNING Translation error (First codon 'TTA' is not a start codon) in seq AQUCO_00200173v1_1784 core.py:282
WARNING Translation error (First codon 'TTA' is not a start codon) in seq AQUCO_00200175v1_1786 core.py:282
WARNING Translation error (Final codon 'GTT' is not a stop codon) in seq AQUCO_00300391v1_4632 core.py:282
WARNING Translation error (First codon 'GCC' is not a start codon) in seq AQUCO_04900044v1_33977 core.py:282
WARNING Translation error (Final codon 'ACT' is not a stop codon) in seq AQUCO_29600002v1_40930 core.py:282
INFO tmpdir = wgdtmp_aaa2e529-da43-4116-b682-7bb00d94135e cli.py:483
Traceback (most recent call last):
  File "/home/.conda/envs/wgdV2_38/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/cli.py", line 464, in ksd
    _ksd(**kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/cli.py", line 490, in _ksd
    ksdb.get_distribution()
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/wgd/core.py", line 3030, in get_distribution
    df = pd.concat([pd.read_csv(x.out, index_col=0)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 347, in concat
    op = _Concatenator(
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

heche-psb commented 10 months ago

Hi, you need to download and install the version from this GitHub repository. I fixed some bugs after the PyPI release v2.0.21. Sorry for the confusion.

Bio1nform commented 10 months ago

Hi,

The other steps are working now. I am still getting an error when I add --plotapgmm.

wgd viz -d wgd_globalmrbh_ks/global_MRBH.tsv.ks.tsv --extraparanomeks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -sp speciestree.nw --reweight -ap wgd_syn/iadhore-out/anchorpoints.txt -o wgd_viz_mixed_Ks_elmm2 --spair "Aquilegia_coerulea;Protea_cynaroides" --spair "Aquilegia_coerulea;Vitis_vinifera" --spair "Aquilegia_coerulea;Acorus_americanus" --spair "Aquilegia_coerulea;Aquilegia_coerulea" --gsmap wgd_globalmrbh_ks/gene_species.map --plotkde --plotelmm --plotapgmm

Traceback (most recent call last):
  File "/home/.conda/envs/wgdV2_38/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/cli.py", line 554, in viz
    _viz(**kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/cli.py", line 594, in _viz
    multi_sp_plot(df,spair,gsmap,outdir,onlyrootout,title=prefix,ylabel=ylabel,viz=True,plotkde=plotkde,reweight=False,sptree=speciestree,ap = anchorpoints, extraparanomeks=extraparanomeks,plotapgmm=plotapgmm,plotelmm=plotelmm,components=components,max_EM_iterations=em_iterations,num_EM_initializations=em_initializations,peak_threshold=prominence_cutoff,rel_height=rel_height, na=True,user_xlim=xlim,user_ylim=ylim)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/wgd/viz.py", line 665, in multi_sp_plot
    if plotapgmm: ax = addapgmm(ax,y,w,components,outdir,Hs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/wgd/viz.py", line 338, in addapgmm
    models, aic, bic, besta, bestb, N = fit_gmm(aic_bic_fplot, X_log, 2352890, components[0], components[1], em_iter=200, n_init=200)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/wgd/viz.py", line 233, in fit_gmm
    models[i-n1] = mixture.GaussianMixture(n_components = i, covariance_type='full', max_iter = em_iter, n_init = n_init, random_state = seed).fit(X)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/sklearn/mixture/_base.py", line 193, in fit
    self.fit_predict(X, y)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/sklearn/mixture/_base.py", line 220, in fit_predict
    X = _check_X(X, self.n_components, ensure_min_samples=2)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/sklearn/mixture/_base.py", line 52, in _check_X
    X = check_array(X, dtype=[np.float64, np.float32],
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/sklearn/utils/validation.py", line 651, in check_array
    raise ValueError("Found array with %d sample(s) (shape=%s) while a"
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 2 is required.

Thanks

heche-psb commented 10 months ago

Hi, could you first check whether the gene IDs in wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv match exactly the gene IDs in wgd_syn/iadhore-out/anchorpoints.txt? It seems the anchor Ks data is not extracted as expected.
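One way to run that check is to compare the two ID sets directly. The sketch below assumes both files are tab-separated with a header row; the Ks column names `gene1` and `gene2` are taken from the header posted later in this thread, while the anchor-point column names `gene_x` and `gene_y` are an assumption that should be verified against the actual file. The function names are hypothetical.

```python
import csv


def column_values(path, columns):
    """Collect the values found in the named columns of a tab-separated file."""
    values = set()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in columns:
                if row.get(column):
                    values.add(row[column])
    return values


def anchors_missing_from_ks(ks_path, anchor_path,
                            ks_columns=("gene1", "gene2"),
                            anchor_columns=("gene_x", "gene_y")):
    """Return anchor gene IDs that never occur in the Ks table."""
    return column_values(anchor_path, anchor_columns) - column_values(ks_path, ks_columns)
```

If the returned set contains most of the anchor genes, the two files probably use different ID schemes (for example gene names versus transcript names).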

Bio1nform commented 9 months ago

Hi, I am getting this error with version 2.0.22.

wgd syn -f mRNA -a Name wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea.gff3 -ks wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv -o wgd_sync

  File "/home/.conda/envs/wgdV2_38/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/cli.py", line 641, in syn
    _syn(**kwargs)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/cli.py", line 699, in _syn
    ksdb_df = formatv2(ksdb_df)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/wgd/utils.py", line 27, in formatv2
    if "weightoutlierexcluded" not in ksdf.columns: weight_inc = get_outlierincluded(ksdf)
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/wgd/utils.py", line 48, in get_outlierincluded
    weight_inc = 1/df.groupby(['family', 'node'])['dS'].transform('count')
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/pandas/core/frame.py", line 7721, in groupby
    return DataFrameGroupBy(
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 882, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/.conda/envs/wgdV2_38/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 882, in get_grouper
    raise KeyError(gpr)
KeyError: 'node'

Thanks

heche-psb commented 9 months ago

Could you show me the column names of the file wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv? Was it produced by v1 or v2?

Bio1nform commented 9 months ago

pair family g1 g2 gene1 gene2
Aqcoe1G119100.1Aqcoe3G278800.1 GF00000001 Aquilegia_coerulea_40904 Aquilegia_coerulea_26689 Aqcoe3G278800.1 Aqcoe1G119100.1
Aqcoe2G086700.1Aqcoe3G278800.1 GF00000001 Aquilegia_coerulea_40904 Aquilegia_coerulea_04209 Aqcoe3G278800.1 Aqcoe2G086700.1

This was produced from wgd2==2.0.22

heche-psb commented 9 months ago

It seems that there are no Ks results in your wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv, so the previous wgd ksd run did not complete properly. May I have a look at the log file of the wgd ksd step that produced wgd_ksd/Aquilegia_coerulea.tsv.ks.tsv?
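A quick header check can confirm this before running wgd syn. The sketch below only looks for the three columns that appear in the failing groupby call of the traceback (family, node, dS); a complete Ks table very likely has more columns than these. It assumes a tab-separated file, and `missing_columns` is a hypothetical helper.

```python
import csv

# Column names taken from the groupby call shown in the traceback.
REQUIRED_COLUMNS = ("family", "node", "dS")


def missing_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the header of a tab-separated file."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

For a header like the one posted above (pair, family, g1, g2, gene1, gene2) this reports node and dS as missing, which matches the KeyError: 'node'.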

Bio1nform commented 9 months ago

wgd2==2.0.21 works; it is wgd2==2.0.22 that shows the error.

Wgd2==2.0.21 error. myjob. wgd2==2.0.21.txt

Wgd2==2.0.22 error. myjoberror wgd2==2.0.22.txt

heche-psb commented 9 months ago

Hi, it's a bug in 2.0.22. I have already fixed it in this repository. The fixed version is 2.0.23. Sorry for the confusion.

Bio1nform commented 9 months ago

Thanks, I will try it. Is the conda version updated too?

heche-psb commented 9 months ago

The conda update will come a bit later. It is on PyPI now.

Bio1nform commented 9 months ago

I get issues installing from PyPI on the cluster. Installing NumPy and fastcluster gives me issues.

pip install numpy==1.19.0
Collecting numpy==1.19.0
  Using cached numpy-1.19.0.zip (7.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [58 lines of output]
      Running from numpy source directory.
      :460: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
      /scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py:73: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
        required_version = LooseVersion('0.29.14')
      /scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py:75: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
        if LooseVersion(cython_version) < required_version:
      warning: /scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/numpy/__init__.pxd:17:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310
      warning: _philox.pyx:19:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310
      warning: /scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/numpy/__init__.pxd:17:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      self.rng_state.ctr.v[i] = counter[i]
      self._reset_state_variables()
      self._bitgen.state = &self.rng_state
      self._bitgen.next_uint64 = &philox_uint64
      ^
      ------------------------------------------------------------
      _philox.pyx:195:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
      Processing numpy/random/_bounded_integers.pxd.in
      Processing numpy/random/mtrand.pyx
      Processing numpy/random/_philox.pyx
      Traceback (most recent call last):
        File "/scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py", line 235, in <module>
          main()
        File "/scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py", line 231, in main
          find_process_files(root_dir)
        File "/scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py", line 222, in find_process_files
          process(root_dir, fromfile, tofile, function, hash_db)
        File "/scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py", line 188, in process
          processor_function(fromfile, tofile)
        File "/scratch/tmp/pip-install-h122amxc/numpy_01c3c1595190499da8881a392ed6b0de/tools/cythonize.py", line 77, in process_pyx
          subprocess.check_call(
        File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/home/software/wgd/ENV/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_philox.c', '_philox.pyx']' returned non-zero exit status 1.
      Cythonizing sources
      Traceback (most recent call last):
        File "/home/software/wgd/ENV/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/home/software/wgd/ENV/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/software/wgd/ENV/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/scratch/tmp/pip-build-env-jn2635rn/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 396, in prepare_metadata_for_build_wheel
          self.run_setup()
        File "/scratch/tmp/pip-build-env-jn2635rn/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 507, in run_setup
          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
        File "/scratch/tmp/pip-build-env-jn2635rn/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in run_setup
          exec(code, locals())
        File "<string>", line 489, in <module>
        File "<string>", line 469, in setup_package
        File "<string>", line 274, in generate_cython
      RuntimeError: Running cythonize failed!
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I installed Cython and still get a similar issue. Only conda works for me.

heche-psb commented 9 months ago

Hi, please try with Python 3.8. Python 3.10 is not compatible for now.
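A small guard at the top of an install or job script can catch an unsuitable interpreter early. The version set below only reflects what was suggested in this thread (3.6, 3.7, 3.8) and is not an official compatibility list; `is_suggested_python` is a hypothetical name.

```python
import sys

# Versions suggested earlier in this thread; treat as an assumption, not an official list.
SUGGESTED_VERSIONS = {(3, 6), (3, 7), (3, 8)}


def is_suggested_python(version_info=sys.version_info):
    """True when the major.minor version is one of the versions suggested in the thread."""
    return tuple(version_info[:2]) in SUGGESTED_VERSIONS


if __name__ == "__main__":
    print(sys.version.split()[0], "suggested:", is_suggested_python())
```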

Bio1nform commented 9 months ago

This error:

python setup.py install
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    from setuptools import setup
  File "/home/wgd/ENV/lib/python3.8/site-packages/setuptools/__init__.py", line 19, in <module>
    from setuptools.dist import Distribution
  File "/home/wgd/ENV/lib/python3.8/site-packages/setuptools/dist.py", line 34, in <module>
    from setuptools import windows_support
  File "/home/wgd/ENV/lib/python3.8/site-packages/setuptools/windows_support.py", line 2, in <module>
    import ctypes
  File "/home/software/Python/Python-3.8.5/Lib/ctypes/__init__.py", line 7, in <module>
    from _ctypes import Union, Structure, Array
ImportError: libffi.so.6: cannot open shared object file: No such file or directory

heche-psb commented 9 months ago

Hi, you may try with sudo apt-get install libffi-dev or update.

Bio1nform commented 9 months ago

I am working on a cluster and do not have admin privileges. The previous error was with: python setup.py install

With python -m pip install -r requirements.txt I get the following error.

WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/attrs/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/attrs/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/attrs/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/attrs/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/attrs/
Could not fetch URL https://pypi.org/simple/attrs/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/attrs/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
ERROR: Could not find a version that satisfies the requirement attrs==20.3.0 (from -r requirements.txt (line 1)) (from versions: none)
ERROR: No matching distribution found for attrs==20.3.0 (from -r requirements.txt (line 1))
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping

heche-psb commented 9 months ago

Above you were installing from the cloned repository, right? Normally Python 3.8 works. Did you install in a virtual environment?

Bio1nform commented 9 months ago

This error is from the virtual environment.

heche-psb commented 9 months ago

The error messages indicate a problem with the installed Python, which should have SSL support but apparently does not. I guess this is not a problem with wgd v2 itself.
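Whether a given Python build has working ssl and ctypes modules (the two that failed in the logs above) can be checked without pip. This is a generic sketch using only the standard library; `import_status` is a hypothetical helper name.

```python
import importlib


def import_status(names=("ssl", "ctypes")):
    """Try importing each module; map its name to None on success or to the error text."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = None
        except ImportError as exc:
            status[name] = str(exc)
    return status


if __name__ == "__main__":
    for name, error in import_status().items():
        print(name, "OK" if error is None else "FAILED: " + error)
```

If either import fails, that interpreter was built without the corresponding system library, and a different Python (for example the one shipped with conda) is needed.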

Bio1nform commented 9 months ago

No, this is not an issue with wgd v2. I am trying to install v2.0.23 now. Once it is installed, I will check the bug fix that you pushed. How long will the conda update take? The conda version works best for me.

Bio1nform commented 9 months ago

I installed v2.0.23 from PyPI.

wgd ksd wgd_dmd/Aquilegia_coerulea.tsv Aquilegia_coerulea -o wgd_ksd

15:20:53 INFO This is wgd v2.0.23 cli.py:32
15:21:10 INFO tmpdir = wgdtmp_adab2f7f-fb6b-407e-8083-5643b9b4a9fc cli.py:483
15:21:14 INFO Analysing family GF00000001 core.py:2873
15:21:14 INFO Analysing family GF00000002 core.py:2873
15:21:14 INFO Analysing family GF00000003 core.py:2873
15:21:14 INFO Analysing family GF00000004 core.py:2873
15:21:15 INFO Analysing family GF00000005 core.py:2873

Now I get the following error. error.txt