mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0
104 stars 25 forks source link

ValueError: zero-size array to reduction operation maximum which has no identity #128

Closed reedus-123 closed 3 years ago

reedus-123 commented 3 years ago

Hello,

I tried to run the following command to calculate results from a merged vcf file with the --no-distances parameter (as an alternative to --load-m, which yielded perfectly separable data). This returned the following error:

pyseer --phenotypes phenotype.pheno --vcf snp.merged.vcf --no-distances --lineage --print-samples > my_SNPs.txt Read 28 phenotypes Detected binary phenotype Writing lineage effects to lineage_effects.txt [W::bcf_hdr_check_sanity] GL should be declared as Number=G Traceback (most recent call last): File "/install/software/mincondas/3.7/pyseer/bin/pyseer", line 10, in <module> sys.exit(main()) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/pyseer/__main__.py", line 759, in main ret = fixed_effects_regression(*data) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/pyseer/model.py", line 350, in fixed_effects_regression max_lineage = fit_lineage_effect(lin, c, k) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/pyseer/model.py", line 172, in fit_lineage_effect lineage_mod = smf.Logit(k, X) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/discrete/discrete_model.py", line 456, in __init__ super().__init__(endog, exog, check_rank, **kwargs) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/discrete/discrete_model.py", line 175, in __init__ super().__init__(endog, exog, **kwargs) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/base/model.py", line 237, in __init__ super(LikelihoodModel, self).__init__(endog, exog, **kwargs) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/base/model.py", line 78, in __init__ **kwargs) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/base/model.py", line 101, in _handle_data data = handle_data(endog, exog, missing, hasconst, **kwargs) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/base/data.py", line 673, in handle_data **kwargs) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/base/data.py", line 87, in __init__ self._handle_constant(hasconst) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/statsmodels/base/data.py", line 131, in _handle_constant exog_max = np.max(self.exog, axis=0) File "<__array_function__ internals>", line 6, in amax File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2706, in amax keepdims=keepdims, initial=initial, where=where) File "/install/software/mincondas/3.7/pyseer/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction return ufunc.reduce(obj, axis, dtype, out, **passkwargs) ValueError: zero-size array to reduction operation maximum which has no identity

Any help would be appreciated. Thank you.

mgalardini commented 3 years ago

Hi, could you send me your input files please? I would need the phenotype.pheno file, the first few lines of the snp.merged.vcf file (say, the first 5000 lines) and maybe whatever you were using as a distance matrix. This way I can more quickly inspect this error. Thanks!

reedus-123 commented 3 years ago

Hi, thanks for your quick response!

I have attached the snp.merged.vcf header and .pheno file below.

merged.txt phenotype.txt

mgalardini commented 3 years ago

Hi, sorry but I can't seem to be able to bgzip and index your VCF file:

$ cp merged.txt merged.vcf
$ bgzip -c merged.vcf > merged.vcf.gz
$ tabix -p vcf merged.vcf.gz
[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "AL111168.1      44   .  G       A       222     .       VDB=0.150727;AF1=1;AC1=2;MQ=60;FQ=-282;RPB=-0.905927;PV4=1,0.38,1,0.26;DP=17679;DP4=1,0,15659,1434;AN=104;AC=104        GT:PL:GQ        ./.:.:. ./.:.:.    ./.:.:. ./.:.:. ./.:.:. ./.:.:. ./.:.:. ./.:.:. 1/1:255,255,0:99        ./.:.:. 1/1:255,255,0:99        ./.:.:. ./.:.:. 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99   1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99"
[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "AL111168.1      45   .  T       C       222     .       VDB=0.515746;AF1=1;AC1=2;MQ=60;FQ=-282;RPB=1.7221;PV4=1,1,0.48,1;DP=21500;DP4=2,0,19212,1491;AN=126;AC=126      GT:PL:GQ        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99  1/1:255,255,0:99 1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99        1/1:255,255,0:99"
[E::hts_idx_push] Unsorted positions on sequence #1: 1 followed by 0
tbx_index_build failed: merged.vcf.gz

Does it work for you?

reedus-123 commented 3 years ago

It does for me, yes. I use bcftools:

$ bcftools index merged.vcf
$ bgzip merged.vcf

The first command generates a .csi index file and then I zip the vcf file using bgzip.

mgalardini commented 3 years ago

Hi, sorry but that does not seem to work for me:

$ bcftools index merged.vcf 
index: the file is not BGZF compressed, cannot index: merged.vcf

I am using bcftools 1.9.

At any rate, can you provide another example set that reproduces the error please?

gooodle commented 9 months ago

Hi mgalardini, is the issue solved? I am trying the tool, and received exactly the same error. Thanks.

mgalardini commented 9 months ago

So my assumption is that the problem lays outside of pyseer. If you can provide an example dataset that triggers the error I can try to reproduce it again

gooodle commented 9 months ago

So my assumption is that the problem lays outside of pyseer. If you can provide an example dataset that triggers the error I can try to reproduce it again

Thank you very much, mgalardini.

The attached please find the pheno and vcf (head 500) data I tested. The command is pyseer --phenotypes $phenotype --vcf $vcff --no-distances --lineage --print-samples > test_SNPs.txt

I don't have a distance matrix so I used --no-distance option. Actually, when I used your demo data snps.vcf.gz and resistances.pheno, the save error was reproduced. It looks a distance matrix must be included? does it accept a .nwk tree generated by hclust? or how can I run without it?

The whole error info is quoted below: Read 73 phenotypes Detected binary phenotype Writing lineage effects to lineage_effects.txt Traceback (most recent call last): File "/miniconda3/envs/pyseer/bin/pyseer", line 10, in sys.exit(main()) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/pyseer/main.py", line 798, in main ret = fixed_effects_regression(*data) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/pyseer/model.py", line 380, in fixed_effects_regression max_lineage = fit_lineage_effect(lin, c, k) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/pyseer/model.py", line 181, in fit_lineage_effect lineage_mod = smf.Logit(k, X) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/discrete/discrete_model.py", line 475, in init super().init(endog, exog, offset=offset, check_rank=check_rank, File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/discrete/discrete_model.py", line 185, in init super().init(endog, exog, kwargs) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/base/model.py", line 270, in init super().init(endog, exog, kwargs) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/base/model.py", line 95, in init self.data = self._handle_data(endog, exog, missing, hasconst, File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/base/model.py", line 135, in _handle_data data = handle_data(endog, exog, missing, hasconst, kwargs) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/base/data.py", line 675, in handle_data return klass(endog, exog=exog, missing=missing, hasconst=hasconst, File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/base/data.py", line 88, in init self._handle_constant(hasconst) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/statsmodels/base/data.py", line 132, in _handle_constant exog_max = np.max(self.exog, axis=0) File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2810, in max return _wrapreduction(a, np.maximum, 'max', axis, None, out, File "/miniconda3/envs/pyseer/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction return ufunc.reduce(obj, axis, dtype, out, passkwargs) ValueError: zero-size array to reduction operation maximum which has no identity

Y35M055.pheno.txt

allContigs_for_pyseer_head500.vcf.csv

mgalardini commented 9 months ago

So, if you omit the --lineage option the command executes fine. To have the program find lineage effects you have to provide your own lineages using the --lineage-clusters argument. By default the distance matrix is used to define lineages but in this case you are not providing such matrix. Please look into the tutorial to see how tot generate one.

gooodle commented 9 months ago

So, if you omit the --lineage option the command executes fine. To have the program find lineage effects you have to provide your own lineages using the --lineage-clusters argument. By default the distance matrix is used to define lineages but in this case you are not providing such matrix. Please look into the tutorial to see how tot generate one.

Perfect, thanks a lot for your time and support, mgalardini.