hardingnj / xpclr

Code to compute the XP-CLR statistic to infer natural selection
MIT License

UserWarning: 'None' INFO header not found #83

Open BioInfoNoob opened 1 year ago

BioInfoNoob commented 1 year ago

Thank you for the effort you put into this work.

I installed xpclr manually and pip-installed scipy, numpy, pandas, and scikit-allel in an environment. When I ran my sample VCFs with

import os, sys
os.system("./xpclr --input ./my.vcf" + " --format \"vcf\" --samplesA ./mysamplesA.txt --samplesB ./mysamplesB.txt --out ./1.xpclr.50k_window_10kb_step.txt --chr 1 --maxsnps 1000 --minsnps 10 --size 50000 --step 10000")

I gave it the correct path for each input and output file.
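Building the command as one concatenated string makes it easy to lose a quote. As a minimal sketch (not part of xpclr itself), the same invocation can be assembled as an argument list and passed to `subprocess.run`, which avoids shell quoting altogether. The helper name `build_xpclr_command` and its parameter names are hypothetical; the flags are the ones used in the command above.

```python
import shlex
import subprocess


def build_xpclr_command(vcf, samples_a, samples_b, out, chrom,
                        maxsnps=1000, minsnps=10, size=50000, step=10000,
                        executable="./xpclr"):
    """Return the xpclr invocation as a list of string arguments.

    Hypothetical helper: only the flags shown in the post are used.
    """
    return [
        executable,
        "--input", vcf,
        "--format", "vcf",
        "--samplesA", samples_a,
        "--samplesB", samples_b,
        "--out", out,
        "--chr", str(chrom),
        "--maxsnps", str(maxsnps),
        "--minsnps", str(minsnps),
        "--size", str(size),
        "--step", str(step),
    ]


cmd = build_xpclr_command(
    "./my.vcf", "./mysamplesA.txt", "./mysamplesB.txt",
    "./1.xpclr.50k_window_10kb_step.txt", chrom=1,
)
# Print the command in a copy-pasteable form for inspection.
print(shlex.join(cmd))
# Uncomment to actually run it; check=True raises if xpclr exits with an error.
# subprocess.run(cmd, check=True)
```

Passing a list means every path is handed to the program as a single argument, even if it contains spaces.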

First, it gives me these warnings:

UserWarning: 'None' INFO header not found
UserWarning: no type for field 'variants/None', assuming object

and it stops later on with this error:

line 191, in _mean
    ret = ret / rcount
TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Since my VCF file might have a problem, I tried the test files in the xpclr/fixture directory with

import os, sys
os.system("./xpclr --input ./small.vcf.gz" + " --format \"vcf\" --samplesA ./samplesA.txt --samplesB ./samplesB.txt --out ./3L.xpclr.50k_window_10kb_step.txt --chr 3L --maxsnps 1000 --minsnps 10 --size 50000 --step 10000")

It gives me the same warnings twice:

UserWarning: 'None' INFO header not found
UserWarning: no type for field 'variants/None', assuming object
UserWarning: 'None' INFO header not found
UserWarning: no type for field 'variants/None', assuming object

and stops with the same error:

line 191, in _mean
    ret = ret / rcount
TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I've tested the VCF files with vcf-validator and there was nothing wrong.

I installed everything in a new conda environment, so I'm not sure what the problem is. Could you please help me?
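The traceback above ends inside NumPy's `_mean`, and the earlier warning mentions a field being read "assuming object". As a rough illustration only, and assuming (this is not confirmed anywhere in the thread) that a mean ends up being taken over an object-dtype array holding non-numeric values, the sketch below shows that `np.mean` raises a `TypeError` in that situation, while the same values converted to floats average normally. The array contents are made up, and the exact wording of the message may differ between NumPy versions.

```python
import numpy as np

# Hypothetical stand-in for a column that was loaded as generic Python objects.
values = np.array(["0.1", "0.2"], dtype=object)

try:
    np.mean(values)
    raised = False
except TypeError as exc:
    # Summing the strings succeeds (concatenation), the division does not.
    raised = True
    print(type(exc).__name__, exc)

# Converting to a numeric dtype first gives an ordinary mean.
numeric = values.astype(float)
print(numeric.mean())
```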

BioInfoNoob commented 1 year ago

I found out that the TypeError is somehow related to the window size: when I decreased the size option to 50 for the fixture samples, it worked. But it still gives me these warnings:

UserWarning: 'None' INFO header not found
UserWarning: no type for field 'variants/None', assuming object
UserWarning: 'None' INFO header not found
UserWarning: no type for field 'variants/None', assuming object

and these new warnings:

util.py:161: RuntimeWarning: Mean of empty slice
  out["xpclr_norm"] = (out.xpclr - np.nanmean(out.xpclr))/np.nanstd(out.xpclr)

/numpy/lib/nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

When I checked the output file, it has no values for sel_coef, nSNPs, nSNPs_avail, xpclr, or xpclr_norm.

The two seem to be connected, because 'Mean of empty slice' means there aren't any non-NaN values, but I'm not sure why this happens.
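The link between the two warnings and the empty columns can be reproduced in isolation. The sketch below applies the same z-score expression quoted in the warning to an array that contains only NaN; `normalise` is a hypothetical name, and the input is made up rather than taken from a real run.

```python
import warnings

import numpy as np


def normalise(xpclr):
    """Z-score in the same form as the line quoted in the warning."""
    return (xpclr - np.nanmean(xpclr)) / np.nanstd(xpclr)


# Every window has a missing xpclr value, as in the empty output file.
all_nan = np.array([np.nan, np.nan, np.nan])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is suppressed
    result = normalise(all_nan)

messages = [str(w.message) for w in caught]
print(messages)
print(result)
```

Both warnings appear because `np.nanmean` and `np.nanstd` have nothing left to work with once the NaN values are ignored, so the normalised column is NaN as well. The warnings are therefore a symptom of the empty xpclr column, not its cause.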

hardingnj commented 1 year ago

Sorry for slow reply. Did you get any output when you ran on files in the fixture directory?

It seems the issue is that the data is not being read in properly... I'll try to update the dependencies and rerun the fixtures.

BioInfoNoob commented 1 year ago

I got an output for the fixture files after decreasing the window size, but it has no values for sel_coef, nSNPs, nSNPs_avail, xpclr, or xpclr_norm.

My package versions were:

scikit-allel 1.3.5
pandas 1.4.3
numpy 1.22.3
scipy 1.8.1
python 3.9.12

Another weird thing is that when I run it with my VCF file, it gives me these warnings only once:

UserWarning: 'None' INFO header not found
UserWarning: no type for field 'variants/None', assuming object

but when I run it with the fixture files, it gives me the warnings twice. I'm really confused, because I ran XPCLR a long time ago with the same VCF file. I remember using conda back then, but as far as I know, conda doesn't work anymore due to dependency problems.
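When comparing environments like this, it can help to collect the installed versions programmatically rather than by hand. The following is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` is hypothetical, and the package list is simply the one mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["scikit-allel", "pandas", "numpy", "scipy", "xpclr"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report makes it easier to see whether two setups actually differ.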

azwanjaafar commented 1 year ago

Hi, I experienced the same issue with no value being detected in the output file columns for sel coef, nSNPs, nSNPs avail, xpclr, and xpclr norm. Is there any solution for this?
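To confirm which columns are empty without opening the file by hand, a short check along these lines may be useful. It is only a sketch: it assumes the output is a tab-delimited text file with a header row (an assumption, not something stated in this thread), the helper name `empty_columns` is hypothetical, and the sample rows are made up rather than real xpclr output.

```python
import csv
import io

# Column names as listed in the comments above.
COLUMNS = ["sel_coef", "nSNPs", "nSNPs_avail", "xpclr", "xpclr_norm"]


def empty_columns(handle, columns=COLUMNS, delimiter="\t"):
    """Return the columns that hold no value (blank or 'nan') in every row."""
    seen = {c: False for c in columns}
    for row in csv.DictReader(handle, delimiter=delimiter):
        for c in columns:
            value = (row.get(c) or "").strip()
            if value and value.lower() != "nan":
                seen[c] = True
    return [c for c in columns if not seen[c]]


# Made-up two-window example standing in for an output file.
example = io.StringIO(
    "id\tsel_coef\tnSNPs\tnSNPs_avail\txpclr\txpclr_norm\n"
    "w1\t\t\t\t\t\n"
    "w2\t\t\t\t\tnan\n"
)
print(empty_columns(example))
```

For a real file, the same function can be given an open file handle, for example `open(path, newline="")`, with `delimiter` adjusted if the file turns out not to be tab-separated.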

shenlinyong commented 1 year ago

I had the same problem. Is there any solution for this?

xpclr -I /storage/SLY68/2022/WGS/G19/gatk/merge/all.missing_maf.vcf -O 111 -Sa /storage/SLY68/2022/WGS/G19/selection_elimination/FL.sample -Sb  /storage/SLY68/2022/WGS/G19/selection_elimination/LL.sample --rrate  1e-8 --ld 0.95 --size 50000 --step 10000 --chr chrZ
2022-11-11 02:50:44 : INFO : running xpclr v1.1.2
2022-11-11 02:50:44 : INFO : Loading VCF
/home/SLY68/anaconda3/lib/python3.8/site-packages/allel/io/vcf_read.py:1240: UserWarning: 'None' INFO header not found
  warnings.warn('%r INFO header not found' % name)
/home/SLY68/anaconda3/lib/python3.8/site-packages/allel/io/vcf_read.py:1454: UserWarning: no type for field 'variants/None', assuming object
  warnings.warn('no type for field %r, assuming %s' % (f, normed_types[f]))
2022-11-11 02:51:43 : INFO : VCF loading complete
2022-11-11 02:51:44 : INFO : 142,054 SNPs in total are in the provided input files
2022-11-11 02:51:44 : INFO : 0 SNPs excluded as multiallelic
2022-11-11 02:51:44 : INFO : 0 SNPs excluded as missing in all samples in a population
2022-11-11 02:51:44 : INFO : 23,215 SNPs excluded as invariant or singleton in population 2
2022-11-11 02:51:44 : INFO : 118,839/142,054 SNPs included in the analysis (83.66%)
2022-11-11 02:51:44 : INFO : Done dropping above SNPs from analysis. XP-CLR algorithm starting.
2022-11-11 02:51:44 : INFO : Omega estimated as : 1.647535
Traceback (most recent call last):
  File "/home/SLY68/anaconda3/bin/xpclr", line 4, in <module>
    __import__('pkg_resources').run_script('xpclr==1.1.2', 'xpclr')
  File "/home/SLY68/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 656, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/SLY68/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1460, in run_script
    exec(script_code, namespace, namespace)
  File "/home/SLY68/anaconda3/lib/python3.8/site-packages/xpclr-1.1.2-py3.8.egg/EGG-INFO/scripts/xpclr", line 196, in <module>
  File "/home/SLY68/anaconda3/lib/python3.8/site-packages/xpclr-1.1.2-py3.8.egg/EGG-INFO/scripts/xpclr", line 182, in main
  File "/home/SLY68/anaconda3/lib/python3.8/site-packages/xpclr-1.1.2-py3.8.egg/xpclr/methods.py", line 324, in xpclr_scan
  File "/home/SLY68/anaconda3/lib/python3.8/site-packages/numpy/core/_methods.py", line 191, in _mean
    ret = ret / rcount
TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

hardingnj commented 1 year ago

I think this was fixed in https://github.com/hardingnj/xpclr/pull/80

But there hasn't been a release issued since then. Try reinstalling from main and see if the error is resolved.

gaochenx commented 1 year ago

I had the same problem when I added --minsnps. It only runs when the minimum number of SNPs per window is larger than nSNPs_avail, which seems paradoxical.

gaochenx commented 1 year ago

I tried the version at https://github.com/xuzhougeng/xpclr, and the problems above are resolved there.