odelaneau / shapeit4

Segmented HAPlotype Estimation and Imputation Tool
MIT License

ERROR: No variants to be phased in [/shapeit/out.recode.vcf.gz] #33

Open bopohdr opened 3 years ago

bopohdr commented 3 years ago

Hi!

I'm trying to run SHAPEIT4 on a multisample VCF from WES via Docker:

docker run -v /Users/shapeit:/shapeit lifebitai/shapeit4 shapeit4 --input /shapeit/out.recode.vcf.gz --map /shapeit/genetic_maps.b38.tar.gz --region 2 --output /shapeit/phased_100_samples_glnexus.vcf.gz --sequencing --thread 12

But I get an error regarding the VCF:

SHAPEIT

Files:

Parameters:

Initialization:

The multisample VCF was made with GLnexus, and only variants genotyped in more than 0.8 of the samples were kept.

A glimpse of how the vcf looks is attached. out.recode.txt

Does the VCF require some changes?

Thanks!

tvkent commented 3 years ago

I'm running into the same error, whether it's with a BCF or a VCF, WhatsHap-phased or not. @odelaneau, have you had a look at this? I've tried to run SHAPEIT 4.1.3 with multiple VCFs and always run into the same error. Could it be an htslib version problem?

odelaneau commented 3 years ago

Hi,

Use --region chr2 instead. This is how chromosome IDs are encoded in your data.
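
One way to see how chromosomes are named in a VCF before choosing a --region value is to list the distinct values of its first column. The following is a minimal sketch that uses only gzip and awk, so it works on plain or bgzipped VCF but not on BCF; the function name list_contigs is made up for illustration and is not part of SHAPEIT4.

```shell
# List the distinct contig names in a gzipped VCF, in order of first appearance.
# list_contigs is a hypothetical helper, not a SHAPEIT4 command.
list_contigs() {
    # Skip header lines (starting with '#') and print each CHROM value once.
    gzip -dc "$1" | awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }'
}
```

If list_contigs /shapeit/out.recode.vcf.gz prints chr2 rather than 2, then --region chr2 is the form that matches the data.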

Cheers,

tvkent commented 3 years ago

Hi @odelaneau,

I've tried this as well with my own data, with the thought that underscores in scaffold names may not be recognized, but even if I change the scaffold names to 1 and specify this in the arguments, the error remains.

Just running a very basic command line here shows the file never makes it past the check for variants:

shapeit4 --input test.vcf.gz --region 1 --sequencing --output testout.vcf.gz

Attached index file has added .txt extension for uploading.

test.vcf.gz test.vcf.gz.csi.txt

bopohdr commented 3 years ago

> Hi,
>
> Use --region chr2 instead. This is how chromosome IDs are encoded in your data.
>
> Cheers,

Hi!

That solved my error, but I ran into another one:

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : olivier.delaneau@gmail.com
  * Version       : 4.1.3
  * Run date      : 19/09/2020 - 07:20:17

Files:
  * Input VCF     : [/shapeit/out.recode.vcf.gz]
  * Genetic Map   : [/shapeit/genetic_maps.b38.tar.gz]
  * Output VCF    : [/shapeit/phased_100_samples_glnexus.vcf.gz]

Parameters:
  * Seed    : 15052011
  * Threads : 12 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.0005 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active
  * IBD2    : length>=3.00cM [N>=10000 / MAF>=0.000 / MDR<=0.500]

Initialization:
  * VCF/BCF scanning [N=100 / L=2527 / Reg=chr2] (0.19s)
  * VCF/BCF parsing [Hom=86.8% / Het=11.8% / Mis=1.4%] (0.22s)

ERROR: Parsing line 0 : incorrect number of columns, observed: 1 expected: 3

Does it mean it expects 3 lines in the VCF?

odelaneau commented 3 years ago

Hi,

You need to run tar xzvf /shapeit/genetic_maps.b38.tar.gz to get access to the per-chromosome maps; --map expects one of those files, not the archive itself.
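
The "expected: 3" in the error appears to refer to columns in the genetic map rather than anything in the VCF: when the .tar.gz archive is passed to --map, the tool reads archive contents instead of a three-column table. As a rough sanity check, the sketch below reports whether every line of a gzipped map file has exactly three whitespace-separated columns. The function name check_gmap_columns is hypothetical, and the three-column layout is assumed from the error message above.

```shell
# Report whether every line of a gzipped map file has exactly 3 columns.
# check_gmap_columns is a hypothetical helper, not a SHAPEIT4 command.
check_gmap_columns() {
    # Stop at the first offending line; the exit status is non-zero on failure.
    gzip -dc "$1" | awk 'NF != 3 { bad = 1; printf "line %d: %d columns\n", NR, NF; exit }
                         END { if (!bad) print "ok"; exit bad }'
}
```

After extracting the archive, running check_gmap_columns on the per-chromosome file you intend to pass to --map should print ok.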

Best,

odelaneau commented 3 years ago

> Hi @odelaneau,
>
> I've tried this as well with my own data, with the thought that underscores in scaffold names may not be recognized, but even if I change the scaffold names to 1 and specify this in the arguments, the error remains.
>
> Just running a very basic command line here shows the file never makes it past the check for variants:
>
> shapeit4 --input test.vcf.gz --region 1 --sequencing --output testout.vcf.gz
>
> Attached index file has added .txt extension for uploading.
>
> test.vcf.gz test.vcf.gz.csi.txt

Hi,

I do not understand. I've just tried your command line on the data you uploaded, using the latest version, and it works. I do not get the error.

tvkent commented 3 years ago

Hi,

As an update, I got shapeit4 running after recompiling with a local install of boost 1.74 and an older version of htslib (1.9 instead of 1.10).

For future reference, I followed the makefile instructions from issue #7, including adding htslib to the LD_LIBRARY_PATH in my .bashrc (remember that the change only takes effect once you source .bashrc or start a new session).
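
For anyone repeating the LD_LIBRARY_PATH step, a minimal sketch of what could go in .bashrc is shown here. The helper name prepend_ld_library_path is made up for illustration, and the htslib directory has to be replaced with wherever the library was actually built.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# prepend_ld_library_path is a hypothetical helper; adjust the directory to your install.
prepend_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}
```

A call such as prepend_ld_library_path "$HOME/htslib-1.9" (a placeholder path) placed after the function definition makes the locally built library visible to the dynamic loader in new shells.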

My previous installation resulted in an error that there were no variants to be phased. As far as I can tell, this was an error with the newer htslib version.

Cheers

odelaneau commented 3 years ago

Thanks for this. I'll look at what is happening with htslib 1.10.