molgenis / systemsgenetics

Generic Java genotype reader / writer, QTL mapping software, Strand alignment tool
https://github.com/molgenis/systemsgenetics/wiki
GNU General Public License v3.0

Strand alignment error #587

Open ConnieXuhm opened 3 years ago

ConnieXuhm commented 3 years ago

Hi, when I try to run the strand flip step on pre-phased SHAPEIT2 data, using 1000 Genomes Phase 3 (.vcf.gz format) as the reference panel, I encounter the following problem. Please help me, thanks!

My command is:

java -jar GenotypeHarmonizer.jar --input 1118_noflip.phased1 --inputType SHAPEIT2 --ref ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes --refType VCF --output 1202_GH --update-id --outputType SHAPEIT2

And this is the error output:

Started logging

Interpreted arguments: 
 - Input base path: 1118_noflip.phased1 
 - Input data type: Shapeit2 output
 - Output base path: 1202_GH
 - Output data type: Impute2 haplotypes haps / sample files
 - Reference base path: ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes
 - Reference data type: VCF file
 - Number of flank variants to consider for LD alignment: 100
 - Minimum LD of flanking variants before using for LD alignment: 0.3
 - Minimum number of variants needed to for LD alignment: 3
 - Maximum MAF of variants to use minor allele as backup for alignment: 0.0
 - Update study IDs: yes
 - Match study reference alleles: no
 - Keep variants not in reference data: no
 - Minimum posterior probability for input data: 0.4
 - LD checker off
 - Force input sequence name: not forcing

Input data loaded
Exception in thread "main" java.lang.RuntimeException: BGZF file has invalid uncompressedLength: -408592984
    at net.sf.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:380)
    at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:365)
    at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:113)
    at net.sf.samtools.util.BlockCompressedInputStream.readLine(BlockCompressedInputStream.java:181)
    at org.molgenis.vcf.meta.VcfMetaParser.readLine(VcfMetaParser.java:126)
    at org.molgenis.vcf.meta.VcfMetaParser.parse(VcfMetaParser.java:41)
    at org.molgenis.vcf.VcfReader.parseVcfMeta(VcfReader.java:70)
    at org.molgenis.vcf.VcfReader.getVcfMeta(VcfReader.java:57)
    at org.molgenis.genotype.vcf.VcfGenotypeData.<init>(VcfGenotypeData.java:124)
    at org.molgenis.genotype.vcf.VcfGenotypeData.<init>(VcfGenotypeData.java:80)
    at org.molgenis.genotype.RandomAccessGenotypeDataReaderFormats.createGenotypeData(RandomAccessGenotypeDataReaderFormats.java:184)
    at org.molgenis.genotype.RandomAccessGenotypeDataReaderFormats.createGenotypeData(RandomAccessGenotypeDataReaderFormats.java:158)
    at org.molgenis.genotype.RandomAccessGenotypeDataReaderFormats.createGenotypeData(RandomAccessGenotypeDataReaderFormats.java:133)
    at nl.umcg.deelenp.genotypeharmonizer.GenotypeHarmonizer.main(GenotypeHarmonizer.java:325)
Caused by: java.lang.NegativeArraySizeException: -408592984
    at net.sf.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:378)
    ... 13 more

PatrickDeelen commented 3 years ago

It seems that either the vcf.gz file or the tbi file is corrupt. You could try running vcf-validator (https://vcftools.github.io/perl_module.html#vcf-validator) to verify this.
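
As a quicker first check before full validation, the compression layer alone can be tested by decompressing the file to the end, since a truncated or damaged download usually fails at this stage. Below is a minimal sketch, assuming Python 3 is available. The function name gzip_stream_is_readable is made up for this illustration and is not part of GenotypeHarmonizer or vcftools. It only checks that the gzip/BGZF stream decompresses cleanly; it does not validate the VCF content or confirm that the index matches the data file.

```python
import gzip
import zlib


def gzip_stream_is_readable(path, chunk_size=1 << 20):
    """Decompress `path` to the end and report whether that succeeded.

    Hypothetical helper for illustration only. BGZF files (such as
    bgzip-compressed VCFs) are a series of concatenated gzip members,
    so the standard gzip module can read them sequentially.

    Returns (True, "") on success, or (False, reason) on failure.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data chunk by chunk so that large
            # reference panels do not have to fit in memory.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers missing files and invalid gzip headers,
        # EOFError covers truncated files, zlib.error covers damaged
        # compressed data.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling gzip_stream_is_readable on the reference .vcf.gz returns a pair: whether the file decompressed, and the reason if it did not. A .tbi index is itself BGZF-compressed, so the same call can be pointed at the .tbi file as well. If either file fails, downloading it again is probably the simplest fix.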