samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Unclear what the exception is referring to when htsjdk is used to validate a NIST .vcf file #1565

Closed namra1 closed 3 years ago

namra1 commented 3 years ago

Description of the issue:

htsjdk throws an exception when I try to validate a NIST .vcf file. The file was generated by NIST, so I would expect its format to be valid. The error message is unclear to me.

Steps to reproduce

gatk ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf

I also get an error when attempting to index the .vcf file with igv-tools, version 2.10.3. Attached file: NIST_NA12878_calls_in_PLDv2.vcf.gz

Expected behaviour

It should validate the .vcf file.

Actual behaviour

gatk ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
Using GATK jar /gatk/gatk-package-4.2.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.2.0-local.jar ValidateVariants -R hg19.fa -V NIST_NA12878_calls_in_PLDv2.vcf
22:27:12.103 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 28, 2021 10:27:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:27:12.247 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.247 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.2.0
22:27:12.247 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
22:27:12.248 INFO ValidateVariants - Executing as root@b5d24391e2e6 on Linux v5.8.0-59-generic amd64
22:27:12.248 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
22:27:12.248 INFO ValidateVariants - Start Date/Time: August 28, 2021 10:27:12 PM GMT
22:27:12.248 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.248 INFO ValidateVariants - ------------------------------------------------------------
22:27:12.249 INFO ValidateVariants - HTSJDK Version: 2.24.1
22:27:12.249 INFO ValidateVariants - Picard Version: 2.25.4
22:27:12.249 INFO ValidateVariants - Built for Spark Version: 2.4.5
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:27:12.249 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:27:12.249 INFO ValidateVariants - Deflater: IntelDeflater
22:27:12.249 INFO ValidateVariants - Inflater: IntelInflater
22:27:12.250 INFO ValidateVariants - GCS max retries/reopens: 20
22:27:12.250 INFO ValidateVariants - Requester pays: disabled
22:27:12.250 INFO ValidateVariants - Initializing engine
22:27:12.659 INFO FeatureManager - Using codec VCFCodec to read file file:///test/NIST_NA12878_calls_in_PLDv2.vcf
22:27:12.666 INFO ValidateVariants - Shutting down engine
[August 28, 2021 10:27:12 PM GMT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1182793728
org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path NIST_NA12878_calls_in_PLDv2.vcf
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:436)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
    at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:58)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
    at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:45)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Flag is an unsupported type for this kind of field, for input source: NIST_NA12878_calls_in_PLDv2.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: Flag is an unsupported type for this kind of field
    at htsjdk.variant.vcf.VCFCompoundHeaderLine.<init>(VCFCompoundHeaderLine.java:243)
    at htsjdk.variant.vcf.VCFFormatHeaderLine.<init>(VCFFormatHeaderLine.java:50)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:198)
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:26

cmnbroad commented 3 years ago

@namra1 You're right that the error message isn't as helpful as it could be, but the file you provided does indeed appear to have an invalid header line:

##FORMAT=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">

Not sure how that got generated, but the VCF spec does prohibit Type=Flag for a ##FORMAT line. Flag is only applicable to ##INFO fields.
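
For anyone who hits the same exception, here is a rough sketch of how the offending header line could be located without going through GATK. It is a standalone Python script, not htsjdk code; the function name `find_flag_format_lines` and the sample header (including the `##fileformat` and `##INFO` lines) are made up for illustration, and the regex assumes the `Type=` key appears before the quoted `Description`.

```python
import re

# Matches the Type=... key inside a structured header line such as ##FORMAT=<...>.
# Assumption: Type= precedes the quoted Description, so the first match is the real key.
_TYPE_RE = re.compile(r"[<,]Type=([^,>]+)")


def find_flag_format_lines(lines):
    """Return (line_number, text) for every ##FORMAT header line declaring Type=Flag."""
    offending = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if line.startswith("##FORMAT=<"):
            match = _TYPE_RE.search(line)
            if match and match.group(1) == "Flag":
                offending.append((number, line))
    return offending


# Hypothetical header for illustration; only the ##FORMAT line comes from this issue.
header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">',
    '##FORMAT=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
]

for number, line in find_flag_format_lines(header):
    print(f"line {number}: {line}")
```

To run it against a real file, pass an open file handle instead of the list, for example `find_flag_format_lines(open("NIST_NA12878_calls_in_PLDv2.vcf"))`. The `##INFO` line with `Type=Flag` is deliberately not reported, since Flag is allowed there.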

namra1 commented 3 years ago

I removed the line and it resolved the issue. Thanks.

cmnbroad commented 3 years ago

On second thought, reopening since we should fix the unhelpful error message to include some context.
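
As a rough illustration of what "some context" might mean here, the sketch below raises an error that names the input source, the header line number, and the offending line itself. It is written in Python purely as a sketch; `MalformedHeaderError` and `check_format_header_line` are hypothetical names, and this is not htsjdk's API nor necessarily how the message was eventually changed.

```python
import re

# Hypothetical check: a ##FORMAT line whose Type key is Flag.
_FLAG_TYPE_RE = re.compile(r"[<,]Type=Flag[,>]")


class MalformedHeaderError(ValueError):
    """Hypothetical error type carrying the context missing from the original message."""


def check_format_header_line(line, line_number, source):
    """Raise MalformedHeaderError if a ##FORMAT line declares Type=Flag."""
    if line.startswith("##FORMAT=<") and _FLAG_TYPE_RE.search(line):
        raise MalformedHeaderError(
            f"{source}, header line {line_number}: Flag is an unsupported type "
            f"for a FORMAT field (Flag is only valid for INFO fields): {line}"
        )


# The line number 42 is invented for the demo; the real position in the file is not known here.
bad_line = '##FORMAT=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">'
try:
    check_format_header_line(bad_line, 42, "NIST_NA12878_calls_in_PLDv2.vcf")
except MalformedHeaderError as err:
    print(err)
```

With the file name, line number, and raw header line in the message, the reporter could have found the `DS` line directly instead of working back from the stack trace.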