lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

[FixVcfMissingGenotypes]partial missing values for GL field, Ignoring SAM validation error, and [Launcher]fixvcfmissinggenotypes Exited with failure (-1) #153

Closed jdalapicolla closed 4 years ago

jdalapicolla commented 4 years ago

Subject of the issue

I need to merge 60 individual VCF files into a single one. I tried both VCFtools and BCFtools, but the homozygous genotypes were converted to missing data. I ran VCFtools and BCFtools again with their options for correcting this (-R and -0), and then ALL missing data were converted to homozygous genotypes. So I found jvarkit, and I'm running fixvcfmissinggenotypes on a laptop as a test with 4 samples before running it on a cluster with 60 samples. My laptop runs Ubuntu 18.04 LTS with 4 GB of RAM. Java and jvarkit are up to date, and each .bam file is around 2 GB.

Steps to reproduce

For each sample I ran the index step: $ samtools index /path/to/file/sample1.bam

I created the "bams.list" file.

Then I ran fixvcfmissinggenotypes: java -jar /path/to/jar-file/jvarkit/dist/fixvcfmissinggenotypes.jar -B bams.list < merge.vcf > out.vcf

Actual behaviour

The analysis ran with some warnings and then stopped. The command created an output with only 1 line.

OUTPUT is this:

fileformat=VCFv4.2

FILTER=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

bcftools_annotateCommand=annotate --remove ^INFO/TYPE,^INFO/DP,^INFO/RO,^INFO/AO,^INFO/AB,^FORMAT/GT,^FORMAT/DP,^FORMAT/RO,^FORMAT/AO,^FORMAT/QR,^FORMAT/QA,^FORMAT/GL; Date=Mon Mar 2 11:36:58 2020

bcftools_annotateVersion=1.9+htslib-1.9

bcftools_mergeCommand=merge -o /home/b0219/Documentos/T/ols/combine_default.vcf /home/b0219/Documentos/T/ols/ITV20304.vcf.gz /home/b0219/Documentos/T/ols/ITV20305.vcf.gz /home/b0219/Documentos/T/ols/ITV20306.vcf.gz /home/b0219/Documentos/T/ols/ITV20307.vcf.gz; Date=Wed Mar 4 11:24:26 2020

bcftools_mergeVersion=1.10.2+htslib-1.10.2

bcftools_viewCommand=view --include 'FMT/GT="1/1" && QUAL>=100 && FMT/DP>=10 && (FMT/AO)/(FMT/DP)>=0' snps.raw.vcf; Date=Mon Mar 2 11:36:58 2020

bcftools_viewVersion=1.9+htslib-1.9

commandline="freebayes -p 2 -P 0 -C 10 --min-repeat-entropy 1.5 --strict-vcf -q 13 -m 60 --min-coverage 10 -F 0.05 -f reference/ref.fa snps.bam --region NODE_755847_length_295_cov_0.702381:0-295"

contig=

contig=

contig=

contig=

contig=

contig=

contig=

[... all other contigs information...]

contig=

fileDate=20200302

phasing=none

reference=reference/ref.fa

source=freeBayes v1.3.2-dirty

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ITV20304 ITV20305 ITV20306 ITV20307

NODE_755847_length_295_cov_0.702381 127 . T A 368 . AB=0;AC=2;AF=0.500;AN=4;AO=13;DP=41;QA=438;QR=0;RO=0;TYPE=snp GT:AO:DP:FXG:PL:QA:QR:RO ./.:.:7 1/1:13:13:.:397,39,0:438:0:0 0/0:.:14:1 ./.:.:7

These are the messages in the terminal:

[INFO][FixVcfMissingGenotypes]Reading header for ./ITV20307.bam
[INFO][FixVcfMissingGenotypes]Reading header for ./ITV20306.bam
[INFO][FixVcfMissingGenotypes]Reading header for ./ITV20304.bam
[INFO][FixVcfMissingGenotypes]Reading header for ./ITV20305.bam
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 2, Read name NB501693:85:HVWWGBGX5:2:13102:14872:4167, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 4, Read name NB501693:85:HVWWGBGX5:1:13306:5883:10022, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 5, Read name NB501693:85:HVWWGBGX5:3:13506:11818:11310, Second of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 7, Read name NB501693:85:HVWWGBGX5:2:21303:14855:19937, Second of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 1, Read name NB501693:85:HVWWGBGX5:4:23512:24339:8630, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 2, Read name NB501693:85:HVWWGBGX5:3:21401:4471:8645, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 5, Read name NB501693:85:HVWWGBGX5:1:11201:18451:8467, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 6, Read name NB501693:85:HVWWGBGX5:2:21103:11748:18799, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 7, Read name NB501693:85:HVWWGBGX5:4:23512:21627:6251, Second of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 10, Read name NB501693:85:HVWWGBGX5:2:22204:20614:10932, Second of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 11, Read name NB501693:85:HVWWGBGX5:3:11403:17625:6925, Second of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 15, Read name NB501693:85:HVWWGBGX5:1:22303:24322:4416, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 17, Read name NB501693:85:HVWWGBGX5:2:22311:23682:13592, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_FIRST_OF_PAIR:Record 4, Read name NB501693:85:HVWWGBGX5:2:13205:17380:11482, First of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 7, Read name NB501693:85:HVWWGBGX5:3:11503:23913:17645, Second of pair flag should not be set for unpaired read.
Ignoring SAM validation error: ERROR::INVALID_FLAG_SECOND_OF_PAIR:Record 8, Read name NB501693:85:HVWWGBGX5:3:11603:7652:4454, Second of pair flag should not be set for unpaired read.
[SEVERE][FixVcfMissingGenotypes]partial missing values for GL field
htsjdk.tribble.TribbleException: partial missing values for GL field
    at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:269)
    at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:78)
    at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:817)
    at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:121)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
    at htsjdk.variant.variantcontext.GenotypesContext.get(GenotypesContext.java:417)
    at htsjdk.variant.variantcontext.VariantContext.getGenotype(VariantContext.java:1102)
    at com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes.doVcfToVcf(FixVcfMissingGenotypes.java:233)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:567)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:614)
    at com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes.doWork(FixVcfMissingGenotypes.java:346)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:777)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:940)
    at com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes.main(FixVcfMissingGenotypes.java:358)
[INFO][Launcher]fixvcfmissinggenotypes Exited with failure (-1)

I couldn't find anything on the internet about "Ignoring SAM validation error:", "[SEVERE][FixVcfMissingGenotypes]partial missing values for GL field" or "[INFO][Launcher]fixvcfmissinggenotypes Exited with failure (-1)".

I have been trying to merge these VCF files correctly for 3 days. I have no experience with Java; could someone help me with these errors, or suggest another approach?

Thank you so much for your time, Jeronymo

lindenb commented 4 years ago

Hi, for the SAM validation error I pushed a new version with the option '--stringency', which you can set to SILENT to hide the warnings.
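
For readers wondering what those warnings mean: in the SAM format the flag bits 0x40 (first of pair) and 0x80 (second of pair) are only meaningful when bit 0x1 (read is paired) is also set. The sketch below is a minimal illustration of that rule, not the htsjdk implementation; the function name `pair_flag_problems` is hypothetical, and the returned labels simply reuse the error names from the log above.

```python
from typing import List

# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
FIRST_OF_PAIR = 0x40
SECOND_OF_PAIR = 0x80


def pair_flag_problems(flag: int) -> List[str]:
    """Return the pair-flag inconsistencies for one SAM flag value.

    Hypothetical helper: it only covers the two cases seen in the log,
    not every check a full validator would run.
    """
    problems = []
    if not flag & PAIRED:
        # An unpaired read should not claim a position within a pair.
        if flag & FIRST_OF_PAIR:
            problems.append("INVALID_FLAG_FIRST_OF_PAIR")
        if flag & SECOND_OF_PAIR:
            problems.append("INVALID_FLAG_SECOND_OF_PAIR")
    return problems
```

A read with flag 64 (0x40 alone) would be reported, while flag 65 (0x1 + 0x40) would not. The warnings are therefore about how the BAM files were produced, not about the VCF.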

Regarding the GL field: the error comes from your input VCF. I'm pretty sure that if you used your VCF with a tool like GATK or Picard (they both use the htsjdk library) you would get the very same error. And I'm afraid there is nothing I can do. https://www.google.com/search?q=parseDeprecatedGLString+error
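
To locate the offending records in the input VCF, something like the following sketch may help. It assumes, based only on the wording of the error message and not on the htsjdk source, that the problem is a GL entry that mixes "." with other values (for example ".,.,0"). The function name `partial_missing_gl` is hypothetical, and the script reads a plain-text, tab-delimited VCF.

```python
from typing import Iterable, Iterator, Tuple


def partial_missing_gl(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (chrom, pos, sample, gl) for GL entries mixing '.' with other values.

    Assumption: a GL of '.' alone, or made only of '.', is treated as
    fully missing and is not reported.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this record
        keys = fields[8].split(":")
        if "GL" not in keys:
            continue
        gl_index = keys.index("GL")
        for i, entry in enumerate(fields[9:]):
            values = entry.split(":")
            if gl_index >= len(values):
                continue  # trailing FORMAT fields were dropped for this sample
            parts = values[gl_index].split(",")
            if "." in parts and any(p != "." for p in parts):
                name = samples[i] if i < len(samples) else "sample%d" % (i + 1)
                yield fields[0], fields[1], name, values[gl_index]
```

It could be called as `for hit in partial_missing_gl(open("merge.vcf")): print(hit)`. Any record it reports would need its GL values repaired or the GL field removed before htsjdk-based tools can parse the file.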

jdalapicolla commented 4 years ago

Hi, Thank you for your reply!

I'll look for another option to merge the VCF files. jvarkit has a merge function, but it is deprecated, right? http://lindenb.github.io/jvarkit/VCFMerge.html

Thank you again!

lindenb commented 4 years ago

Yeah, my vcfmerge is a broken tool. You should try GATK 3.8 'CombineVariants'. Or, better, call your samples with GATK 4 HaplotypeCaller in GVCF mode.

jdalapicolla commented 4 years ago

Nice! I'll try this one! Thank you so much!