lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Vcf2Bam not recognizing sequence dictionary #164

Status: Closed (Mahmoudy96 closed 7 months ago)

Mahmoudy96 commented 4 years ago

Hi, I'm trying to use the vcf2bam tool from jvarkit with the following two files: GRCh38_latest_genomic.fna and 00-common_all.vcf. I used samtools faidx and picard CreateSequenceDictionary to create the index files as instructed in the tool's documentation, but when I run the following command:

$ java -jar jvarkit/dist/vcf2bam.jar -R GRCh38_latest_genomic.fna 00-common_all.vcf

I get the following errors:

[SEVERE][VcfToBam]Sequence Dictionary missing in VCF

java.io.IOException: Sequence Dictionary missing in VCF

at com.github.lindenb.jvarkit.tools.misc.VcfToBam.run(VcfToBam.java:144)
at com.github.lindenb.jvarkit.tools.misc.VcfToBam.doWork(VcfToBam.java:416)
at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:760)
at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:923)
at com.github.lindenb.jvarkit.tools.misc.VcfToBam.main(VcfToBam.java:434)

[INFO][Launcher]vcf2bam Exited with failure (-1)

The directory containing the files looks like this:

GRCh38_latest_genomic.fna
GRCh38_latest_genomic.fna.fai
GRCh38_latest_genomic.dict
00-common_all.vcf

It also contains the index files built by bowtie-build for the reference genome.

I would really appreciate some help in figuring out why the tool isn't recognizing the sequence dictionary.

lindenb commented 4 years ago

I posted the answer yesterday: https://www.biostars.org/p/458494/#458518
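
For context, the error message points at the VCF rather than the reference: it says the sequence dictionary is missing in the VCF, which usually means the VCF header has no `##contig` lines. The `.dict` and `.fai` files next to the FASTA do not change that. A minimal sketch for checking this, assuming a POSIX shell and awk; the function name `count_contig_lines` is hypothetical and not part of jvarkit:

```shell
# Hypothetical helper: count ##contig header lines in a VCF
# (plain or gzip-compressed). A result of 0 means the VCF header
# carries no sequence dictionary.
count_contig_lines() {
    case "$1" in
        *.gz) gunzip -c "$1" ;;
        *)    cat "$1" ;;
    esac | awk '/^##contig=/ {n++} /^#CHROM/ {exit} END {print n+0}'
}
```

Running `count_contig_lines 00-common_all.vcf` and getting 0 would be consistent with the error above. Tools such as Picard UpdateVcfSequenceDictionary can add the missing lines from a `.dict` file, provided the contig names in the VCF match those in the reference.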

Mahmoudy96 commented 4 years ago

I didn't see that, thank you.

I've managed to create an updated VCF, but now the tool is giving the error:

[SEVERE][VcfToBam]vcf doesn't have any genotypes

Is there another script I can run to fix this issue as well, or is it a problem with the file itself?

The vcf in question is an SNP database of the human genome, available here: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/
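
A database VCF like this one is typically sites-only: its `#CHROM` header line stops at the INFO column, so there are no FORMAT or sample columns for the tool to read genotypes from. A minimal sketch for confirming that, assuming an uncompressed, tab-delimited VCF; the function name `count_samples` is hypothetical:

```shell
# Hypothetical helper: print the number of sample columns in a VCF.
# The 8 fixed columns plus FORMAT make 9; anything beyond that is a sample.
count_samples() {
    awk -F'\t' '/^#CHROM/ { n = NF - 9; print (n > 0 ? n : 0); exit }' "$1"
}
```

If `count_samples 00-common_all.vcf` prints 0, the file itself is fine as a variant catalogue; it simply carries no genotypes.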

lindenb commented 4 years ago

You can always add a fake genotype:

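# Stream the dbSNP VCF, pass the ## meta lines through unchanged, declare GT,
# then append a FORMAT column and one fake heterozygous SAMPLE column.
wget -O - "https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-All.vcf.gz" |\
gunzip -c |\
awk '/^##/ {print;next} /^#CHROM/ {printf("##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n%s\tFORMAT\tSAMPLE\n",$0);next} {printf("%s\tGT\t0/1\n",$0);}'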
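
To see what this kind of awk step does before running it on the full download, the transformation can be tried on a one-record VCF. This is a sketch under stated assumptions: the function name `add_fake_genotype` and the sample record are made up for illustration, and the variant shown here passes `##` meta lines through unchanged so that only data records receive the extra columns.

```shell
# Hypothetical helper: append a FORMAT column and one fake SAMPLE column
# (heterozygous 0/1) to every record of a sites-only VCF read from stdin.
add_fake_genotype() {
    awk '
        /^##/     { print; next }   # meta lines pass through unchanged
        /^#CHROM/ { printf("##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n%s\tFORMAT\tSAMPLE\n", $0); next }
                  { printf("%s\tGT\t0/1\n", $0) }
    '
}

# Tiny sites-only VCF for illustration (made-up record).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n' | add_fake_genotype
```

The output should contain the original meta line, a new `##FORMAT` line for GT, a `#CHROM` line ending in FORMAT and SAMPLE, and the record ending in GT and 0/1.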

Mahmoudy96 commented 4 years ago

I'll try that, thanks.