lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

[SEVERE][MsaToVcf]Duplicate allele added to VariantContext #241

Closed liaochen1988 closed 6 months ago

liaochen1988 commented 6 months ago

Subject of the issue

The command I ran was "java -jar ../jvarkit.jar msa2vcf -m bepA_2.txt". The resulting output and error are shown below.

[INFO][MsaToVcf]format : Fasta

fileformat=VCFv4.2

FORMAT=

FORMAT=

INFO=

contig=

msa2vcf.meta=compilation:20240102101636 githash:717b89405 htsjdk:4.0.1 date:20240105121644 cmd:-m bepA_2.faa

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PA_1402497_3_DLJEFMPF03093 PA_1402507_3_JCNOJIEN01183 PA_1402516_3_GEBHJBMB00757 PA_1402533_3_JOLDBAAN01523 PA_1402547_3_HLGEDPDI03879 PA_1402552_3_OCEIHCEE06036 PA_1402570_3_EILDPBFJ00773 PA_1418247_3_NDPHJBAE02968 PA_287_12488_GAHKELMD02255 PA_287_12490_NHLAMOGA00757 PA_287_12505_CEEHBNPH02101 PA_287_12569_HIJPFKFF02318 PA_287_12575_NCJEMALD02311 PA_287_12577_LLPPMPKN01995 PA_287_12642_GAOHHCML02248 PA_287_12687_HDFBEIMK02280 PA_287_12689_MGPELKNO03001 PA_287_12712_KFANAFIM00320 PA_287_12720_JOKEPCNB00319 PA_287_12726_LBJIDJLF00661 PA_287_12781_BHJGJGNJ02503 PA_287_12803_NFOBMFKI01902 PA_287_12805_NJJDCHFO01359 PA_287_12820_MKOJOPOM03370 PA_287_12825_GPNNBMKF05518 PA_287_12845_FLBEOOJM00778 PA_287_12848_MAOHINEJ00705 PA_287_2978_KCCGMNJN01618 PA_287_3109_HNNBAFAI03103 PA_287_3116_LGCKGEML05873 PA_287_3130_JLMFAHAL03074 PA_287_4055_KDAONLDE00694 PA_287_4057_IHJNGGLD05498 PA_287_4064_AIEBOCOP05787 PA_287_5702_JFPLKAHL04090 PA_287_7777_KFGGKOPE01144 PA_287_7797_CLHJFHMB01089 PA_287_7802_NCKIHPHI02807 PA_287_7817_JEJKHOPN03290 PA_287_7849_LLMCKINM01664 PA_287_7854_JBGLCGLE01979 PA_287_7856_EIGEFFIL02518 PA_287_7857_MHABBLKI01150 PA_287_7860_LFJBACDD00088 PA_287_8500_PDJJJGFP01911 PA_287_8510_BJCGEBNF01634 PA_287_8539_MCAGHFPO00782 PA_287_8884_OMHIHELL01470

[SEVERE][MsaToVcf]Duplicate allele added to VariantContext: NNNNNNNNNANNNNNNNNAGNCAGNNNNTANNN
java.lang.IllegalArgumentException: Duplicate allele added to VariantContext: NNNNNNNNNANNNNNNNNAGNCAGNNNNTANNN
    at htsjdk.variant.variantcontext.VariantContext.makeAlleles(VariantContext.java:1582)
    at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:472)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
    at com.github.lindenb.jvarkit.tools.msa2vcf.MsaToVcf.doWork(MsaToVcf.java:617)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:819)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:982)
    at com.github.lindenb.jvarkit.tools.msa2vcf.MsaToVcf.main(MsaToVcf.java:658)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at com.github.lindenb.jvarkit.tools.jvarkit.JvarkitCentral$Command.execute(JvarkitCentral.java:281)
    at com.github.lindenb.jvarkit.tools.jvarkit.JvarkitCentral.run(JvarkitCentral.java:759)
    at com.github.lindenb.jvarkit.tools.jvarkit.JvarkitCentral.main(JvarkitCentral.java:770)
[INFO][Launcher]msa2vcf Exited with failure (-1)

Your environment

Steps to reproduce

The command to reproduce the issue is given above. The file "bepA_2.txt" is a multiple sequence alignment of protein sequences. I attached the file too (see below). I had to rename it to ".txt" because GitHub does not accept ".faa" or ".fa" attachments.

bepA_2.txt

Expected behaviour

I expected it to finish without error.

Actual behaviour

It reported an error "Duplicate allele added to VariantContext".

lindenb commented 6 months ago

These are protein sequences, but VCF is designed only for DNA references, NOT amino acids. You should use another tool.
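
For readers who hit the same error, a quick pre-check of the alignment can confirm whether the input is really nucleotide data before it is passed to msa2vcf. The following is a minimal sketch in Python and is not part of jvarkit: the function names `read_fasta` and `non_nucleotide_symbols` are hypothetical, and the accepted alphabet (IUPAC nucleotide codes plus `-` and `.` for gaps) is an assumption about what counts as DNA in an aligned FASTA file. It is only a heuristic, since a short peptide made up entirely of letters that are also IUPAC nucleotide codes would pass unnoticed.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: IUPAC nucleotide codes plus two common gap symbols.
NUCLEOTIDE_SYMBOLS = set("ACGTUNRYSWKMBDHV-.")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first word of the header as the record name.
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def non_nucleotide_symbols(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each record name to the sorted symbols that are not nucleotide codes.

    Records that contain only nucleotide codes and gaps are left out,
    so an empty result means the alignment looks like DNA or RNA.
    """
    offenders = {}
    for name, seq in records:
        bad = set(seq.upper()) - NUCLEOTIDE_SYMBOLS
        if bad:
            offenders[name] = "".join(sorted(bad))
    return offenders
```

To use it, open the alignment file and call `non_nucleotide_symbols(read_fasta(handle))`. A non-empty result listing letters such as E, L, P or Q is a strong hint that the file holds amino-acid sequences and is therefore outside what VCF can represent.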