heathsc / bs_call

DNA methylation and variant Caller for Bisulfite Sequencing Data.
GNU General Public License v3.0

The genotype field of bcf file is missing alternate allele #3

Open tahuh opened 3 years ago

tahuh commented 3 years ago

Hi @heathsc

I'm using BSCall software to detect some variants from whole genome bisulfite sequencing data.

My version of bs_call is 2.1.7

When I tried to compare the BSCall results against NA12878's germline SNP variants using GATK's GenotypeConcordance command, I got the following error message:

htsjdk.tribble.TribbleException$InternalCodecException: The allele with index 3 is not defined in the REF/ALT columns in the record

After some Google searching, I found that this happens because a genotype refers to an alternate allele index that is not defined in the ALT column (too few alternate alleles are listed). Many records are genotyped 1/3, as shown below, but have only 2 alternate alleles. According to this thread on Biostars, I think this genotype should be 1/2 instead of 1/3.

1   28562540    .   C   A,T 114 PASS    CX=CTCCG    GT:FT:DP:MQ:GQ:QD:GL:MC8:AMQ:CS:CG:CX:FS    1/3:PASS:20:42:114:5:-45.0074,-11.452,-24.4232,-19.2041,-32.7773:9,4,0,7,0,0,0,0:37,37,37:NA:.:YTWYR:0

Is this genotype intentional, or is it a miscall?
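For anyone who wants to check their own output for this problem, here is a minimal Python sketch that flags genotype allele indices larger than the number of alleles listed in ALT. It is not part of bs_call or GATK. The function name `invalid_gt_indices` is made up for illustration, and it assumes a standard VCF data line (tab-separated, falling back to whitespace) with `GT` present in the FORMAT column.

```python
import re

def invalid_gt_indices(vcf_line):
    """Return (sample_number, allele_index) pairs whose GT allele index
    is not defined by the REF/ALT columns of this VCF data line.

    Hypothetical helper for illustration only.
    """
    line = vcf_line.rstrip("\n")
    # VCF is tab-delimited; fall back to whitespace for pasted records.
    fields = line.split("\t") if "\t" in line else line.split()
    if len(fields) < 10:
        return []  # no FORMAT/sample columns to check

    alt = fields[4]
    # Index 0 is REF, indices 1..n_alt are the ALT alleles.
    n_alt = 0 if alt == "." else len(alt.split(","))

    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return []
    gt_pos = fmt_keys.index("GT")

    bad = []
    for sample_no, sample in enumerate(fields[9:]):
        values = sample.split(":")
        if gt_pos >= len(values):
            continue
        # Alleles are separated by "/" (unphased) or "|" (phased).
        for allele in re.split(r"[/|]", values[gt_pos]):
            if allele != "." and int(allele) > n_alt:
                bad.append((sample_no, int(allele)))
    return bad

# The record quoted above: ALT is "A,T" (2 alleles), GT is 1/3.
record = "\t".join([
    "1", "28562540", ".", "C", "A,T", "114", "PASS", "CX=CTCCG",
    "GT:FT:DP:MQ:GQ:QD:GL:MC8:AMQ:CS:CG:CX:FS",
    "1/3:PASS:20:42:114:5:-45.0074,-11.452,-24.4232,-19.2041,-32.7773:"
    "9,4,0,7,0,0,0,0:37,37,37:NA:.:YTWYR:0",
])
print(invalid_gt_indices(record))  # [(0, 3)]
```

Running this over every non-header line of a VCF would list the records that htsjdk-based tools are likely to reject with the error above.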

paulstretenowich commented 2 years ago

Hi Simon,

I get the same error message when loading the VCF in IGV:

The allele with index 3 is not defined in the REF/ALT columns in the record, for input source:

FYI, I used the master branch at the latest commit, ef7eb81fbf556719c5e89e979e0b740234e31a74. I can't find any genotypes called as 1/2, and all the 1/3 genotypes look like the one in @tahuh's example, with only 2 alternate alleles. Do you plan to fix this, @heathsc, or is it intended?

Thanks in advance.

heathsc commented 2 years ago

This is a clear bug - we should not be producing invalid VCF files under any circumstances. I will look into this.

Simon


paulstretenowich commented 2 years ago

Many thanks for your super quick reply! Keep me posted when you have a fix so that I can download the new version and give it a try on my side.

Paul