kishwarshafin / pepper

PEPPER-Margin-DeepVariant
MIT License

Ambiguous IUPAC bases appearing in output VCF violate VCF Spec #167

Open rickymagner opened 2 years ago

rickymagner commented 2 years ago

Hi,

I tried running some tools on an output VCF from PEPPER and ran into a nonstandard base, which caused many of them to throw an error. The problem is that hg38 contains some bases other than A/C/G/T/N, and these can occasionally end up in a call via a deletion. Here is a sample line from a VCF:

chr3 16902881 . TGB T 1.4 RefCall . GT:GQ:DP:AD:VAF:PL ./.:6:26:24,0:0:0,16,4

Technically the presence of bases other than A/C/G/T/N is a violation of the VCF spec, so it's reasonable the other tools would throw an error upon seeing them. I originally opened this issue in DeepVariant, but they directed me to this repo as the cause for this behavior. It's possible to avoid these few sites where it happens, but I thought I'd make you aware this is technically an issue with the way outputs are generated here.
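To make the problem concrete, here is a minimal sketch that scans the REF and ALT alleles of the record above for IUPAC ambiguity codes. The helper name `ambiguous_bases` is made up for illustration and is not part of PEPPER or DeepVariant; the code table is the standard IUPAC nucleotide notation, in which `B` stands for "C, G, or T".

```python
# IUPAC ambiguity codes and the concrete bases each one stands for.
IUPAC_AMBIGUOUS = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def ambiguous_bases(allele):
    """Return (index, code, expansion) for each ambiguity code in an allele.

    Hypothetical helper for illustration only.
    """
    return [
        (i, base, IUPAC_AMBIGUOUS[base])
        for i, base in enumerate(allele.upper())
        if base in IUPAC_AMBIGUOUS
    ]


# The record from this report. Real VCF files are tab-delimited; the line is
# whitespace-separated as pasted here, so split on any whitespace.
record = "chr3 16902881 . TGB T 1.4 RefCall . GT:GQ:DP:AD:VAF:PL ./.:6:26:24,0:0:0,16,4"
chrom, pos, _id, ref, alt = record.split()[:5]

print(ambiguous_bases(ref))  # [(2, 'B', 'CGT')]
print(ambiguous_bases(alt))  # []
```

The third base of the REF allele is the offending one; the ALT allele is clean, which is why the record looks like an ordinary deletion to anything that does not validate REF.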

Thanks!

kishwarshafin commented 2 years ago

@rickymagner ,

So sorry for being late on this issue. Yes, I see that this is currently an issue in PEPPER. The next few weeks are a bit busy, but I'll make sure to fix this for the next release.

kishwarshafin commented 1 year ago

Hi @rickymagner ,

As you may have noticed, we are moving toward having DeepVariant call variants directly from ONT reads, with PEPPER kept only for legacy support. This also means we are likely not going to have another release. If you need to keep using it, you can add a post-processing step that excludes these sites.
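One possible shape for that post-processing step, as a rough sketch rather than anything shipped with PEPPER: drop every record whose REF or ALT allele contains a base outside A/C/G/T/N, and pass header lines through unchanged. The function names are hypothetical, the second data line is synthetic and only there to show a record that survives, and the allele check is deliberately simple (it lets `.`, `*`, symbolic `<ID>` alleles, and bracketed breakends through without inspecting them).

```python
VALID_BASES = set("ACGTN")


def allele_is_valid(allele):
    """True if an allele is missing, '*', symbolic, a bracketed breakend,
    or made up only of A/C/G/T/N (case-insensitive)."""
    if allele in (".", "*") or allele.startswith("<") or "[" in allele or "]" in allele:
        return True
    return set(allele.upper()) <= VALID_BASES


def keep_record(line):
    """True for header lines and for records whose REF and ALT are all valid."""
    if line.startswith("#"):
        return True
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    return all(allele_is_valid(a) for a in [ref, *alts])


def filter_vcf(lines):
    """Lazily yield only the lines that pass keep_record."""
    return (line for line in lines if keep_record(line))


# Small in-memory demo. The first record is the one from this issue
# (truncated to eight columns); the second is a synthetic placeholder.
lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr3\t16902881\t.\tTGB\tT\t1.4\tRefCall\t.\n",
    "chr3\t16902900\t.\tA\tG\t30\tPASS\t.\n",
]
kept = list(filter_vcf(lines))
print("".join(kept), end="")
```

Because `filter_vcf` accepts any iterable of lines, the same function should work on an open text file handle, for example one returned by `gzip.open(path, "rt")` for a compressed VCF.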

I am going to keep this issue open to let others know that this bug exists but won't be fixed.

rickymagner commented 1 year ago

Thanks for the update. Do you know if DeepVariant is guaranteed not to exhibit this behavior with the new ONT model?

kishwarshafin commented 1 year ago

@rickymagner, yes, I can confirm that DeepVariant only considers ACGTN as valid bases.