sigven / vcf2tsvpy

Genomic VCF to tab-separated values
MIT License

default options excluding entries with PASS filter? #9

Closed BCArg closed 1 year ago

BCArg commented 1 year ago

I ran vcf2tsvpy with default arguments, i.e. only the required arguments, using the following command:

vcf2tsvpy --input_vcf {in_vcf} --out_tsv {out_tsv} --skip_info_data

I noticed, however, that some entries from the VCF that have a PASS value in the FILTER column were excluded from the output TSV file.

For example, the entry below is present in the VCF:

1       776546  rs12124819      A       G       .       PASS    .       GT:GQ:BAF:LRR   ./.:0:0.590754:0.0825162

but it is not present in the output TSV file unless I pass --keep_rejected_calls, in which case the TSV file is complete.

Below is a vimdiff screenshot: the left-hand side is the output with --keep_rejected_calls, the right-hand side the output with only the required arguments.

[image: vimdiff screenshot of the two TSV outputs]

Is this the expected behaviour? How come not passing --keep_rejected_calls excludes calls that have a PASS under FILTER?

Thanks in advance

BCArg commented 1 year ago

Having a closer look at the entries that were removed, I see that they have a GQ (genotype quality) of 0, so I guess this is the reason.

sigven commented 1 year ago

I just noticed the same now. Wonder how these could be annotated as PASS?

sigven commented 1 year ago

They also have an undefined genotype (GT).

BCArg commented 1 year ago

That's correct, GT is always './.' in the entries that I checked. It is indeed a bit dodgy that they are annotated as PASS while their other quality parameters are poor. Anyway, I reckon your tool is performing as expected, so I will close this issue. Thanks for the assistance.
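For anyone who wants to check their own file for the same pattern, here is a minimal Python sketch that lists PASS records whose sample has an undefined genotype or a GQ of 0. It is not vcf2tsvpy's own filtering logic, which this thread does not spell out; the function names are made up for illustration, and the second demo record is hypothetical and included only as a contrast.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (e.g. GT, GQ) to one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def is_missing_gt(gt):
    """True when every allele is '.', e.g. './.', '.|.' or '.'."""
    return all(allele == "." for allele in gt.replace("|", "/").split("/"))


def is_zero(value):
    """True when the value parses as the number 0 ('.' does not parse)."""
    try:
        return float(value) == 0.0
    except ValueError:
        return False


def suspicious_pass_records(lines):
    """Yield (chrom, pos, id, sample_index, gt, gq) for PASS records whose
    sample column has an undefined GT or a GQ of 0."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[6] != "PASS":
            continue  # no sample columns, or FILTER is not PASS
        for index, sample in enumerate(fields[9:]):
            values = parse_sample(fields[8], sample)
            gt = values.get("GT", ".")
            gq = values.get("GQ", ".")
            if is_missing_gt(gt) or is_zero(gq):
                yield fields[0], int(fields[1]), fields[2], index, gt, gq


# The record quoted earlier in this thread, rebuilt with tab separators.
reported = "\t".join(["1", "776546", "rs12124819", "A", "G", ".", "PASS", ".",
                      "GT:GQ:BAF:LRR", "./.:0:0.590754:0.0825162"])
# Hypothetical well-formed record, for contrast only.
contrast = "\t".join(["1", "776600", "rsHYPOTHETICAL", "C", "T", ".", "PASS", ".",
                      "GT:GQ", "0/1:48"])

hits = list(suspicious_pass_records(["##fileformat=VCFv4.2", reported, contrast]))
print(hits)
```

For a real file, pass an open text handle instead of the list, e.g. `open(path)` or `gzip.open(path, "rt")` for a compressed VCF.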

sigven commented 1 year ago

Hey @BCArg, thanks for looking into this. One might of course consider implementing some warnings when encountering data like yours, but this is really unexpected input, I believe. Sadly, my experience is that VCFs from different callers rarely adhere strictly to the VCF specification, which makes it inherently difficult to cope with all scenarios. If you have not done so already, you might want to check out the vcf-validator to get some feel for how "valid" your VCF file is.