mkirsche / Jasmine

Jasmine: SV Merging Across Samples
MIT License

What does "GT" mean in the FILTER field? #38

Closed zhqduan closed 2 years ago

zhqduan commented 2 years ago

Hi, I am using Jasmine to merge SVs following the suggested pipeline, and it works well. However, in the final dataset produced by the "Remove low-confidence or imprecise calls" step, with the command cat <mergedvcf> | grep -v 'IMPRECISE;' | grep -v 'IS_SPECIFIC=0', a large number of variants are flagged with "GT" in the FILTER field. The header defines "GT" as "Genotype filter". Could you please explain in more detail what "Genotype filter" means? And should these variants be included in downstream analysis? Thanks very much!

Zhongqu

mkirsche commented 2 years ago

Hi Zhongqu,

Thanks for your interest in using our pipeline! I've never seen that before, and it looks like this GT FILTER field value was an addition in the latest version of Sniffles released a few weeks ago: https://github.com/fritzsedlazeck/Sniffles/blob/c02731ad288b32949c583cca2252164b235e5cb6/src/sniffles/postprocessing.py#L304

The code I linked to indicates that the filter is applied based on a threshold for genotype likelihood z-scores, so I would be inclined to keep these variants for downstream analysis but not necessarily trust their genotype values to be as accurate as those of other calls. However, I could not find any documentation in the Sniffles paper or on the GitHub page describing how to interpret the flag, so I would recommend reaching out to the Sniffles developers for more information.
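As a rough illustration of that suggestion, the sketch below tallies the FILTER values in a VCF and separates the GT-flagged records from the rest, so they can be kept but handled with more caution. It is only a sketch under stated assumptions: the function names `filter_counts` and `split_by_filter` are hypothetical, the example records are made up, and the input is assumed to be an uncompressed, tab-separated VCF in which FILTER is the seventh column and may hold several semicolon-separated codes.

```python
from collections import Counter


def filter_counts(vcf_lines):
    """Tally FILTER (column 7) codes across the data lines of a VCF."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER may contain several semicolon-separated codes.
        for code in fields[6].split(";"):
            counts[code] += 1
    return counts


def split_by_filter(vcf_lines, code="GT"):
    """Return (flagged, others): data lines with and without `code` in FILTER."""
    flagged, others = [], []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        filters = line.rstrip("\n").split("\t")[6].split(";")
        (flagged if code in filters else others).append(line)
    return flagged, others


# Made-up records for illustration only; not real Jasmine or Sniffles output.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL\n",
    "chr1\t500\tsv2\tN\t<INS>\t60\tGT\tSVTYPE=INS\n",
    "chr2\t900\tsv3\tN\t<DEL>\t60\tGT\tSVTYPE=DEL\n",
]

counts = filter_counts(example)
flagged, others = split_by_filter(example)
print(dict(counts))
print(len(flagged), "GT-flagged;", len(others), "other")
```

To run this on a real file, an open file handle can be passed instead of the list, since both functions only iterate over lines. Note that each function consumes the handle, so the file would need to be reopened (or read into a list first) before calling the second one.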

I hope that helps! Melanie

zhqduan commented 2 years ago

Hi Melanie,

Thanks for your prompt response. I found a detailed explanation in the Sniffles issues, which resolves my question. Thank you very much!

Zhongqu