ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Hard or soft filtering input VCF? #58

Closed. mwersebe closed this issue 2 years ago.

mwersebe commented 2 years ago

Hi! I am trying to figure out whether sites that fail quality filtering (e.g., on depth, QUAL, etc.) should be hard- or soft-filtered in the input VCF file. For example, in bcftools filter, if you supply the -s flag it will annotate the FILTER column of the VCF with whatever string you provide. Does pixy ignore sites that do not have PASS in this column, or does it count them as missing data?

Many thanks!
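To make the distinction in the question concrete, here is a minimal Python sketch that models soft versus hard filtering on plain-text VCF lines. It is not bcftools' implementation: the `apply_filter` and `low_qual` helpers, the `LowQual` label, and the QUAL < 30 threshold are all hypothetical, chosen only for illustration. The point is that a soft filter leaves the failing record in the file with a new FILTER value, while a hard filter removes it.

```python
from typing import Callable, List, Optional

# Column positions in a tab-separated VCF data line:
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
QUAL_COL = 5
FILTER_COL = 6


def apply_filter(
    lines: List[str],
    fails: Callable[[List[str]], bool],
    soft_label: Optional[str] = None,
) -> List[str]:
    """Soft-filter (relabel failing records) or hard-filter (drop them)."""
    out = []
    for line in lines:
        if line.startswith("#"):
            out.append(line)  # meta-information and header lines pass through
            continue
        end = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        if fails(fields):
            if soft_label is None:
                continue  # hard filter: the record is removed entirely
            fields[FILTER_COL] = soft_label  # soft filter: record kept, FILTER rewritten
        out.append("\t".join(fields) + end)
    return out


def low_qual(fields: List[str], min_qual: float = 30.0) -> bool:
    """Hypothetical criterion: QUAL below min_qual; a missing QUAL ('.') passes."""
    qual = fields[QUAL_COL]
    return qual != "." and float(qual) < min_qual


VCF_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=20",
    "chr1\t101\t.\tC\tT\t12\t.\tDP=3",
]

soft = apply_filter(VCF_LINES, low_qual, soft_label="LowQual")
hard = apply_filter(VCF_LINES, low_qual)
print(len(soft), len(hard))  # 4 3
```

Only the hard-filtered list loses the chr1:101 record; in the soft-filtered list it is still present, marked `LowQual`, so any tool that ignores the FILTER column would still read it.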

ksamuk commented 2 years ago

Hi there! Pixy expects hard-filtered VCFs and doesn't use the FILTER column at all. Just be careful to make sure you preserve both variant and invariant sites when filtering (see the manual for more info on how to do this).
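If you already have a soft-filtered VCF (for example from `bcftools filter -s`), one way to meet that requirement is to drop every record that did not pass before handing the file to pixy. The sketch below does this in plain Python on uncompressed VCF text; `harden`, `harden_file`, and the choice to treat both `PASS` and `.` as passing are assumptions for illustration, not part of pixy. It decides on the FILTER column alone and never on ALT, so invariant sites (ALT `.`) are kept whenever they passed.

```python
from typing import Iterable, List

FILTER_COL = 6  # FILTER is the 7th tab-separated column of a VCF data line

# Assumption: "PASS" and "." (no filter applied) both count as passing.
# Invariant sites in all-sites VCFs often carry "." here.
KEEP_FILTERS = {"PASS", "."}


def harden(lines: Iterable[str]) -> List[str]:
    """Turn a soft-filtered VCF into a hard-filtered one.

    Header lines are kept, records whose FILTER is not in KEEP_FILTERS are
    dropped, and no record is dropped for being invariant (ALT == '.'),
    so both variant and invariant sites survive if they passed.
    """
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[FILTER_COL] in KEEP_FILTERS:
            out.append(line)
    return out


def harden_file(src_path: str, dst_path: str) -> None:
    """Apply harden() to an uncompressed VCF file, writing the kept lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(harden(src))


SOFT_FILTERED = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\t.\t.\t.\tDP=15",      # invariant site, unfiltered
    "chr1\t101\t.\tC\tT\t60\tPASS\tDP=22",  # variant site, passed
    "chr1\t102\t.\tG\t.\t.\tLowDP\tDP=2",   # invariant site, soft-filtered
]

kept = harden(SOFT_FILTERED)
print([line.split("\t")[1] for line in kept[1:]])  # ['100', '101']
```

The output would still need compressing and indexing (e.g. with bgzip and tabix) if your pixy run expects that; the manual remains the reference for producing and filtering all-sites VCFs.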

mwersebe commented 2 years ago

Awesome, thank you!