ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

Are filtered-out sites excluded from the calculation? #101

Open jiangzy26 opened 5 months ago

jiangzy26 commented 5 months ago

Hi, I understand that variant and invariant sites each need their own hard filter, which means we may lose sites within each sliding window. For example, with a 50k window size, filtering out the low-QUAL sites may leave only 40k sites. I am not sure whether we need to add invariant sites back at these 'lost' positions. Could you help? Thanks.
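To make the question concrete, here is a minimal sketch of the arithmetic involved. It assumes the window estimate is the summed pairwise differences divided by the summed pairwise comparisons over the sites that remain after filtering, which is how I read the pixy documentation (`count_diffs` / `count_comparisons`). The function name `window_pi` and the toy numbers are hypothetical and not part of pixy.

```python
def window_pi(per_site_diffs, per_site_comps):
    """Nucleotide diversity for one window from per-site tallies.

    Only sites that survived filtering are passed in, so a filtered-out
    site adds nothing to either the numerator or the denominator.
    """
    total_diffs = sum(per_site_diffs)
    total_comps = sum(per_site_comps)
    return total_diffs / total_comps if total_comps else float("nan")


# Toy window of 5 positions; 1 position was removed by filtering.
# With 4 haploid genotypes there are 6 pairwise comparisons per site.
diffs = [0, 3, 0, 4]   # pairwise differences at the 4 retained sites
comps = [6, 6, 6, 6]   # pairwise comparisons at the 4 retained sites

pi_retained = window_pi(diffs, comps)   # denominator covers retained sites only
pi_naive = sum(diffs) / (6 * 5)         # denominator assumes all 5 positions

print(pi_retained, pi_naive)
```

Under this assumption a filtered-out position is treated as missing rather than as invariant, so the estimate is not deflated the way the naive per-window denominator would be.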

hrluo93 commented 4 months ago

Hi,

I would also like to know how to convert low-quality variant sites to invariant sites. I have seen a pipeline that appears to keep the invariant sites, filter out the low-quality variant sites, and then combine the invariant sites with the high-quality variant sites. I don't know whether it is suitable.

You can check this page: https://yanzhongsino.github.io/2023/03/13/bioinfo_population.genetics_pixy/

Best wishes!
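The pipeline described in this comment can be sketched roughly as below. This is only an illustration in plain Python, not the linked page's commands and not pixy code. It assumes an uncompressed all-sites VCF in which invariant records have `.` in the ALT column; the function name `filter_all_sites` and the `min_qual` threshold are hypothetical. Note that failing variant records are dropped here, not converted to invariant sites.

```python
def filter_all_sites(vcf_lines, min_qual=30.0):
    """Keep header lines, all invariant records, and variant records
    whose QUAL is at least min_qual. Input order is preserved, so the
    output stays position-sorted if the input was."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)          # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        alt, qual = fields[4], fields[5]
        if alt == ".":
            kept.append(line)          # invariant site (assumed ALT == ".")
        elif qual != "." and float(qual) >= min_qual:
            kept.append(line)          # variant site passing the QUAL filter
        # any other variant record is dropped, i.e. becomes missing data
    return kept


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\t.\t50\t.\t.\n",    # invariant
    "chr1\t101\t.\tC\tT\t12\t.\t.\n",    # low-quality variant
    "chr1\t102\t.\tG\tA\t99\t.\t.\n",    # high-quality variant
]
result = filter_all_sites(example)
print(len(result))
```

In practice the invariant sites would usually get their own filter as well, as the original question mentions; that step is left out to keep the sketch short.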