malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License
Should Fst be clipped between 0 and 1? #543

Open sanjaynagi opened 1 month ago

sanjaynagi commented 1 month ago

I was trying to use the Fst functions to compute Fst between individuals (in this case putative siblings) and was getting values back that I knew were incorrect: Fst between regions in IBD should be -0.25 or -1. Looking in more detail, I see that when Fst is computed between two cohorts, the result is clipped to the range 0 to 1.

https://github.com/malariagen/malariagen-data-python/blob/dc89c9ceaa6e7bff3d0842b626870140a8ff6809/malariagen_data/anoph/fst.py#L83

I suggest we either:

A) do not clip Fst and return the true Fst values, or
B) add a parameter such as clip_min that changes the floor we clip to (so I can set it to -1).

I actually have a slight preference for A, but I'm happy to implement either.
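
To illustrate the effect, here is a minimal NumPy sketch, not the library's implementation. It assumes Hudson's Fst estimator (ratio of averages over biallelic sites); the function name `hudson_fst_sketch` and the `clip_min`/`clip_max` parameters are hypothetical. With `clip_min=None` it behaves like option A, and with a numeric `clip_min` it behaves like option B.

```python
import numpy as np


def hudson_fst_sketch(ac1, ac2, clip_min=None, clip_max=1.0):
    """Hudson's Fst (ratio of averages) from biallelic allele counts.

    ac1, ac2: arrays of shape (n_sites, 2) holding ref/alt allele counts.
    clip_min: hypothetical parameter; None means no clipping at all.
    """
    ac1 = np.asarray(ac1, dtype=float)
    ac2 = np.asarray(ac2, dtype=float)
    n1 = ac1.sum(axis=1)  # alleles sampled per site, cohort 1
    n2 = ac2.sum(axis=1)  # alleles sampled per site, cohort 2
    p1 = ac1[:, 1] / n1   # alt allele frequency, cohort 1
    p2 = ac2[:, 1] / n2   # alt allele frequency, cohort 2
    # Per-site numerator and denominator of the Hudson estimator.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    fst = float(num.sum() / den.sum())
    if clip_min is not None:
        fst = float(np.clip(fst, clip_min, clip_max))
    return fst


# Two diploid individuals, both heterozygous at every site
# (an idealised region where they share both haplotypes).
ind1 = [[1, 1], [1, 1], [1, 1]]
ind2 = [[1, 1], [1, 1], [1, 1]]

unclipped = hudson_fst_sketch(ind1, ind2)                 # option A
floor_zero = hudson_fst_sketch(ind1, ind2, clip_min=0)    # clipping to [0, 1]
floor_neg1 = hudson_fst_sketch(ind1, ind2, clip_min=-1)   # option B
print(unclipped, floor_zero, floor_neg1)
```

In this toy case the unclipped estimate is -1, and clipping at 0 hides it entirely, which is the behaviour described above.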