I was trying to use the Fst functions to compute Fst between individuals, in this case putative siblings, and was getting values back that I knew were incorrect (Fst between regions in IBD should be -0.25 or -1). Looking in more detail, I see that when computing Fst between two cohorts, the result is clipped to between 0 and 1.
https://github.com/malariagen/malariagen-data-python/blob/dc89c9ceaa6e7bff3d0842b626870140a8ff6809/malariagen_data/anoph/fst.py#L83
I suggest we either:

A) do not clip Fst and return the true Fst values

B) add a parameter such as `clip_min` which changes the floor that we clip to (so I can set it to -1).

I actually have a slight preference for A, but I'm happy to implement either.
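
To make option B concrete, here is a minimal sketch of what a configurable floor could look like. It is not the existing code in `fst.py`: the helper name `clip_fst` and the `clip_max` parameter are hypothetical, and only `clip_min` comes from the proposal above. Passing `None` for both bounds gives the unclipped behaviour of option A.

```python
import numpy as np


def clip_fst(fst, clip_min=0.0, clip_max=1.0):
    """Hypothetical helper: clip Fst values to [clip_min, clip_max].

    The defaults reproduce clipping to [0, 1]. Either bound can be
    set to None to leave that side unclipped.
    """
    fst = np.asarray(fst, dtype=float)
    if clip_min is None and clip_max is None:
        # No clipping at all (option A): return the raw estimates.
        return fst
    return np.clip(fst, clip_min, clip_max)


raw = np.array([-0.25, 0.1, 1.3])
print(clip_fst(raw))                 # floor at 0, ceiling at 1
print(clip_fst(raw, clip_min=-1.0))  # negative estimates are preserved
```

Keeping the defaults at 0 and 1 would leave existing callers unaffected, while a lower floor or `None` would expose negative estimates.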