sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

Option to calculate IBS probabilities via allele matching #1227

Closed timothymillar closed 4 months ago

timothymillar commented 5 months ago

The identity_by_state method calculates mean probabilities of identity by state (IBS) from call_allele_frequencies. The current implementation is fairly efficient when the alleles dimension is small, but scales poorly when it is large. An alternative approach is to compare allelic states directly within the call_genotype array. This is likely to be less efficient for small numbers of alleles (each comparison of two samples at a variant is O(K*K), where K is the maximum ploidy), but much more efficient for large numbers of alleles, especially in terms of memory. This could be implemented as an optional method, e.g. sg.identity_by_state(ds, method='matching').
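
To make the matching idea concrete, here is a rough NumPy sketch, not the sgkit implementation. The function name `ibs_by_matching` is hypothetical, and it assumes a genotype array of shape (variants, samples, ploidy) in which negative values mark missing or non-allele entries. For each pair of samples at each variant it counts the allele pairs that share a state and divides by the number of called allele pairs, then averages over the variants where both samples have calls.

```python
import numpy as np


def ibs_by_matching(call_genotype):
    """Mean pairwise IBS probability by direct allele matching (illustrative sketch).

    call_genotype: integer array of shape (variants, samples, ploidy).
    Negative values are assumed to mark missing or non-allele entries.
    """
    gt = np.asarray(call_genotype)
    n_variants, n_samples, _ = gt.shape
    valid = gt >= 0
    total = np.zeros((n_samples, n_samples))  # sum of per-variant IBS probabilities
    count = np.zeros((n_samples, n_samples))  # variants where both samples are called
    for v in range(n_variants):
        g, ok = gt[v], valid[v]
        # eq[i, j, a, b] is True when allele a of sample i matches allele b of sample j
        eq = g[:, None, :, None] == g[None, :, None, :]
        eq &= ok[:, None, :, None] & ok[None, :, None, :]
        matches = eq.sum(axis=(2, 3))
        n_called = ok.sum(axis=1)
        pairs = n_called[:, None] * n_called[None, :]
        called = pairs > 0
        total[called] += matches[called] / pairs[called]
        count += called
    # Pairs with no jointly called variants come out as NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count


# Three diploid samples at two variants; the third sample is missing at variant 2
genotypes = np.array(
    [
        [[0, 0], [0, 1], [1, 1]],
        [[0, 2], [2, 2], [-1, -1]],
    ]
)
ibs = ibs_by_matching(genotypes)
```

The working memory here depends on the number of samples and the ploidy but not on the number of alleles, which is the trade-off described above. A practical version would presumably need chunking over variants and samples rather than a plain Python loop.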