sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

Option to calculate IBS probabilities via allele matching #1227

Closed timothymillar closed 5 days ago

timothymillar commented 3 weeks ago

The identity_by_state method calculates mean probabilities of identity by state (IBS) from call_allele_frequencies. The current implementation is fairly efficient when the alleles dimension is small, but not when it is large, because the frequencies array grows with the number of alleles. An alternative approach is to compare allelic states directly within the call_genotype array. This is likely to be less efficient for small numbers of alleles (each pair of calls needs O(K*K) comparisons, where K is the maximum ploidy), but much more efficient for large numbers of alleles, especially in terms of memory. This could be implemented as an optional method: sg.identity_by_state(ds, method='matching').
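To make the comparison concrete, here is a minimal pure-Python sketch of the two approaches for a single pair of samples. It is not the sgkit implementation: the function names (ibs_from_frequencies, ibs_from_matching, mean_ibs), the nested-list genotype layout, and the handling of missing alleles (negative values are dropped, and a variant is skipped when either call has no observed alleles) are all assumptions made for illustration.

```python
from itertools import product


def observed(call):
    """Drop missing alleles (assumed here to be encoded as negative integers)."""
    return [allele for allele in call if allele >= 0]


def ibs_from_frequencies(call_a, call_b, n_alleles):
    """Frequency-based IBS: sum over alleles of the product of within-call frequencies.

    Needs one frequency value per allele per call, so storage grows with n_alleles.
    """
    a, b = observed(call_a), observed(call_b)
    if not a or not b:
        return None  # no information at this variant
    freq_a = [a.count(k) / len(a) for k in range(n_alleles)]
    freq_b = [b.count(k) / len(b) for k in range(n_alleles)]
    return sum(p * q for p, q in zip(freq_a, freq_b))


def ibs_from_matching(call_a, call_b):
    """Matching-based IBS: fraction of allele pairs (one from each call) that are equal.

    Needs O(K*K) comparisons for ploidy K, but no per-allele storage.
    """
    a, b = observed(call_a), observed(call_b)
    if not a or not b:
        return None
    matches = sum(x == y for x, y in product(a, b))
    return matches / (len(a) * len(b))


def mean_ibs(genotypes, i, j, pairwise):
    """Mean per-variant IBS for samples i and j, skipping uninformative variants."""
    values = [pairwise(variant[i], variant[j]) for variant in genotypes]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else float("nan")


# Hypothetical data laid out as genotypes[variant][sample][ploidy]; -1 marks a missing allele.
genotypes = [
    [[0, 1], [1, 1], [0, 0]],
    [[2, 2], [2, 5], [-1, -1]],
]
n_alleles = 6  # allele indices 0..5 appear above


def by_frequency(a, b):
    return ibs_from_frequencies(a, b, n_alleles)


print(mean_ibs(genotypes, 0, 1, by_frequency))
print(mean_ibs(genotypes, 0, 1, ibs_from_matching))
```

Both functions compute the same quantity, the probability that one allele drawn at random from each call is identical in state, so the choice between them is about cost rather than results. A real implementation would presumably be vectorized or compiled rather than looping in Python.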