shawnlaffan / biodiverse

A tool for the spatial analysis of diversity
http://shawnlaffan.github.io/biodiverse/
GNU General Public License v3.0

analyses - calculate SHA digests of labels per group #870

Closed shawnlaffan closed 10 months ago

shawnlaffan commented 1 year ago

A possible optimisation when calculating pairwise matrices with turnover indices is to generate a SHA digest of the label set in each group.

Any groups with the same SHA digest will contain the same labels and therefore have zero turnover.

We can also track pairs of SHA digests: any two group pairs with the same pair of digests will produce the same turnover score, so that score only needs to be calculated once.
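A minimal sketch of the idea, in Python for illustration only (Biodiverse itself is written in Perl, and none of this is taken from its code base). The function names, the use of SHA-256, and the choice of Jaccard dissimilarity as the turnover index are all assumptions made for the example. The cache key is order-independent, so it is only valid for symmetric indices.

```python
import hashlib
from itertools import combinations


def label_digest(labels):
    """SHA-256 digest of a label set, independent of label order."""
    h = hashlib.sha256()
    for label in sorted(set(labels)):
        h.update(label.encode("utf-8"))
        h.update(b"\x00")  # separator so {"ab", "c"} and {"a", "bc"} differ
    return h.hexdigest()


def jaccard_dissimilarity(a, b):
    """Stand-in turnover index (assumption): 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_turnover(groups):
    """Return (scores, n_computed) for all group pairs.

    groups maps a group name to an iterable of labels.
    n_computed is the number of scores actually calculated.
    """
    sets = {g: set(labels) for g, labels in groups.items()}
    digests = {g: label_digest(s) for g, s in sets.items()}
    cache = {}   # (digest, digest) -> turnover score
    scores = {}  # (group, group) -> turnover score
    for g1, g2 in combinations(sorted(groups), 2):
        d1, d2 = digests[g1], digests[g2]
        if d1 == d2:
            # Same digest means same label set, so zero turnover.
            scores[(g1, g2)] = 0.0
            continue
        key = (d1, d2) if d1 < d2 else (d2, d1)
        if key not in cache:
            cache[key] = jaccard_dissimilarity(sets[g1], sets[g2])
        scores[(g1, g2)] = cache[key]
    return scores, len(cache)
```

With four groups where `g1`/`g2` and `g3`/`g4` hold identical label sets, six pairs are scored but the index is only evaluated once. The saving depends entirely on how often digests repeat in real data.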

shawnlaffan commented 10 months ago

Experiments with the North American data set from Mishler et al. (2020) showed that very few groups had the same SHA sum. Smaller data sets might show similar levels of overlap, and the overlap will be lower still for analyses that use sample counts. This adds implementation complexity for what might be only marginal gains in speed.
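For anyone wanting to run a similar check on their own data, a rough sketch follows. It is written in Python for illustration and is not the code used for the experiment above. The function names and the input structure (a dict mapping each group to a dict of label to sample count) are assumptions. Setting `use_counts=True` folds the sample counts into the digest, which is why fewer groups are expected to match.

```python
import hashlib
from collections import Counter


def digest(items):
    """SHA-256 digest of a collection of items, independent of order."""
    h = hashlib.sha256()
    for item in sorted(items):
        h.update(repr(item).encode("utf-8"))
        h.update(b"\x00")  # separator between items
    return h.hexdigest()


def shared_digest_fraction(groups, use_counts=False):
    """Fraction of groups whose digest matches at least one other group.

    groups maps a group name to a dict of {label: sample_count}.
    """
    if not groups:
        return 0.0
    digests = []
    for label_counts in groups.values():
        # Hash (label, count) pairs when counts matter, else labels only.
        items = label_counts.items() if use_counts else label_counts.keys()
        digests.append(digest(items))
    freq = Counter(digests)
    shared = sum(n for n in freq.values() if n > 1)
    return shared / len(groups)
```

A low fraction suggests the digest bookkeeping would cost more than the turnover calculations it avoids.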

Marking as a wontfix.