open-contracting / cardinal-rs

Measure red flags and procurement indicators using OCDS data
https://cardinal.readthedocs.io
MIT License

Have another look through statistics crates #19

Closed jpmckinney closed 1 year ago

jpmckinney commented 1 year ago

We currently use statrs (the most popular statistics crate). There might be some newer crates that meet our needs better.

medians has medinfof64. It's a single-author library (as is rstats), and it is less widely used.

qsv-stats

qsv-stats performs a sort, O(n log n), to calculate quartiles. statrs uses a selection algorithm, O(n).
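To make the sort-versus-selection distinction concrete, here is a minimal std-only sketch. It is not the code of either crate: the function names are hypothetical, and it uses the simple nearest-rank quantile definition rather than whatever interpolation qsv-stats or statrs actually apply.

```rust
/// Zero-based index of the nearest-rank quantile in sorted data of length `n`.
/// Assumes `n > 0` and `0.0 <= q <= 1.0` (callers check both).
fn nearest_rank_index(n: usize, q: f64) -> usize {
    let rank = (q * n as f64).ceil() as usize;
    rank.clamp(1, n) - 1
}

/// Quantile via a full sort: O(n log n).
/// Hypothetical helper, not the qsv-stats implementation.
fn quantile_by_sort(data: &[f64], q: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut values = data.to_vec();
    values.sort_by(|a, b| a.total_cmp(b));
    Some(values[nearest_rank_index(values.len(), q)])
}

/// Quantile via selection: O(n) on average, because only the element at
/// the target index is put in its sorted position.
/// Hypothetical helper, not the statrs implementation.
fn quantile_by_selection(data: &[f64], q: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut values = data.to_vec();
    let index = nearest_rank_index(values.len(), q);
    let (_, nth, _) = values.select_nth_unstable_by(index, |a, b| a.total_cmp(b));
    Some(*nth)
}
```

Both helpers return the same value for the same input. The difference is only in how much ordering work is done to reach it.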

For DR bid ratios, numpy calculates 0.25580327. qsv-stats got 0.2560257847899094 (a difference of about 0.00022). statrs got 0.2559516146277174 (a difference of about 0.00015). In other words, no major difference.
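The comparison can be reproduced with a few lines of arithmetic. This sketch takes the three values reported above as constants; the helper names are hypothetical.

```rust
// Values as reported in this thread.
const NUMPY: f64 = 0.25580327;
const QSV_STATS: f64 = 0.2560257847899094;
const STATRS: f64 = 0.2559516146277174;

/// Absolute difference between an estimate and the reference value.
fn abs_diff(reference: f64, estimate: f64) -> f64 {
    (estimate - reference).abs()
}

/// Difference relative to the reference value (assumes `reference != 0.0`).
fn rel_diff(reference: f64, estimate: f64) -> f64 {
    abs_diff(reference, estimate) / reference.abs()
}
```

With numpy as the reference, `abs_diff` gives roughly 0.000223 for qsv-stats and 0.000148 for statrs. Both relative differences are under 0.1%.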


I'm also looking at watermill (https://docs.rs/watermill/latest/watermill/) for online statistics.

ADR: watermill's quartile calculation is non-deterministic. I think that means we should not use that feature, as I expect it will be confusing to users to get different results (more or fewer flags) on different runs.
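To illustrate what determinism means for flag counts, here is a rough sketch. It makes no claim about how watermill or this project computes anything: `lower_quartile` is an exact, sort-based estimate with linear interpolation, and the rule in `count_flags` (flag every ratio below the lower quartile) is purely hypothetical.

```rust
/// Exact lower quartile with linear interpolation between order statistics.
/// The result depends only on the values, not on their input order.
fn lower_quartile(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    let mut values = data.to_vec();
    values.sort_by(|a, b| a.total_cmp(b));
    let h = 0.25 * (values.len() - 1) as f64;
    let lo = h.floor() as usize;
    let hi = h.ceil() as usize;
    Some(values[lo] + (h - lo as f64) * (values[hi] - values[lo]))
}

/// Hypothetical flag rule for illustration only: count ratios that fall
/// strictly below the lower quartile.
fn count_flags(ratios: &[f64]) -> usize {
    match lower_quartile(ratios) {
        Some(threshold) => ratios.iter().filter(|&&r| r < threshold).count(),
        None => 0,
    }
}
```

Because the threshold is computed exactly, reordering or re-running the same input yields the same count. An estimator whose output varies between runs could move the threshold and change the number of flags.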

jpmckinney commented 1 year ago

I think we're okay with statrs, but can revisit.