Create a more robust test framework

sd2k commented 1 year ago

The current tests of algorithm implementations are pretty ad-hoc; I basically took some datasets from the original papers/notebooks, ran them in R/Python, and copied the expected values into our tests. It'd be much better if we had a way to automatically generate the test cases and results somehow.

We probably don't need to go as far as running the R/Python algorithms every time, but we should at least have a script or notebook to generate the expected test results so we can update it as required.
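A rough sketch of what such a generator script could look like, using only the Python standard library. Everything named here is hypothetical: `reference_rolling_mean` is a stand-in for a call into one of the original R/Python implementations, and the JSON layout is just one possible fixture format, not something augurs currently uses.

```python
import json
import statistics
from pathlib import Path


def reference_rolling_mean(values, window):
    """Hypothetical stand-in for a call into the original reference implementation."""
    return [
        statistics.fmean(values[i : i + window])
        for i in range(len(values) - window + 1)
    ]


def write_fixture(path, name, inputs, window):
    """Run the reference once and store the inputs plus expected outputs as JSON."""
    fixture = {
        "name": name,
        "inputs": inputs,
        "window": window,
        "expected": reference_rolling_mean(inputs, window),
    }
    Path(path).write_text(json.dumps(fixture, indent=2))
    return fixture


def load_fixture(path):
    """Read a stored fixture back, e.g. from a test that checks another implementation."""
    return json.loads(Path(path).read_text())
```

Re-running the script whenever the reference library changes would regenerate the fixture files, so the expected values no longer need to be copied by hand.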

shenxiangzhuang commented 3 months ago

Hi @sd2k, first of all, thank you for sharing the awesome augurs library! I have a suggestion about this issue. I think we could test the Rust implementation by comparing the Python binding's output with the original Python implementation. I use this method in my little project bleuscore and it works fine.

In short, it uses the hypothesis library to do property-based testing, which automatically generates many test cases to check that the two implementations produce equal results.
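To illustrate the idea without extra dependencies, here is a minimal hand-rolled version of the comparison using the standard library's `random` module instead of hypothesis. Both `reference_cumsum` and `binding_cumsum` are made-up stand-ins (for the original Python implementation and for the Rust implementation called through its Python binding); they are not real augurs or bleuscore functions. With hypothesis, the hand-written generator would be replaced by its strategies, and failing inputs would also be shrunk to a minimal example.

```python
import random


def reference_cumsum(xs):
    """Hypothetical stand-in for the original Python implementation."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def binding_cumsum(xs):
    """Hypothetical stand-in for the Rust implementation exposed via Python bindings."""
    return [sum(xs[: i + 1]) for i in range(len(xs))]


def check_equivalence(reference, candidate, n_cases=200, seed=0):
    """Feed identical random inputs to both implementations.

    Returns the first input on which they disagree, or None if all cases match.
    """
    rng = random.Random(seed)
    for _ in range(n_cases):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 30))]
        if reference(xs) != candidate(xs):
            return xs
    return None
```

Integer inputs keep the comparison exact here; floating-point outputs would need an approximate comparison instead of `!=`.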

sd2k commented 1 month ago

@shenxiangzhuang Thanks for the links, and sorry for the delay in getting back to you; I was away for quite a lot of the summer.

That is a nice idea yeah - we could do something similar and run those tests in CI to make sure we get matching results, at least for algorithms with matching Python implementations. Part of the problem is that some of the algorithms don't perfectly match the Python implementations, so we'd need an acceptable numerical tolerance for each algorithm. We could also provide a way to run benchmarks for augurs implementations vs Python implementations, which might be a good way to convince people to actually use this library 😅
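One possible shape for the per-algorithm tolerance, sketched with `math.isclose` from the standard library. The algorithm names and tolerance values below are placeholders chosen for illustration, not measured differences between augurs and any Python library.

```python
import math

# Hypothetical per-algorithm tolerances; names and values are placeholders.
TOLERANCES = {
    "exact_match_algo": {"rel_tol": 1e-12, "abs_tol": 0.0},
    "approximate_algo": {"rel_tol": 1e-3, "abs_tol": 1e-6},
}


def assert_close(algorithm, actual, expected):
    """Compare two output sequences element-wise using that algorithm's tolerance."""
    tol = TOLERANCES[algorithm]
    assert len(actual) == len(expected), "length mismatch"
    for i, (a, e) in enumerate(zip(actual, expected)):
        assert math.isclose(a, e, **tol), (
            f"{algorithm}[{i}]: {a} differs from {e} beyond {tol}"
        )
```

Keeping the tolerances in one table makes it visible which algorithms match their reference closely and which only approximately.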

I'm not really sure what to do about R though. It's a much bigger effort so maybe we should just stick with comparison vs Python libraries for now.

grafana / augurs

Create a more robust test framework #20