harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License

making ongoing evaluation more streamlined #266

Open ingberam opened 5 months ago

ingberam commented 5 months ago

Currently, when a new algorithm is submitted, only @harsha-simhadri can run the evaluation, since he has the existing results files for all other algorithms on the standard Azure machine (which are needed to produce the updated plots). This is not scalable.

I opened this issue to collect ideas for letting others evaluate algorithms as well, and to consider fully automatic evaluation.

Idea 1: Let others evaluate new algorithms on the standard hardware and update the ongoing leaderboard. This may require simplifying the mechanism for generating plots, which currently requires the full results files (large hdf5 files) for all algorithms.
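
One way this could work is to decouple plotting from the raw results: export a compact per-run summary (recall/QPS per algorithm and parameter setting) from the HDF5 files once, and regenerate the leaderboard plots from those summaries. Below is a minimal sketch, assuming each run is a single `.hdf5` file with the relevant metrics stored in its attrs; the directory layout, attribute names, and summary format are illustrative, not the framework's actual API.

```python
"""Sketch: export compact per-run summaries so leaderboard plots can be
rebuilt without shipping the full HDF5 results for every algorithm."""
import csv
import pathlib

import h5py

RESULTS_DIR = pathlib.Path("results")      # hypothetical location of raw results
SUMMARY_CSV = pathlib.Path("summary.csv")  # small file that could live in the repo


def export_summaries(results_dir: pathlib.Path, out_csv: pathlib.Path) -> None:
    rows = []
    for path in sorted(results_dir.rglob("*.hdf5")):
        with h5py.File(path, "r") as f:
            attrs = dict(f.attrs)
            # Attribute names below are assumptions; adjust to whatever the
            # real results writer stores.
            rows.append({
                "dataset": attrs.get("dataset", "unknown"),
                "algorithm": attrs.get("algo", path.parent.name),
                "parameters": attrs.get("name", path.stem),
                "recall": float(attrs.get("recall", float("nan"))),
                "qps": float(attrs.get("qps", float("nan"))),
            })
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["dataset", "algorithm", "parameters", "recall", "qps"]
        )
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    export_summaries(RESULTS_DIR, SUMMARY_CSV)
```

With something like this, a new submission would only need to append its own rows to the summary file; the plots are regenerated from the small CSV, so nobody needs the other algorithms' raw HDF5 files.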

Idea 2: Fully automatic evaluation: when someone submits a PR, it is evaluated automatically via CI or some other method. This does not work today (even the small unit tests are flaky, since there is variability in the type of machine that runs the CI).
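
For the CI route, one possible shape is a PR-triggered job that runs the changed algorithm on a small dataset and compares its recall against a stored baseline with a tolerance, so machine-to-machine variance doesn't make the check flaky. Here is a hedged sketch of just the comparison step; the baseline file, result format, and tolerance are all assumptions, not part of the existing framework.

```python
"""Sketch of a CI gate: compare a PR run's recall against a stored baseline
with slack for hardware variability."""
import json
import sys

TOLERANCE = 0.02  # absolute recall slack to absorb runner-to-runner variance


def main(baseline_path: str, result_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)   # e.g. {"algo-x": {"recall": 0.95}}  (assumed format)
    with open(result_path) as f:
        result = json.load(f)     # same shape, produced by the PR's CI run

    failures = []
    for algo, expected in baseline.items():
        got = result.get(algo, {}).get("recall", 0.0)
        if got < expected["recall"] - TOLERANCE:
            failures.append(f"{algo}: recall {got:.3f} below baseline {expected['recall']:.3f}")

    for msg in failures:
        print("FAIL", msg)
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Timing numbers are probably too noisy on hosted runners, so a gate like this would check only recall and leave QPS to the standard-hardware runs from Idea 1.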

@maumueller @sourcesync I know you also thought about this. Any ideas?