Closed by xuhaike 3 months ago
Thanks for contributing these! I have one ask: can you make the random number generation deterministic by setting a seed with random.seed()? It would be good if the Python scripts always produced exactly the same runbook.
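For illustration, here is a minimal sketch of the seeding pattern, not the actual generator scripts. The function name `generate_steps`, the operation names, and the step fields are hypothetical placeholders. The only point is that calling `random.seed()` with a fixed value before drawing any random numbers makes every run produce the same sequence.

```python
import random


def generate_steps(num_steps, max_id, seed=0):
    """Build a list of pseudo-random steps (hypothetical runbook-like structure)."""
    # Seed the module-level generator first so that every call,
    # and every run of the script, draws the same sequence.
    random.seed(seed)
    steps = []
    for i in range(num_steps):
        # Hypothetical operation names, chosen only for this sketch.
        operation = random.choice(["insert", "delete", "search"])
        start = random.randint(0, max_id - 1)
        steps.append({"step": i + 1, "operation": operation, "start": start})
    return steps


first = generate_steps(5, 1000)
second = generate_steps(5, 1000)
# Same seed, same output: the two lists are identical.
assert first == second
```

If other code in the same process also uses the `random` module, a private `random.Random(seed)` instance would avoid interference, but a single `random.seed()` call at the top of each script should be enough for a standalone generator.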
I also noticed that these runbook names aren't incorporated into the options for computing recall in data_export.py. Can you add those also?
Created three new runbooks for the msturing-10m, wikipedia-35m, and msmarco-100m datasets. Please see the YAML files and the Python scripts that generate them in the neurips23/streaming folder.