Description
This PR refactors the pipeline for the chatbot example. In short, the changes make it easier to (1) run zeno-build in parallel on multiple machines, and (2) generate reports from previously finished runs.
Smaller changes include a locking mechanism that prevents the same experiment from being run twice in parallel, and automatic loading of the prediction files at the end of a training run, among others.
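
For reviewers unfamiliar with the idea, here is a minimal sketch of how a per-experiment lock of this kind can work. It is not the code in this PR: the function name `experiment_lock`, the `.lock` file naming, and the lock directory are all hypothetical, and it assumes the machines share a filesystem on which exclusive file creation is atomic.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def experiment_lock(lock_dir: str, experiment_id: str):
    """Yield True if this process acquired the lock, False otherwise.

    Hypothetical helper for illustration; not part of zeno-build.
    """
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, f"{experiment_id}.lock")
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only one process can create the lock file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another process holds the lock; let the caller skip this experiment.
        yield False
        return
    try:
        os.close(fd)
        yield True
    finally:
        # Release the lock even if the experiment raised an exception.
        os.remove(lock_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        with experiment_lock(lock_dir, "run-001") as acquired:
            if acquired:
                print("running experiment run-001")
            else:
                print("run-001 is already running elsewhere; skipping")
```

A worker that gets `False` simply moves on to the next experiment, which is what lets several machines share one list of experiments without duplicating work. Note that a crashed worker would leave a stale lock file behind in this sketch; handling that is out of scope here.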
References
Blocked by