Now that we have an example in the mila-docs of how to benchmark throughput and identify bottlenecks, the research project template should also make this easy to do.
[ ] Add an example experiment configuration and accompanying notebook that uses the PyTorch profiler and does the same kind of profiling as in the example, but using the template.
[ ] Add an example of a sweep over some parameters, with the training throughput as the metric, and using different kinds of GPUs.
[ ] Create a wandb report with the throughput comparison between the different GPU types.
[ ] If done after DRAC support is added, also include a comparison between the Mila and DRAC clusters. (For example, the optimal num_workers might be higher on DRAC because of the very slow $SCRATCH filesystems, which could be interesting to look into.)
Should be completed after https://github.com/mila-iqia/mila-docs/issues/247
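
For reference, here is a rough sketch of how the training throughput metric for the sweep could be computed. This is only an illustration, not the code from the mila-docs example or the template: `measure_throughput`, `train_step`, `warmup_steps` and `clock` are all hypothetical names, and the warmup handling is an assumption about what we would want to exclude from the measurement.

```python
import time
from typing import Callable, Iterable, Sized


def measure_throughput(
    batches: Iterable[Sized],
    train_step: Callable[[Sized], None],
    warmup_steps: int = 2,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return samples/second, ignoring the first `warmup_steps` batches.

    Hypothetical helper: the first few steps are skipped because they
    usually include one-time costs (dataloader worker startup, caching).
    """
    n_samples = 0
    start = None
    for step, batch in enumerate(batches):
        if step == warmup_steps:
            # Start timing only once the warmup steps are done.
            start = clock()
        train_step(batch)
        if step >= warmup_steps:
            n_samples += len(batch)
    if start is None:
        raise ValueError("Need more batches than warmup_steps.")
    # Note: on a GPU, kernels run asynchronously, so the device should be
    # synchronized (e.g. torch.cuda.synchronize()) before reading the clock.
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("Elapsed time is zero; run more steps.")
    return n_samples / elapsed


if __name__ == "__main__":
    # Fake "training step" that just sleeps, with 10 batches of 32 samples.
    fake_batches = [list(range(32)) for _ in range(10)]
    throughput = measure_throughput(fake_batches, lambda b: time.sleep(0.01))
    print(f"{throughput:.1f} samples/sec")
```

The returned value could then be logged as the sweep metric (for example to wandb) so that runs on different GPU types or with different num_workers values can be compared directly.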