[FEA] Automate distributed ML pytests

Currently, cuML does not have CI resources for multi-GPU testing. We have been told that multi-GPU CI is possible, but resource constraints may initially limit it to 2 GPUs.

Recently, we found a cuML bug that appears to have been introduced several months earlier and that only manifests with more than 2 GPUs. cuML depends on several libraries under active development, so it needs more frequent verification in multi-GPU (and eventually multi-node) environments. We should test against the bleeding-edge versions of these libraries so that we catch breaking changes early.

Ideally, multi-GPU and multi-node pytests would run automatically at least once a day, if not more often. These tests should also run against the bleeding-edge versions of Dask and UCX-Py.

Until multi-GPU CI can support more than 2 GPUs, we should schedule these tests as cron jobs on a DGX.
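As a rough sketch of what such a cron job could run, the Python script below installs Dask, Distributed, and UCX-Py from their upstream GitHub repositories and then runs the multi-GPU pytests with every GPU visible. Everything specific here is an assumption rather than cuML's actual setup: the test directory `python/cuml/test/dask`, the 8-GPU count (a DGX-1), installing each project's default branch with `--no-deps`, and the `--run` flag. Without `--run`, the script only prints the commands it would execute.

```python
"""Hypothetical nightly driver for cuML multi-GPU pytests.

Sketch only: the repository URLs, test path, and GPU count are
assumptions, not the project's actual CI configuration.
"""
import os
import shlex
import subprocess
import sys

# Upstream repositories installed from their default branch
# (assumption: "bleeding edge" means each project's current main branch).
BLEEDING_EDGE_REPOS = [
    "git+https://github.com/dask/dask.git",
    "git+https://github.com/dask/distributed.git",
    # Building UCX-Py from source assumes UCX and a build toolchain are installed.
    "git+https://github.com/rapidsai/ucx-py.git",
]

# Hypothetical location of cuML's Dask-based multi-GPU tests.
MG_TEST_DIR = "python/cuml/test/dask"


def build_commands(n_gpus, test_dir=MG_TEST_DIR):
    """Return (env, commands) for one nightly run on `n_gpus` GPUs."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    env = dict(os.environ)
    # Expose GPUs 0..n_gpus-1 to the test processes.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(n_gpus))
    # --no-deps keeps pip from replacing the rest of the (e.g. conda) environment.
    commands = [
        [sys.executable, "-m", "pip", "install", "--upgrade", "--no-deps", repo]
        for repo in BLEEDING_EDGE_REPOS
    ]
    commands.append([sys.executable, "-m", "pytest", "-v", test_dir])
    return env, commands


def run(n_gpus, dry_run=True):
    """Print each command and, unless dry_run, execute it; stop at the first failure."""
    env, commands = build_commands(n_gpus)
    for cmd in commands:
        print("+", " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            result = subprocess.run(cmd, env=env)
            if result.returncode != 0:
                return result.returncode
    return 0


# Only execute for real when explicitly asked, e.g. from cron.
if __name__ == "__main__" and "--run" in sys.argv:
    sys.exit(run(n_gpus=8, dry_run=False))
```

A crontab entry such as `0 2 * * * cd /path/to/cuml && python3 nightly_mg_tests.py --run >> nightly_mg_tests.log 2>&1` (hypothetical script name and paths) would run it every night at 02:00. Defaulting to a dry run keeps the script safe to try out, and returning the first nonzero exit code lets a wrapper mark the nightly run as broken.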


[FEA] Automate distributed ML pytests #1910