Multi-node multi-*PU training. This is required to really scale our use of the data pipeline to big predictions. Given how the pipeline is currently constructed, we should only need some library changes to ensure that we can use resources as they become available. This issue tracks the additional development required to scale to the HPC capabilities in question.
The structure of the library supports some use of distributed mechanisms, but this is definitely not a CLI workflow. For the moment, some additional scripts are being added under icenet-pipeline.