Fully distributed training

icenet-ai / icenet

The icenet library is a pip-installable Python package containing the commands and code you need to produce forecasts.

MIT License


Fully distributed training #252

Open JimCircadian opened 3 months ago

JimCircadian commented 3 months ago

Description

Multi-node, multi-*PU training. This is required to really scale our use of the data pipeline for big predictions. Given the construction of the pipeline as it exists, we just need some library changes to ensure that we can utilise resources as they become available. This issue will track the additional development required to ensure that we scale to the HPC capabilities in question.

JimCircadian commented 3 months ago

The structure of the library facilitates some use of distributed mechanisms, but this is definitely not a CLI workflow. Some additional scripts are being added under icenet-pipeline for the moment.
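
As a rough illustration of the kind of per-node setup such a script might need, here is a minimal sketch, not the actual icenet-pipeline code. It assumes a TensorFlow-style multi-worker job, which this issue does not state; `TF_CONFIG` is the environment variable TensorFlow's multi-worker strategies read, while `build_tf_config`, the host names and the port are hypothetical.

```python
import json
import os


def build_tf_config(hosts, task_index, port=12345):
    """Return a TF_CONFIG-style JSON string for one worker in a multi-node job.

    hosts, task_index and port are hypothetical inputs; a real launcher would
    derive them from the scheduler (e.g. the node list of an HPC job).
    """
    if not 0 <= task_index < len(hosts):
        raise ValueError("task_index must refer to one of the hosts")
    # Every worker gets the same cluster description and its own task index.
    cluster = {"worker": [f"{host}:{port}" for host in hosts]}
    return json.dumps({"cluster": cluster,
                       "task": {"type": "worker", "index": task_index}})


if __name__ == "__main__":
    # Hypothetical two-node job; each node would run this with its own index.
    os.environ["TF_CONFIG"] = build_tf_config(["node01", "node02"], task_index=0)
    print(os.environ["TF_CONFIG"])
```

Each node would run the same script with a different `task_index`, which is part of why this sits more naturally in standalone scripts than in a single CLI command.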