mllam / neural-lam

Neural Weather Prediction for Limited Area Modeling
MIT License

Multi-GPU training #1

Closed by joeloskarsson 11 months ago

joeloskarsson commented 11 months ago

I realized that multi-GPU training is currently broken. Luckily, I believe this should be a simple fix: just making sure that logging and the storage of tensors in the model classes conform properly to the Lightning setup.
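For readers unfamiliar with the tensor-storage issue mentioned here, the following is a minimal sketch of one common pattern, not necessarily what the eventual fix does. It assumes plain PyTorch; `ToyModel`, `plain_weights` and `buffer_weights` are hypothetical names made up for illustration. A tensor stored as a plain attribute is not moved when the module is moved, whereas one registered with `register_buffer` follows the module. A dtype cast is used in place of a device move so the sketch runs on a CPU-only machine.

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical model that stores two constant tensors in different ways."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)
        # Plain attribute: the module does not track it, so .to() leaves it alone
        self.plain_weights = torch.ones(3)
        # Registered buffer: tracked by the module, so it follows .to() calls;
        # persistent=False keeps it out of the saved state_dict
        self.register_buffer("buffer_weights", torch.ones(3), persistent=False)

    def forward(self, x):
        return self.linear(x * self.buffer_weights)


# The dtype cast stands in for a device move (assumption: no GPU available)
model = ToyModel().to(torch.float64)
print(model.buffer_weights.dtype)  # cast together with the module
print(model.plain_weights.dtype)   # left at its original dtype
```

On the logging side, Lightning's `self.log` accepts a `sync_dist=True` argument that reduces the logged value across processes; whether that is the right choice for a given metric depends on the setup.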

joeloskarsson commented 11 months ago

Not as simple a fix as I originally thought, but this is now fixed in commit 89a4c63370201c9ea1a5f04d4cf1e5e75b7cc83e. The implementation should now work on CPU, single-GPU and multi-GPU setups.

A couple of things to keep in mind from this fix: