Need to fix overfitting by training on more data. Train from Zarr. If that's too slow, then load the entire Zarr into memory at the start of training? Or download their numpy dataset.
[x] Implement first version of code that trains from Zarr. Seems to work. Gets about 3.4 it/s. GPU is working hard.
[x] Stop wandb logging so much damn data! Just keep the best model!
[x] Try on donatello (after copying the Satellite Zarr to donatello's 4TB SSD)
[x] #60
[ ] Try reducing the number of processes Dask uses per PyTorch worker, so we can have more PyTorch workers without each worker swamping the CPU!
[ ] Load async (i.e. load from disk while training using threads)
[x] Try loading more days per epoch.
[ ] Two training DataLoaders: one reading from Zarr, the other from the pre-prepared v15 data
[x] Convert to float32 on the fly, to reduce RAM usage
[ ] If this doesn't work at all, then:
    [ ] Load many more days per epoch, and then do more examples per epoch.
    [ ] Save pre-prepared NetCDF files with many random days of data.
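The "just keep the best model" item above can be sketched as a keep-best checkpoint policy. This is a minimal stdlib sketch, not wandb's API: `save_fn` is a hypothetical callback standing in for whatever writes the checkpoint (e.g. `torch.save` plus a wandb file upload).

```python
import math

def make_best_checkpointer(save_fn):
    """Return a callback that calls save_fn only when val_loss improves.

    save_fn is a hypothetical callable that writes the checkpoint;
    by always overwriting the same file, only the best model is kept.
    """
    best = math.inf

    def on_validation_end(val_loss, model_state):
        nonlocal best
        if val_loss < best:
            best = val_loss
            save_fn(model_state)  # overwrite the single "best" checkpoint
            return True           # saved
        return False              # skipped: not an improvement

    return on_validation_end

saves = []
checkpoint = make_best_checkpointer(saves.append)
for loss, state in [(0.9, "a"), (0.7, "b"), (0.8, "c"), (0.5, "d")]:
    checkpoint(loss, state)
# only the improving epochs ("a", "b", "d") triggered a save
```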
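For the "fewer Dask processes per PyTorch worker" item, one option is to force Dask's synchronous (single-threaded) scheduler inside each DataLoader worker via `worker_init_fn`. A sketch, assuming Dask is doing the Zarr reads; the function name is illustrative:

```python
import dask

def dask_single_threaded_worker_init(worker_id):
    """worker_init_fn for torch.utils.data.DataLoader.

    Forces Dask's synchronous scheduler in this worker process, so each
    PyTorch worker uses roughly one CPU for Dask work and more workers
    can run without swamping the CPU.
    """
    dask.config.set(scheduler="synchronous")

# would be passed as DataLoader(..., worker_init_fn=dask_single_threaded_worker_init)
dask_single_threaded_worker_init(0)
```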
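The async-loading item can be sketched with a stdlib background thread that reads ahead into a bounded queue while the GPU trains on the current batch. All names here are illustrative:

```python
import queue
import threading

def prefetch(iterable, buffer_size=4):
    """Yield items from `iterable`, loading ahead on a background thread.

    The bounded queue means the producer reads at most `buffer_size`
    batches ahead of the consumer (the training loop).
    """
    q = queue.Queue(maxsize=buffer_size)
    _done = object()  # sentinel marking the end of the iterable

    def producer():
        for item in iterable:
            q.put(item)
        q.put(_done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _done:
            return
        yield item

# stand-in for "load from disk while training"
batches = list(prefetch(range(10)))
# → [0, 1, 2, ..., 9], same items, loaded ahead of consumption
```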
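The two-DataLoader item could interleave batches from the Zarr loader and the pre-prepared v15 loader. A stdlib sketch; the loader names are placeholders:

```python
from itertools import zip_longest

def interleave(loader_a, loader_b):
    """Alternate batches from two loaders, draining the longer one at the end."""
    _missing = object()
    for a, b in zip_longest(loader_a, loader_b, fillvalue=_missing):
        if a is not _missing:
            yield a
        if b is not _missing:
            yield b

zarr_batches = ["z0", "z1", "z2"]   # placeholder for the Zarr DataLoader
v15_batches = ["v0", "v1"]          # placeholder for the v15 DataLoader
mixed = list(interleave(zarr_batches, v15_batches))
# → ["z0", "v0", "z1", "v1", "z2"]
```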
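The on-the-fly float32 conversion halves memory relative to float64 arrays. With NumPy it is a single `astype` applied to each batch as it is loaded; the shapes here are illustrative:

```python
import numpy as np

def to_float32(batch):
    """Downcast a freshly loaded batch to float32 to halve RAM usage.

    copy=False skips the copy if the data is already float32.
    """
    return batch.astype(np.float32, copy=False)

raw = np.ones((4, 128, 128), dtype=np.float64)  # e.g. one satellite batch
small = to_float32(raw)
assert small.dtype == np.float32
assert small.nbytes * 2 == raw.nbytes  # half the memory of float64
```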