I think the main bottleneck in our training speed is the training data loader. This is probably because we get cache misses when collecting the data at random indices. To speed this up, we will probably need to store the pre-shuffled data on disk.
I tried writing some code to do this, but I ran out of memory on the last step:
from itertools import product
from random import shuffle
import numpy as np
import xarray as xr
ds = xr.open_dataset("./data/processed/training.nc")
# construct the indices; z is left whole, so only x, y, and time are shuffled
x = range(len(ds.x))
y = range(len(ds.y))
time = range(len(ds.time))
indices = list(product(x, y, time))
shuffle(indices)
transposed = list(zip(*indices))
# construct xarray indexers following
# http://xarray.pydata.org/en/stable/indexing.html#more-advanced-indexing
dims = ['x', 'y', 'time']
indexers = {
    dim: xr.DataArray(
        np.array(index),
        dims="sample",
        coords={'sample': np.arange(len(indices))})
    for dim, index in zip(dims, transposed)
}
# This step runs out of memory
shuffled_ds = ds.isel(**indexers)
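One stopgap that avoids holding the full shuffled result in memory is to apply the indexers in fixed-size chunks and write each chunk to disk as it is produced. A minimal sketch, using a tiny synthetic dataset as a stand-in for training.nc (the variable name, dimension sizes, chunk_size, and output path are all hypothetical):

```python
from itertools import product
from random import shuffle

import numpy as np
import xarray as xr

# Tiny synthetic stand-in for ./data/processed/training.nc (sizes made up).
ds = xr.Dataset({"q": (("time", "z", "y", "x"), np.zeros((2, 3, 4, 5)))})

indices = list(product(range(ds.sizes["x"]),
                       range(ds.sizes["y"]),
                       range(ds.sizes["time"])))
shuffle(indices)

chunk_size = 16  # hypothetical; tune so one chunk fits comfortably in memory
dims = ["x", "y", "time"]
n_chunks = 0
for start in range(0, len(indices), chunk_size):
    # split one slice of shuffled (x, y, time) tuples into per-dim columns
    columns = list(zip(*indices[start:start + chunk_size]))
    indexers = {
        dim: xr.DataArray(np.array(col), dims="sample")
        for dim, col in zip(dims, columns)
    }
    shuffled_chunk = ds.isel(indexers)
    # In the real pipeline each chunk would be cached to disk, e.g.
    # shuffled_chunk.to_netcdf(f"./data/shuffled/{n_chunks:05d}.nc")
    n_chunks += 1
```

Each chunk is only `chunk_size` samples, so peak memory stays bounded regardless of the dataset size.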
To speed this up, we will probably need to do a couple of steps, with on-disk caching for each step:
1. Transpose the data (time, z, y, x) --> (time, y, x, z)
2. Reshape (time, y, x, z) --> (batch, time_and_next, z)
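The two steps above can be sketched with xarray's transpose and stack; a tiny synthetic dataset stands in for training.nc, the variable name q and dimension sizes are assumptions, and the on-disk caching between steps (e.g. a to_netcdf call) is omitted:

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for training.nc (variable name and sizes assumed).
ds = xr.Dataset(
    {"q": (("time", "z", "y", "x"),
           np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5))}
)

# Step 1: transpose (time, z, y, x) -> (time, y, x, z).
# In the real pipeline this intermediate would be cached to disk.
transposed = ds.transpose("time", "y", "x", "z")

# Step 2: collapse (time, y, x) into one sample dimension, giving (sample, z).
# Batches are then contiguous slices along `sample`, so reads stay sequential.
stacked = transposed.stack(sample=("time", "y", "x")).transpose("sample", "z")
print(stacked["q"].shape)  # -> (40, 3): 2*4*5 samples, each a full z-column
```

Grouping `sample` into (batch, samples_per_batch, z) is then a cheap reshape of already-contiguous data rather than a gather at random indices.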
cc @sarenehan