nbren12 / uwnet

Neural Networks based unified physics parameterization for atmospheric models
MIT License

Faster DataLoader class #57

Open nbren12 opened 5 years ago

nbren12 commented 5 years ago

cc @sarenehan

I think the main bottleneck in our training speed is the training data loader. This is probably because we get cache misses when collecting the data at random indices. To speed this up, we will probably need to store a pre-shuffled copy of the data on disk.

I tried writing some code to do this, but I was running out of memory on the last step:

from itertools import product
from random import shuffle

import numpy as np
import xarray as xr

ds = xr.open_dataset("./data/processed/training.nc")

# construct the indices to shuffle; z is left out so each sample keeps its
# full vertical column
x = range(len(ds.x))
y = range(len(ds.y))
time = range(len(ds.time))

indices = list(product(x, y, time))
shuffle(indices)
transposed = list(zip(*indices))

# construct xarray indexers following 
# http://xarray.pydata.org/en/stable/indexing.html#more-advanced-indexing
dims = ['x', 'y', 'time']
indexers = {
    dim: xr.DataArray(
        np.array(index),
        dims="sample",
        coords={'sample': np.arange(len(indices))})
    for dim, index in zip(dims, transposed)
}

# This step runs out of memory, presumably because the whole shuffled
# dataset is gathered at once
shuffled_ds = ds.isel(**indexers)

To speed this up, we will probably need to do a couple of steps, with on-disk caching after each step (a rough sketch follows the list):

  1. Transpose the data (time, z, y, x) --> (time, y, x, z)
  2. Reshape (time, y, x, z) --> (batch, time_and_next, z)
  3. Shuffle along batch dimension.
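
Here is a minimal sketch of those three steps, assuming the variables in training.nc are dimensioned (time, z, y, x) and that writing each intermediate result to netCDF counts as the on-disk cache. The file names, the stacked batch dimension, and the reset_index call are illustrative choices, not something already in the code above; grouping each time step with the following one (the "time_and_next" pairing in step 2) would need an extra windowing step that is not shown.

import numpy as np
import xarray as xr

ds = xr.open_dataset("./data/processed/training.nc")

# 1. Transpose so z becomes the fastest-varying (last) dimension.
step1 = ds.transpose("time", "y", "x", "z")
step1.to_netcdf("./data/processed/step1_transposed.nc")

# 2. Collapse (y, x) into a single batch dimension, keeping time and z whole,
#    which gives roughly (batch, time, z).
step1 = xr.open_dataset("./data/processed/step1_transposed.nc")
step2 = step1.stack(batch=("y", "x")).transpose("batch", "time", "z")
# netCDF cannot store a MultiIndex, so turn y/x back into plain coordinates.
step2 = step2.reset_index("batch")
step2.to_netcdf("./data/processed/step2_reshaped.nc")

# 3. Shuffle along the batch dimension and cache the result.
step2 = xr.open_dataset("./data/processed/step2_reshaped.nc")
perm = np.random.permutation(step2.sizes["batch"])
step2.isel(batch=perm).to_netcdf("./data/processed/step3_shuffled.nc")

Each step here still loads the full dataset into memory; if that is also too large, the same steps could be run chunk-by-chunk (e.g. with dask). The point is that the shuffle in step 3 only permutes whole (time, z) columns, so reads during training should stay contiguous.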