uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Apache License 2.0

Issue with loading nested array type from spark DF to torch #797

Open sardinois opened 1 year ago

sardinois commented 1 year ago

Hi, I'm trying to train an LSTM with PyTorch on a time-series dataset that I have in Spark. The Spark DataFrame is constructed so that every row contains one training sample and its label. The training data is in my features column, which holds a nested array of floats of shape (lookback_window, number_of_features); the label column is a simple scalar.

training_df.schema = 
StructType([
   StructField('features', ArrayType(ArrayType(FloatType(), True), True), False), 
   StructField('label', DoubleType(), True)
])
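
For context, here is a minimal sketch of how a DataFrame with this schema can be built. The toy values (lookback_window=3, number_of_features=2) and the existing SparkSession named `spark` are illustrative placeholders, not my actual pipeline:

from pyspark.sql.types import StructType, StructField, ArrayType, FloatType, DoubleType

# Same schema as above; 'features' is a nested (2-D) array of floats,
# 'label' is a scalar double.
schema = StructType([
    StructField('features', ArrayType(ArrayType(FloatType(), True), True), False),
    StructField('label', DoubleType(), True),
])

# Hypothetical toy rows: lookback_window=3, number_of_features=2.
rows = [
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], 1.0),
    ([[0.2, 0.3], [0.4, 0.5], [0.6, 0.7]], 0.0),
]

# Assumes an existing SparkSession named `spark`.
training_df = spark.createDataFrame(rows, schema=schema)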

When I iterate over the make_torch_dataloader output, each sample is a dictionary that contains only the label; the features are missing.
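
For reference, this is roughly how I create the dataloader with the Spark dataset converter; the cache directory URL and batch size here are placeholders:

from petastorm.spark import SparkDatasetConverter, make_spark_converter

# Placeholder cache directory; any accessible URL works.
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, 'file:///tmp/petastorm_cache')

converter = make_spark_converter(training_df)
with converter.make_torch_dataloader(batch_size=32) as dataloader:
    for batch in dataloader:
        print(batch.keys())  # only 'label' comes through; 'features' is missing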

Any idea what the issue is, or how I should structure my features data so that this works?