After writing the .beton file, I first tried creating a loader using the same pipeline object for every field:
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import NDArrayDecoder
from ffcv.transforms import ToTensor

float_pipeline = [NDArrayDecoder(), ToTensor()]
# Pipeline for each data field (the same list object is reused for all fields)
pipelines = {
    'data': float_pipeline,
    'target': float_pipeline,
    'vol': float_pipeline,
    'temp': float_pipeline
}
loader = Loader(ffcv_file, batch_size=64, num_workers=8,
                order=OrderOption.RANDOM, pipelines=pipelines)
data, target, vol, temp = next(iter(loader))
However, all of the variables come back with the shape of the smallest array, in this case temp (i.e. data has shape (Nbatch, 1), when it should be (Nbatch, 1, 31, 32)).
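A plausible cause (an assumption on my part, not confirmed from FFCV's source) is that the pipeline list holds stateful decoder instances, and when the same list object is passed for every field, each field ends up configuring the one shared decoder, so the last-configured field's shape wins. A toy sketch of that shared-state pitfall, using a hypothetical Decoder class rather than FFCV's real one:

```python
class Decoder:
    """Toy stand-in for a stateful decoder: it remembers the shape
    of the field it was configured for."""
    def __init__(self):
        self.shape = None

    def configure(self, shape):
        self.shape = shape

shared = [Decoder()]                      # one pipeline list...
pipelines = {'data': shared, 'temp': shared}  # ...shared by two fields

pipelines['data'][0].configure((1, 31, 32))
pipelines['temp'][0].configure((1,))      # reconfigures the SAME object

print(pipelines['data'][0].shape)         # (1,) -- 'data' now sees temp's shape
```

This would explain why every field inherits the shape of whichever field was set up last.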
When I create a separate pipeline for each differently sized variable, everything comes out correctly:
float_pipeline = [NDArrayDecoder(), ToTensor()]
vol_pipeline = [NDArrayDecoder(), ToTensor()]
T_pipeline = [NDArrayDecoder(), ToTensor()]
# Pipeline for each data field
pipelines = {
'data': float_pipeline,
'target': float_pipeline,
'vol': vol_pipeline,
'temp': T_pipeline
}
I don't think this is a bug, but as a new user it surprised me. Perhaps it's documented; if not, perhaps it should be.