The `__getitem__` of the datasets cannot handle a namedtuple when there are multiple parallel workers, and the parallel workers are needed to reach high data-loading speeds on powerful compute nodes.
This is a fundamental PyTorch issue: each worker process instantiates its own `Dataset` object, so the namedtuple is instantiated separately in each one, and the parallel workers can't collate the batches into a "custom" namedtuple.
Potential workarounds:

- Write a custom `collate_fn` for the `DataLoader` that converts the entire batch into a namedtuple (see the first sketch below).
- Have each batch return a dictionary, as in `{device_name: value_tensor}` (see the second sketch below).
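A minimal sketch of the first workaround, assuming a hypothetical `Sample` namedtuple with `signal`/`label` fields (the real field names depend on the dataset). The namedtuple is built only inside the `collate_fn`, so workers never need to share the class through `__getitem__`:

```python
import collections
import torch
from torch.utils.data import DataLoader, Dataset

# Hypothetical namedtuple and field names, for illustration only.
Sample = collections.namedtuple("Sample", ["signal", "label"])

class ToyDataset(Dataset):
    """__getitem__ returns a plain tuple; the namedtuple is built at collate time."""
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.randn(8), torch.tensor(idx % 2)

def namedtuple_collate(batch):
    """Stack each field across the batch, then wrap the stacks in Sample."""
    signals, labels = zip(*batch)
    return Sample(signal=torch.stack(signals), label=torch.stack(labels))

if __name__ == "__main__":  # guard needed on spawn-based platforms
    loader = DataLoader(ToyDataset(), batch_size=4, num_workers=2,
                        collate_fn=namedtuple_collate)
    batch = next(iter(loader))
    print(batch.signal.shape, batch.label.shape)  # torch.Size([4, 8]) torch.Size([4])
```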
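And a sketch of the dict-based workaround, with made-up device names standing in for the real ones. PyTorch's default `collate_fn` already handles dicts, stacking the values key by key, so no custom class has to cross worker boundaries:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DictDataset(Dataset):
    """Each item is a {device_name: value_tensor} dict; the default
    collate_fn stacks matching keys into batched tensors."""
    DEVICE_NAMES = ("sensor_a", "sensor_b")  # hypothetical device names

    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return {name: torch.randn(8) for name in self.DEVICE_NAMES}

if __name__ == "__main__":  # guard needed on spawn-based platforms
    loader = DataLoader(DictDataset(), batch_size=4, num_workers=2)
    batch = next(iter(loader))
    print(batch["sensor_a"].shape)  # torch.Size([4, 8]): one row per sample
```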