Currently, IterableDataset doesn't support setting num_worker in .map(), which results in slow processing here. Could we add support for it? As .map() can be run in the batch fashion (e.g., batch_size is default to 1000 in datasets), it seems to be doable for IterableDataset as the regular Dataset.
I was curious about the same - since map is applied on the fly I was assuming that setting num_workers>1 in DataLoader would effectively do the map in parallel, have you tried that?
Feature request
Currently, IterableDataset doesn't support setting num_worker in .map(), which results in slow processing here. Could we add support for it? As .map() can be run in the batch fashion (e.g., batch_size is default to 1000 in datasets), it seems to be doable for IterableDataset as the regular Dataset.
Motivation
Improving data processing efficiency
Your contribution
Testing