Feature request
Allow the Hugging Face Trainer to accept a multiprocessing context and forward it to the DataLoader it builds (the `multiprocessing_context` argument): https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
Motivation
When a dataset is loaded with multiple worker processes across CPU cores, the fork start method sometimes causes problems (with Polars, for example), and the spawn start method is a better fit.
Your contribution
I could open a PR. One possible fix is to add a parameter to Trainer and pass it through to the DataLoader.
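For reference, here is a minimal sketch of the DataLoader argument that the new Trainer parameter would be forwarded to. `multiprocessing_context` is an existing `torch.utils.data.DataLoader` argument; `RangeDataset` and `make_loader` are hypothetical names used only for illustration, and nothing here reflects how Trainer is actually wired internally.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RangeDataset(Dataset):
    """Toy dataset (hypothetical). Defined at module level so that
    spawn-started workers can pickle and import it."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(idx)


def make_loader(dataset, context="spawn"):
    # Hypothetical helper: builds a DataLoader whose workers are started
    # with the given start method ("fork", "spawn" or "forkserver",
    # depending on the platform).
    # multiprocessing_context is only accepted when num_workers > 0.
    return DataLoader(
        dataset,
        batch_size=4,
        num_workers=2,
        multiprocessing_context=context,
    )
```

With spawn, worker processes re-import the main module, so in a script the loop that iterates over the loader should sit under an `if __name__ == "__main__":` guard.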