kmchiti closed this issue 1 month ago.
To clarify the problem: the issue is that we currently don't support the case where the number of workers > 1.
(There are also some issues with datasets.Dataset in general.)
For right now this isn't supported/implemented yet.
We'll need a different, custom adjust_state_dict_for_prefetch function, which we do support overriding if you want to try playing with it in the interim.
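As a rough illustration of the kind of custom hook described above, here is a minimal sketch. This is not accelerate's actual implementation; the function name follows the one mentioned in this thread, the num_prefetched parameter is assumed, and the state-dict keys (_main_snapshot, _sampler_iter_yielded) come from the layout described later in this issue.

```python
# Hedged sketch of a custom adjust_state_dict_for_prefetch-style function.
# Assumption: with num_workers > 1 the sampler counter lives under
# "_main_snapshot"; in the single-process case it sits at the top level.
def adjust_state_dict_for_prefetch(dl_state_dict, num_prefetched):
    # Shallow-copy so the caller's snapshot is left untouched.
    adjusted = dict(dl_state_dict)
    if "_main_snapshot" in adjusted:
        # Multi-worker layout: rewind the nested counter.
        snapshot = dict(adjusted["_main_snapshot"])
        snapshot["_sampler_iter_yielded"] = max(
            0, snapshot.get("_sampler_iter_yielded", 0) - num_prefetched
        )
        adjusted["_main_snapshot"] = snapshot
    else:
        # Single-process layout: rewind the top-level counter.
        adjusted["_sampler_iter_yielded"] = max(
            0, adjusted.get("_sampler_iter_yielded", 0) - num_prefetched
        )
    return adjusted
```

The point of the branch is exactly the bug discussed below: the same logical counter appears at two different depths depending on the worker configuration, so any adjustment code has to handle both layouts.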
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
Code Example:
Error:
When running the script on multiple GPUs, I encountered the following error:
Expected behavior
The script should run without errors on multiple GPUs, matching the behavior observed on a single GPU without multiprocessing. The issue seems related to accessing _sampler_iter_yielded from dl_state_dict. Since dl_state_dict stores _sampler_iter_yielded under _main_snapshot rather than as a top-level entry, the code should navigate the nested dictionary structure to access this value.