Closed by qubvel 2 months ago
cc @SunMarc @muellerzr
(The right muellerzr, which is how I missed this)
@muellerzr After some investigation, I found that the leak happens if dataloader_persistent_workers=True; with dataloader_persistent_workers=False there is actually no leak (that parameter was missing from the reproducing script; I have added it now). It's probably not even related to compute_metrics.
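For anyone comparing the two settings, here is a minimal sketch of the arguments involved. Only dataloader_persistent_workers comes from this thread; the output path, step count, and worker count are illustrative assumptions, and make_training_kwargs is a hypothetical helper, not part of the original script. PyTorch only accepts persistent workers when the number of workers is greater than 0, so the worker count is set explicitly.

```python
def make_training_kwargs(persistent_workers: bool) -> dict:
    """Build keyword arguments intended for transformers.TrainingArguments.

    All values except dataloader_persistent_workers are illustrative guesses.
    """
    return {
        "output_dir": "tmp_leak_repro",  # hypothetical path
        "max_steps": 1,  # one training step, then evaluation
        "dataloader_num_workers": 2,  # must be > 0 for persistent workers
        "dataloader_persistent_workers": persistent_workers,
    }


# One configuration per case discussed above.
leaky_kwargs = make_training_kwargs(True)   # leak reported with this setting
clean_kwargs = make_training_kwargs(False)  # no leak reported with this setting

# Intended usage (requires transformers to be installed):
#   from transformers import TrainingArguments
#   args = TrainingArguments(**leaky_kwargs)
```

Running the same script once with each dictionary should make it easier to confirm whether the flag alone accounts for the difference.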
@qubvel are you setting pin_memory=True? That's usually required, and should've thrown a warning.
No, I didn't set pin_memory=True, and I didn't notice any warning, probably because the script is too verbose on start.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.41.0.dev0
Who can help?
@pacman100 @muellerzr
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
RAM usage increases after each evaluation stage. The training phase runs fine, but each evaluation stage with a compute_metrics function increases RAM usage. If compute_metrics is not provided, there is no leak. I used a very simple compute_metrics function that returns a constant.
Here is the simplified script I am running to reproduce the memory leak. It runs just 1 training step and then goes to the validation stage.
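The original snippets are not preserved in this copy of the thread. As a rough illustration of the kind of constant-returning compute_metrics described above, a sketch might look like the following; the metric name and value are assumptions, not the author's actual code.

```python
def compute_metrics(eval_prediction):
    """Return a fixed metric, ignoring the predictions and labels entirely.

    The argument would normally carry the model outputs and labels
    collected during evaluation; it is deliberately unused here.
    """
    return {"fake_metric": 1.0}  # hypothetical metric name and value
```

Because the function does no work of its own, any memory growth seen while it is in place would have to come from how the evaluation outputs are gathered and passed around, not from the metric computation.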
Expected behavior
Any ideas on how to identify what caused the memory leak and how that could be fixed?
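One possible starting point, sketched with the standard library's tracemalloc module: take a snapshot before and after an evaluation pass and list the source lines whose allocations grew the most. The list comprehension below is only a stand-in for an evaluation call, and top_growth is a hypothetical helper. Note that tracemalloc only sees Python-level allocations in the current process, so it would not show memory held by dataloader worker processes or by native tensor storage.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines with the largest change in allocated memory."""
    # compare_to sorts by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for an evaluation pass that keeps references alive.
retained = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
tracemalloc.stop()

for stat in growth:
    print(stat)
```

If the growth does not show up here, comparing the process's resident memory across evaluation stages with and without persistent workers may be more informative.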