Closed: abedini-arteriaai closed this issue 10 months ago
Hello!
Hmm, I haven't seen this error before. It seems to occur when checkpointing during training. By default, this happens every 500 steps (see docs here).
Other users have experienced similar issues: https://stackoverflow.com/questions/64206070/pytorch-runtimeerror-enforce-fail-at-inline-container-cc209-file-not-fou and they attributed it to either running out of disk space or a corrupted cached model. Deleting the cached model under
~/.cache/torch/sentence_transformers/...
so that it is re-downloaded may help. Alternatively, you can set save_strategy="no" to prevent the model from doing any checkpointing during training. That should help, although you likely still want to save the final model, and doing so might still crash if something is corrupted.
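For reference, here is a minimal sketch of disabling checkpointing this way. It assumes a SetFit v1.x-style API; the base model, dataset, and output path are placeholders rather than anything taken from this issue.

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Tiny illustrative dataset; "text"/"label" are SetFit's default column names.
train_dataset = Dataset.from_dict({
    "text": ["great product", "terrible service", "works as expected", "broke after a day"],
    "label": [1, 0, 1, 0],
})

# Placeholder base model, not necessarily the one used in this issue.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(
    num_epochs=1,
    save_strategy="no",  # skip the periodic checkpoints (written every 500 steps by default)
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()

# You still likely want to persist the final model yourself; note this write can
# also fail if the disk is full or the cache is corrupted.
model.save_pretrained("final-setfit-model")
```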
Thank you for the tip, changing save_strategy solved the issue - it seems to be space related.
Hi Team,
I've been using SetFit with a variety of datasets and base models, and this has been a persistent issue. It happens during trainer.train().
The error is:
RuntimeError: [enforce fail at inline_container.cc:471] . PytorchStreamWriter failed writing file data/158: file write failed
I previously thought it could be a memory issue and have done the following to mitigate it (see the sketch after this list):
1) training with only 1 epoch, batch size 1, max_steps 1, eval_max_steps 1
2) reducing the dataset to only 100 samples
3) checking df -Th, which shows plenty of free space
Is this a familiar error?
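For concreteness, here is a hedged sketch of how those mitigation settings could be expressed. The parameter names follow SetFit's TrainingArguments and mirror the list above; the rest of the training setup is omitted.

```python
from setfit import TrainingArguments

# Reconstruction of the settings listed above; exact field names may differ
# slightly between SetFit versions.
args = TrainingArguments(
    num_epochs=1,
    batch_size=1,
    max_steps=1,
    eval_max_steps=1,
)

# Disk usage was checked separately in the shell with: df -Th
```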