Closed (salieri closed 1 year ago)
@salieri could you provide a little more information for me, please?
I need to know which train_text_to_image.py you are running :) (e.g. is it the one from diffusers?) Thanks!
A different example that should also exit non-zero:
Singularity> accelerate launch /not_a_file.py; echo "Exit code is: $?"
/anaconda/bin/python3.9: can't open file '/not_a_file.py': [Errno 2] No such file or directory
/anaconda/bin/python3.9: can't open file '/not_a_file.py': [Errno 2] No such file or directory
/anaconda/bin/python3.9: can't open file '/not_a_file.py': [Errno 2] No such file or directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 223625) of binary: /anaconda/bin/python3.9
Exit code is: 0
cat /anaconda/lib/python3.9/site-packages/accelerate/__init__.py | grep version
__version__ = "0.14.0"
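For comparison, here is a minimal sketch of the behavior I'd expect from a launcher: run the child process and hand its exit code back to the caller unchanged. This is not how accelerate is implemented; `run_and_propagate` is a hypothetical helper name, and the child command is a stand-in that exits with code 2, like the failed rank in the log above.

```python
import subprocess
import sys


def run_and_propagate(cmd):
    """Run cmd as a child process and return its exit code unchanged.

    Hypothetical helper for illustration only, not part of accelerate.
    """
    completed = subprocess.run(cmd)
    return completed.returncode


# Stand-in for a failing training script: a child that exits with code 2.
code = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(2)"])
print(f"Exit code is: {code}")
```

A real wrapper would finish with `sys.exit(code)` so that the shell's `$?` reflects the child's failure.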
@muellerzr Sorry, I missed your question! I noticed it on A100s running the runpod/pytorch Docker image, but I believe I've seen it on my RTX 4090/Windows setup too.
System Info
Information
Tasks
A no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
Intentionally configure a batch size that is too big for your GPU. E.g.
Expected behavior