Open hitesh-ag1 opened 11 months ago
cc @sanchit-gandhi. Any help would be greatly appreciated!
Happening to me as well.
Hey @hitesh-ag1, sorry for the late reply here. Could you confirm that you're using exactly the same code as with single-GPU fine-tuning? Could you also provide the full stack trace for the error that you're getting? For interest, there's a multi-GPU example for Whisper fine-tuning that you can check out in the Transformers library.
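As a rough sketch (my assumption of the usual launch pattern, not the exact example script from the Transformers repo), the single-GPU script can typically be reused unchanged for multi-GPU DDP by starting it with `torchrun`; the `Trainer` picks up the distributed environment on its own:

```python
# Hedged sketch: launch the unchanged single-GPU script with torchrun, e.g.
#
#   torchrun --nproc_per_node=4 your_finetuning_script.py
#
# ("your_finetuning_script.py" is a placeholder name). Inside the script,
# Trainer reads the distributed environment that torchrun sets up:
import os
import torch
from transformers import TrainingArguments

args = TrainingArguments(output_dir="./out")
print("local rank:", os.environ.get("LOCAL_RANK", "not set (single process)"))
print("visible GPUs:", torch.cuda.device_count())
print("trainer world size:", args.world_size)
```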
Hi, I am trying to fine-tune Whisper according to the blog post here. The fine-tuning works great in a single-GPU scenario, but it fails on multi-GPU instances. While executing `trainer.train()`, the multi-GPU instances crash with `Bus error (core dumped)`.
For multi-GPU, I am working on a g5.12xlarge instance on AWS (Ubuntu, AMI ID: ami-071323fe2bf59945b). I would appreciate any guidance or suggestions to resolve this issue.
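For reference, my setup looks roughly like the sketch below (a simplified stand-in, not my exact code: the dummy dataset replaces the prepared Common Voice splits from the blog post, and the model size and hyperparameters are placeholders):

```python
# Minimal reproduction sketch, assuming the Seq2SeqTrainer setup from the
# blog post. The dummy dataset stands in for the prepared audio dataset so
# the snippet runs end to end.
import torch
from torch.utils.data import Dataset
from transformers import (
    WhisperForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

class DummySpeechDataset(Dataset):
    """Stand-in for the prepared dataset: log-Mel features plus token labels."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {
            "input_features": torch.randn(80, 3000),  # Whisper's 30 s log-Mel input
            "labels": torch.tensor([50258, 50259, 50359, 50257]),  # a few token ids
        }

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-repro",
    per_device_train_batch_size=2,
    max_steps=2,
    fp16=torch.cuda.is_available(),
    report_to="none",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=DummySpeechDataset(),
)

# Works on a single GPU; on the multi-GPU g5.12xlarge instance this call
# dies with "Bus error (core dumped)".
trainer.train()
```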