Open harik68 opened 1 year ago
Same problem here, did you solve it at last?
cc @lewtun @philschmid
Same problem here, did you solve it in the end? deepspeed 0.10.0, error message: AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 64 != 8 * 1 * 1
xbinglzh
Solved it by using transformers==4.29.2. But I don't think this workaround applies to every situation where this problem occurs.
@pcuenca So, is this a transformers issue? Downgrading to transformers==4.29.2 did not help in my case.
What was the problematic version of transformers?
I am facing issues when using DeepSpeed to fine-tune the StarCoder model. I am following exactly the steps described in the article Creating a Coding Assistant with StarCoder (section "Fine-tuning StarCoder with DeepSpeed ZeRO-3"). However, I am getting the error "AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 4 * 8 * 1". I did some research on Google and found this link explaining the cause: [BUG] batch_size check failed with zero 2 (deepspeed v0.9.0) · Issue #3228 · microsoft/DeepSpeed · GitHub. However, even when I use the DeepSpeed version mentioned in that issue as working (v0.9.0), I get the same error. I have tried different versions of deepspeed and accelerate but couldn't fix the issue. Does anyone have any suggestions? Thanks in advance.