huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.18k stars, 354 forks

Question about SFT with DeepSpeed #170

Open XXares opened 1 month ago

XXares commented 1 month ago

Hello, I'm running into a problem when fine-tuning a Mistral model with SFT using the DeepSpeed ZeRO-3 config. Here is the error message:

Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: The size of tensor a (0) must match the size of tensor b (14336) at non-singleton dimension 1

The error occurs during the loss computation, in the backward pass. Do you have any suggestions for fixing it?
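For context on what the message itself means: PyTorch raises it when two operands of an elementwise op cannot be broadcast together. Two dimensions are compatible only if they are equal or one of them is 1, so a dimension of size 0 can never be stretched to 14336. The sketch below reimplements that rule in plain Python purely for illustration; `broadcast_shape` is a hypothetical helper, not a PyTorch API. Under ZeRO-3, parameters are partitioned across ranks, so one hypothesis worth checking (not confirmed in this thread) is that a parameter is being used outside its gathered context and shows up locally as an empty tensor.

```python
# Illustrative reimplementation of the broadcasting check behind PyTorch's
# "The size of tensor a (...) must match the size of tensor b (...)" error.
# broadcast_shape is a hypothetical helper, not part of PyTorch.

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two tensor shapes, or raise RuntimeError."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim, (da, db) in enumerate(zip(a, b)):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            # Only size-1 dimensions stretch; a size-0 dimension does not.
            raise RuntimeError(
                f"The size of tensor a ({da}) must match the size of "
                f"tensor b ({db}) at non-singleton dimension {dim}"
            )
    return tuple(result)


if __name__ == "__main__":
    print(broadcast_shape((4, 14336), (1, 14336)))  # (4, 14336)
    try:
        # An empty (size-0) operand against a full-width one fails on dim 1.
        broadcast_shape((4, 0), (4, 14336))
    except RuntimeError as err:
        print(err)
```

The second call reproduces the exact wording from the traceback, which suggests looking for the operand whose second dimension is unexpectedly 0.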

mianzhang commented 3 weeks ago

Same question here.