Closed KepingYan closed 11 months ago
@delock, can you please help?
The direct cause is that kv_n_heads and d_model need to be added to the list of attributes sharded for tensor parallelism at https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/auto_tp.py#L387 . However, the result is still incorrect after that fix, so there are other issues with MPT, probably caused by a change in the remote modeling code. This needs further investigation.
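To illustrate why those two attributes matter, here is a minimal sketch of dividing per-module size attributes by the tensor-parallel degree. It is not the DeepSpeed code: `SHARDED_ATTRS`, `shard_attrs`, and the example sizes are hypothetical names and values chosen for illustration, and only the attribute names kv_n_heads and d_model come from this thread.

```python
from types import SimpleNamespace

# Hypothetical list of attributes whose values must shrink with the shard.
# kv_n_heads and d_model are the two names mentioned in this thread;
# n_heads is assumed here to already be handled.
SHARDED_ATTRS = ["n_heads", "kv_n_heads", "d_model"]


def shard_attrs(module, tp_size, names=SHARDED_ATTRS):
    """Divide each listed attribute on `module` by the tensor-parallel size."""
    for name in names:
        if not hasattr(module, name):
            continue  # attribute not present on this module, nothing to do
        value = getattr(module, name)
        if value % tp_size != 0:
            raise ValueError(f"{name}={value} is not divisible by tp_size={tp_size}")
        setattr(module, name, value // tp_size)
    return module


# Stand-in for an attention module (sizes are illustrative only).
attn = SimpleNamespace(n_heads=32, kv_n_heads=8, d_model=4096)
shard_attrs(attn, tp_size=2)

# If only n_heads were in the list, the module would keep the full
# kv_n_heads and d_model while holding half-sized weights.
partial = SimpleNamespace(n_heads=32, kv_n_heads=8, d_model=4096)
shard_attrs(partial, tp_size=2, names=["n_heads"])
```

In the `partial` case the head count is halved but kv_n_heads and d_model are not, which is the kind of mismatch between sharded weights and unsharded size attributes described above.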
Thanks @sywangyi !
@KepingYan can you verify whether https://github.com/microsoft/DeepSpeed/pull/4787 fixes the issue? Thanks!
Describe the bug When I ran the MPT model on CPU, I encountered the following error.
To Reproduce run-mpt-ds.py
run-mpt.sh
package version
ds_report output
System info (please complete the following information):