Describe the bug
What the bug is and how to reproduce it, preferably with screenshots.
I am resuming training from a checkpoint for the model /internlm-xcomposer2-7b-chat, but the optimizer step fails with a "Size mismatch" error:
[rank0]: File "/home/yerong2/local/miniconda3/envs/qw/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py", line 191, in step
[rank0]: multi_tensor_applier(self.multi_tensor_adam, self._dummy_overflow_buf, [g_32, p_32, m_32, v_32],
[rank0]: File "/home/yerong2/local/miniconda3/envs/qw/lib/python3.11/site-packages/deepspeed/ops/adam/multi_tensor_apply.py", line 17, in __call__
[rank0]: return op(self.chunk_size, noop_flag_buffer, tensor_lists, *args)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Size mismatch
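The traceback shows the failure inside the fused Adam step, where the gradient, parameter, and moment lists (`g_32, p_32, m_32, v_32`) are handed to the kernel together. One way to narrow this down is to check whether the optimizer state restored from the checkpoint has the same shapes as the current parameters. Below is a minimal sketch of such a check. The helper name `find_state_mismatches` is hypothetical. It assumes a PyTorch-style optimizer exposing `param_groups` and `state`, with Adam moments stored under `exp_avg` and `exp_avg_sq`. How to reach that inner optimizer from a DeepSpeed engine depends on the setup and is not shown.

```python
def find_state_mismatches(optimizer, state_keys=("exp_avg", "exp_avg_sq")):
    """List optimizer-state buffers whose shape differs from their parameter.

    Assumption: `optimizer` follows the torch.optim.Optimizer layout
    (`param_groups` list and per-parameter `state` mapping). The key names
    in `state_keys` are the usual Adam moment names and may differ elsewhere.
    """
    mismatches = []
    for group_idx, group in enumerate(optimizer.param_groups):
        for param_idx, param in enumerate(group["params"]):
            state = optimizer.state.get(param, {})
            for key in state_keys:
                buf = state.get(key)
                if buf is None:
                    continue  # no state stored for this parameter yet
                param_shape = tuple(param.shape)
                buf_shape = tuple(buf.shape)
                if buf_shape != param_shape:
                    mismatches.append(
                        (group_idx, param_idx, key, param_shape, buf_shape)
                    )
    return mismatches
```

Calling `find_state_mismatches(optimizer)` right after loading the checkpoint and before the first `step()` would return an empty list if every restored moment buffer matches its parameter. Any entries point to parameters whose shape changed between saving and resuming, for example when a different set of trainable parameters is used.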
Your hardware and system info
Write your system info here, such as CUDA version, operating system, GPU model, and torch version.
Additional context
Add any other context about the problem here.