l294265421 / alpaca-rlhf

Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
https://88aeeb3aef5040507e.gradio.live/
MIT License

Increasing max_prompt_len and max_ans_len causes an illegal memory access error during training #16

Open · Luoxiaohei41 opened this issue 7 months ago

Luoxiaohei41 commented 7 months ago

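When running step 3, training with a length of 512 works fine, but as soon as I increase the length, this error (an illegal memory access) appears at a random iteration. Has anyone run into this problem and solved it?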