Hello friends, has anyone successfully reproduced this paper? I ran into some difficulties while reproducing it. I directly used the model parameters provided by the author, but when strict is set to True in `m, u = model.load_state_dict(sd, strict=True)`, the model cannot be loaded and I cannot run inference. I also trained the model myself and found that the saved checkpoint is 8.2 GB. Does anyone have the same problem? I hope someone can help. Thank you!
hello friends Has anyone successfully reproduced this paper? I encountered some difficulties in the process of reproducing the paper, and I directly used the model parameters provided by the author. When strict is set to True in m, u = model.load_state_dict(sd, strict=True), the model cannot be loaded and cannot run through the reasoning process. I also trained it myself and found that the saved model reached 8.2G. Does anyone have the same problem, hope to get your help, thank you
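For anyone debugging the same `strict=True` failure, a common cause (an assumption here, not confirmed for this repository) is that the released checkpoint nests the weights under a key such as `"state_dict"` or `"model"`, or that the key names carry a `module.` prefix from a DataParallel/DistributedDataParallel wrapper. The sketch below works on plain mappings, so it can be applied to `model.state_dict()` and to the loaded `sd`. It lists missing keys, unexpected keys, and shape mismatches. The helper names and the wrapper keys it tries are hypothetical choices for illustration.

```python
from collections.abc import Mapping


def unwrap_checkpoint(ckpt, wrapper_keys=("state_dict", "model")):
    """Return the inner weight mapping if the checkpoint nests it under one of
    the assumed wrapper keys; otherwise return the checkpoint unchanged."""
    for key in wrapper_keys:
        inner = ckpt.get(key)
        if isinstance(inner, Mapping):
            return inner
    return ckpt


def strip_prefix(sd, prefix="module."):
    """Remove a key prefix such as 'module.' (added by DataParallel wrappers)."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in sd.items()}


def compare_keys(model_sd, ckpt_sd):
    """Return (missing, unexpected, mismatched):
    missing    -- keys the model expects but the checkpoint lacks
    unexpected -- keys in the checkpoint that the model does not have
    mismatched -- (key, model_shape, ckpt_shape) for shared keys whose shapes differ
    """
    model_keys, ckpt_keys = set(model_sd), set(ckpt_sd)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    mismatched = []
    for key in sorted(model_keys & ckpt_keys):
        m_shape = getattr(model_sd[key], "shape", None)
        c_shape = getattr(ckpt_sd[key], "shape", None)
        # Only compare when both values expose a shape (e.g. tensors).
        if m_shape is not None and c_shape is not None and tuple(m_shape) != tuple(c_shape):
            mismatched.append((key, tuple(m_shape), tuple(c_shape)))
    return missing, unexpected, mismatched
```

With PyTorch, one would pass in the dict returned by `torch.load(path, map_location="cpu")` (after `unwrap_checkpoint` and `strip_prefix`) together with `model.state_dict()`, then inspect the three lists before retrying `load_state_dict`. On the 8.2 GB size, it may be worth checking the top-level keys of your own checkpoint: if it also stores optimizer state or other training extras, saving only `model.state_dict()` would be smaller. This is a guess, not a verified explanation.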