waterhorse1 / ChessGPT

(NeurIPS 2023) ChessGPT - Bridging Policy Learning and Language Modeling
https://arxiv.org/abs/2306.09200
Apache License 2.0

[Training detail] About training order #3

Open Inch-Z opened 12 months ago

Inch-Z commented 12 months ago

I can't reproduce the experimental results for the Legal Move task. I suspect the cause is the order of training, so I would like to ask about the training details of the ChessGPT model, specifically the order in which the tasks are trained.

ziyan-wang98 commented 12 months ago

Hi! I'd like to better understand the issue you're seeing. First, when you mention the “legal move” task, are you referring to the results listed under Bigbench State Tracking in Chess in Table 1 of our paper? Could you specify exactly which task results do not match your reproduction, and by what margin they differ? Second, it would be very helpful to know how you ran the reproduction. Did you evaluate the ChessGPT-v1 model provided on HuggingFace, or did you train a new model on the dataset following the procedures described in our paper? This information will help us assist you further.

Inch-Z commented 12 months ago

I am training a new model, OpenLLaMA 3B, on the data you released on Hugging Face. I'd like to know the sequence in which you use the data, that is, the order of the training tasks. Thank you very much for your response.

waterhorse1 commented 12 months ago

Hi @Inch-Z,

Our training order is:

(1) We first continue pretraining on https://huggingface.co/datasets/Waterhorse/chess_data/tree/main/chessgpt_data to get our chessgpt-base. Note that it's normal if you cannot fully reproduce our results, because the data we share is not exactly the same as our pretraining data. Due to legal issues, we do not share some of it, including blogs, books, and some of our annotated books.

(2) We then run SFT on https://huggingface.co/datasets/Waterhorse/chess_data/tree/main/chessgpt_sft_data. We release all of our SFT data.
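For readers scripting a reproduction, the two-stage order can be written down as a small plan in which the SFT stage starts from the checkpoint produced by continued pretraining. This is only a sketch of the ordering: the `Stage` class, the `BASE_CHECKPOINT` placeholder, the `chessgpt-sft` output name, and `ordered_stage_names` are hypothetical names introduced here, and no actual training call is shown.

```python
from dataclasses import dataclass

# Placeholder for whichever base LLM checkpoint you start from (assumption:
# the thread does not pin this down for a reproduction).
BASE_CHECKPOINT = "<base-llm-checkpoint>"


@dataclass
class Stage:
    name: str        # label for the training stage
    data_url: str    # dataset location quoted in the thread
    init_from: str   # checkpoint this stage starts from
    output: str      # checkpoint this stage produces


STAGES = [
    Stage(
        name="continued-pretraining",
        data_url="https://huggingface.co/datasets/Waterhorse/chess_data/tree/main/chessgpt_data",
        init_from=BASE_CHECKPOINT,
        output="chessgpt-base",
    ),
    Stage(
        name="sft",
        data_url="https://huggingface.co/datasets/Waterhorse/chess_data/tree/main/chessgpt_sft_data",
        init_from="chessgpt-base",   # SFT starts from the stage-1 result
        output="chessgpt-sft",       # hypothetical output name
    ),
]


def ordered_stage_names(stages):
    """Check that every stage starts from an available checkpoint, then return the order."""
    available = {BASE_CHECKPOINT}
    for stage in stages:
        if stage.init_from not in available:
            raise ValueError(f"{stage.name} needs {stage.init_from}, which is not built yet")
        available.add(stage.output)
    return [stage.name for stage in stages]


print(ordered_stage_names(STAGES))
```

Swapping the two stages makes `ordered_stage_names` raise, which is a cheap guard against running SFT on the untouched base model by mistake.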

If you are mainly interested in the legal move prediction task, I recommend looking in particular at https://huggingface.co/datasets/Waterhorse/chess_data/tree/main/chessgpt_data/chess_modeling, where we create several chess modeling tasks (described in detail in the appendix of our paper), including the legal move prediction task.
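To pull only that subfolder rather than the whole dataset, one option is `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. The sketch below assumes `huggingface_hub` is installed and that patterns are matched fnmatch-style. The `fetch_chess_modeling` and `is_selected` helpers and the `local_dir` default are names introduced here for illustration, and the file names in the usage lines are made up.

```python
from fnmatch import fnmatch

# Repo id and subfolder are taken from the dataset URL quoted above.
REPO_ID = "Waterhorse/chess_data"
ALLOW_PATTERNS = ["chessgpt_data/chess_modeling/*"]


def is_selected(path):
    """Local preview of which repo paths the patterns would keep."""
    return any(fnmatch(path, pattern) for pattern in ALLOW_PATTERNS)


def fetch_chess_modeling(local_dir="chess_data"):
    """Download only the chess_modeling subfolder; returns the local folder path."""
    # Imported lazily so this file loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",          # the repo is a dataset, not a model
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the filter.
    print(is_selected("chessgpt_data/chess_modeling/example.jsonl"))
    print(is_selected("chessgpt_sft_data/example.jsonl"))
```

Calling `fetch_chess_modeling()` needs network access. The `is_selected` check runs offline and is handy for confirming the pattern before starting a large download.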

Best, Xidong

Inch-Z commented 10 months ago

Hello, we have a new question: how is the Elo score in the paper calculated? We are also curious whether the model can play against a human and output the next move. If so, we would like to know the prompt format.

Thanks

waterhorse1 commented 10 months ago

Hi, we actually do not calculate the Elo rating directly, because we found there might be some issues with the policy training (see the description in Section 5.3 of our paper). We are currently refining the game data and retraining on the new dataset with a bigger and more advanced model. We will also include an Elo rating calculation this time. Please stay tuned; we expect to have some results by the end of this month.
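Until the authors publish their procedure, anyone who wants a rough number can fall back on the standard Elo formulas. The sketch below is not the authors' method. It shows the textbook expected-score formula, the incremental update with a K-factor, and a performance rating found by bisection from games against opponents of known strength. The function names, the K value of 32, and the search bounds are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating, opponent_rating, score, k=32.0):
    """One Elo update; score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))


def performance_rating(results, lo=0.0, hi=4000.0, tol=1e-3):
    """Rating at which the expected total score equals the actual total score.

    results is a list of (opponent_rating, score) pairs. With all wins or all
    losses no finite solution exists, so the search ends at a bound.
    """
    actual = sum(score for _, score in results)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in results)
        if expected < actual:
            lo = mid   # candidate rating is too low to explain the results
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical results against opponents of known rating.
    games = [(1500, 1.0), (1500, 0.0), (1700, 0.5)]
    print(round(performance_rating(games)))
```

For a language-model player, the opponents would typically be engines or bots with calibrated ratings. How illegal moves are scored (for example, as a loss) is a separate design choice that changes the result.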