Closed: JieDengsc closed this 11 months ago
We only read the JSON and save the given samples back to JSON; no new sentences are generated in the process. So I think you should check the saving command.
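A note on what "check the saving command" might look like in practice: the repository's actual saving code is not shown in this thread, so the following is only a minimal sketch, assuming the samples are written with Python's standard `json.dump`. By default `json.dump` uses `ensure_ascii=True`, which writes every non-ASCII character as a `\uXXXX` escape; passing `ensure_ascii=False` and opening the file as UTF-8 keeps the characters readable. The sample record, the `save_json` helper, and the file names below are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample in an Alpaca-like layout (not taken from the repo).
samples = [{"instruction": "你好", "input": "", "output": "世界"}]


def save_json(data, path, ensure_ascii):
    """Write data as JSON; ensure_ascii controls \\uXXXX escaping."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=ensure_ascii, indent=4)


def read_text(path):
    """Return the raw file contents as text."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
escaped_path = os.path.join(tmp_dir, "escaped.json")    # hypothetical name
readable_path = os.path.join(tmp_dir, "readable.json")  # hypothetical name

# Default behaviour: non-ASCII characters are escaped.
save_json(samples, escaped_path, ensure_ascii=True)
# Readable output: characters are written as-is in UTF-8.
save_json(samples, readable_path, ensure_ascii=False)

escaped_text = read_text(escaped_path)
readable_text = read_text(readable_path)

# Both files decode back to the same Python objects.
with open(escaped_path, "r", encoding="utf-8") as f:
    escaped_loaded = json.load(f)
with open(readable_path, "r", encoding="utf-8") as f:
    readable_loaded = json.load(f)
```

The escaped form is still valid JSON, and `json.load` decodes it back to the original strings, so a downstream step that parses the file with a JSON library should see the same text either way. If a later step still fails, the cause may lie elsewhere, for example in code that reads the file as raw text.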
Thanks for your reply. It turned out that it really was a saving problem.
Also, could I use your sh file configuration for “Train Pre-Experienced Model” as a reference?
Sure. I think training for 1 epoch on 1000 samples should be a good starting configuration~
I'm running the code on Chinese SFT data. After "pre_experience_selection.sh" is executed, the "alpaca_data_pre.json" file is produced, but all Chinese characters in it have been converted to \uxxxx escape sequences. As a result, the “Train Pre-Experienced Model” step cannot be run.
Could you check whether “data_by_cluster” and “data_analysis” support Chinese?
Thank you.