imoneoi / openchat

OpenChat: Advancing Open-source Language Models with Imperfect Data
https://openchat.team
Apache License 2.0
5.26k stars, 399 forks

Why does the trained model not produce the answer provided in the training data? #188

Open houghtonweihu opened 8 months ago

houghtonweihu commented 8 months ago

I followed the C-RLFT format provided in the README and created this data:

{"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"PingHu is near Shanghai.","weight":1.0}],"condition":"GPT4","system":""}
{"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}
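For anyone reproducing this, here is a minimal Python sketch of how the two records could be serialized one per line and sanity-checked before training. The field names are copied from the data above. The helper names `to_jsonl` and `check_record` are hypothetical and are not part of OpenChat, and the checks only reflect what is visible in this post, not the project's actual validation rules.

```python
import json

# The two conversations from this post, one dict per training example.
records = [
    {
        "items": [
            {"role": "user", "content": "Where is PingHu?", "weight": 0.0},
            {"role": "assistant", "content": "PingHu is near Shanghai.", "weight": 1.0},
        ],
        "condition": "GPT4",
        "system": "",
    },
    {
        "items": [
            {"role": "user", "content": "Where is PingHu?", "weight": 0.0},
            {"role": "assistant", "content": "I don't know.", "weight": 0.1},
        ],
        "condition": "GPT3",
        "system": "",
    },
]


def to_jsonl(recs):
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in recs) + "\n"


def check_record(rec):
    """Hypothetical sanity checks (assumed, not OpenChat's own validator)."""
    problems = []
    for key in ("items", "condition", "system"):
        if key not in rec:
            problems.append(f"missing key: {key}")
    for i, item in enumerate(rec.get("items", [])):
        if item.get("role") not in ("user", "assistant"):
            problems.append(f"item {i}: unexpected role {item.get('role')!r}")
        if not isinstance(item.get("weight"), (int, float)):
            problems.append(f"item {i}: weight is not a number")
    return problems


jsonl_text = to_jsonl(records)

# Re-parse every line to confirm each one is a standalone JSON object.
for line in jsonl_text.splitlines():
    rec = json.loads(line)
    print(rec["condition"], check_record(rec))
```

An empty list of problems for each line only means the file is well-formed under these assumed checks. It does not explain why the trained checkpoints behave like the base model.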

I trained the model for 20 epochs and saved a checkpoint every 2 epochs, but every checkpoint I tested produced the same answer as the original model (imone/Mistral_7B_with_EOT_token). None of the trained models gave the expected answer: PingHu is near Shanghai.