Why the trained model does not produce the answer provided in the training data?

I followed the format C-RLFT provided in readme, and created this data: {"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"PingHu is near Shanghai.","weight":1.0}],"condition":"GPT4","system":""} {"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}

I trained the model for 20 epochs and saved every 2, but all the models that I tested all produced the answer from the original model (imone/Mistral_7B_with_EOT_token). No trained model gave the expected answer: PingHu is near Shanghai.

imoneoi / openchat

Why the trained model does not produce the answer provided in the training data? #188