I followed the format C-RLFT provided in readme, and created this data:
{"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"PingHu is near Shanghai.","weight":1.0}],"condition":"GPT4","system":""}
{"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}
I trained the model for 20 epochs and saved every 2, but all the models that I tested all produced the answer from
the original model (imone/Mistral_7B_with_EOT_token). No trained model gave the expected answer: PingHu is near Shanghai.
I followed the format C-RLFT provided in readme, and created this data: {"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"PingHu is near Shanghai.","weight":1.0}],"condition":"GPT4","system":""} {"items":[{"role":"user","content":"Where is PingHu?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}
I trained the model for 20 epochs and saved every 2, but all the models that I tested all produced the answer from the original model (imone/Mistral_7B_with_EOT_token). No trained model gave the expected answer: PingHu is near Shanghai.