Hi, and thanks for sharing your great work!
I have a question.
As far as I understand, the key intuition behind C-RLFT from an implementation perspective is to apply return-weighted behavior cloning in a goal-conditioned RL setting.
Since what matters most is the action taken by the behavior policy (in the OpenChat paper, either GPT-3.5 or GPT-4), I think it would be preferable to prepend the "GPTx" condition only to the "Assistant:" prefix, not to the "User:" prefix.
My intuition is that even given the same user utterance, two different agents may behave differently and receive different rewards for their respective actions. So I think the condition should be attached only to the "Assistant:" prefix.
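To make the suggestion concrete, here is a minimal sketch of the two template variants I have in mind (the exact prefix strings and the `<|end_of_turn|>` separator are my approximation of the OpenChat format, not copied from your code):

```python
def build_current(user_msg: str, assistant_msg: str, source: str = "GPT4") -> str:
    # Condition prepended to both the user and the assistant prefix,
    # which is how I understand the current implementation.
    return (f"{source} User: {user_msg}<|end_of_turn|>"
            f"{source} Assistant: {assistant_msg}<|end_of_turn|>")

def build_proposed(user_msg: str, assistant_msg: str, source: str = "GPT4") -> str:
    # Condition attached only to the assistant prefix, since the user
    # turn is the same regardless of which behavior policy responds.
    return (f"User: {user_msg}<|end_of_turn|>"
            f"{source} Assistant: {assistant_msg}<|end_of_turn|>")

if __name__ == "__main__":
    print(build_current("What is C-RLFT?", "It is ..."))
    print(build_proposed("What is C-RLFT?", "It is ..."))
```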
Or did you try this but the results didn't come out well?
Thank you!