Open yeongsang2 opened 1 year ago
I guess evaluate_response_for_RL() could be used as a workaround. This function tries to use OpenAI's text-davinci-003 to return a reward number.
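For anyone trying this, here is a rough sketch of the general idea (ask a language model for a score, then parse a number out of its reply). It is not the actual body of evaluate_response_for_RL(). The names `score_response`, `parse_reward`, `PROMPT_TEMPLATE` and `fake_llm` are made up for illustration, and so are the 0 to 10 scale and the fallback value. The model call is passed in as a plain callable so the example runs without an API key.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt; the wording and the 0-10 scale are assumptions.
PROMPT_TEMPLATE = (
    "Rate the following answer to the question on a scale from 0 to 10.\n"
    "Reply with a single number only.\n\n"
    "Question: {question}\nAnswer: {answer}\nScore:"
)


def parse_reward(text: str) -> Optional[float]:
    """Return the first number found in the text, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def score_response(
    question: str,
    answer: str,
    complete: Callable[[str], str],
    default: float = 0.0,
    low: float = 0.0,
    high: float = 10.0,
) -> float:
    """Ask the model (via `complete`) for a score and turn it into a reward.

    `complete` is any function that maps a prompt string to the model's
    text reply, e.g. a thin wrapper around whichever LLM client you use.
    """
    prompt = PROMPT_TEMPLATE.format(question=question, answer=answer)
    reward = parse_reward(complete(prompt))
    if reward is None:
        # The model did not return a usable number; fall back to a default.
        return default
    # Clamp so a stray out-of-range reply cannot blow up the RL update.
    return max(low, min(high, reward))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return " 7.5 out of 10"


if __name__ == "__main__":
    print(score_response("What is 2 + 2?", "4", fake_llm))
```

The fallback and the clamping are there because a model used this way may return something other than a clean number, so the caller probably should not trust the raw reply.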
I think so too, thanks for the reply.
3.3 Training Loop