Closed FayeXXX closed 11 months ago
Hi there,
I am closing this issue now because it is a research discussion. Feel free to follow up if you'd like to discuss further.
Thank you for your reply. I am working on TST with Yelp and my own dataset; first, I tried to reproduce the results in your paper. During my experiments the reward increases, but the results are still bad. I notice that in your paper you shape the reward from [0, 1] to [-20, 80], yet in `module_helpers.py` (lines 34-37) the defaults are:

```
reward_shaping_old_min: float = 0
reward_shaping_old_max: float = 100
reward_shaping_new_min: float = -10
reward_shaping_new_max: float = 10
```

Could that inconsistent scale be hurting the final result?
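For context, linear reward shaping is just a min-max rescale. A minimal sketch (my own illustration; `shape_reward` and the exact formula are assumptions, not the repo's code) shows how mismatched `old_min`/`old_max` settings distort the shaped reward:

```python
def shape_reward(r: float,
                 old_min: float, old_max: float,
                 new_min: float, new_max: float) -> float:
    """Linearly rescale a reward from [old_min, old_max] to [new_min, new_max]."""
    frac = (r - old_min) / (old_max - old_min)  # position within the old range
    return new_min + frac * (new_max - new_min)

# Paper's setting: raw reward in [0, 1], shaped to [-20, 80]
print(shape_reward(0.5, 0, 1, -20, 80))    # 30.0

# Defaults above: old range assumed to be [0, 100], shaped to [-10, 10].
# A raw reward of 0.5 on a 0-1 scale then barely moves off the floor:
print(shape_reward(0.5, 0, 100, -10, 10))  # -9.9
```

If the raw reward really lives in [0, 1] but `reward_shaping_old_max` is 100, all shaped rewards cluster near `new_min`, which would wash out the learning signal.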
BTW, I'm wondering which hyperparameters I should tune to get better results. Could you please share a list of parameters worth tuning?
I am interested in your code, and thank you for your nice work. I have several questions after running it:
The loss stays around 7,000 to 12,000 and the curve doesn't converge. Checking the generated prompts, I find some tokens repeated several times. When I use these prompts as input at test time, the results are terrible.
I suspect something is going wrong during training, but I have no idea how to fix it. I have tried changing the hyperparameters, but that didn't help.
I don't know why the BERTScores are continuously set to 0. Since BERTScore is part of the reward, I wonder whether that is the reason I can't get the expected results.
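To illustrate the concern: if the total reward is (for example) a weighted sum of a style score and BERTScore, a BERTScore stuck at 0 removes the content-preservation signal entirely. A toy sketch (the weights and function name are my assumptions, not the repo's actual reward):

```python
def total_reward(style_score: float, bert_score: float,
                 w_style: float = 0.5, w_sem: float = 0.5) -> float:
    """Toy combined reward: weighted sum of style and semantic scores."""
    return w_style * style_score + w_sem * bert_score

# With a healthy BERTScore, preserving content is rewarded:
print(total_reward(0.8, 0.9))  # 0.85

# With BERTScore pinned at 0, the policy is optimized for style only,
# so generated prompts can degenerate (e.g. repeated tokens) without penalty:
print(total_reward(0.8, 0.0))  # 0.4
```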
Your reply will be greatly appreciated. Thank you.