beeevita closed this issue 1 year ago.
Hi, that's a great question. Integrating prior knowledge and reward-based training has long been an open problem in RL research.
Can you tell me more about your use case? For example, do you have some prompts that you wrote down yourself, that you think will be good guesses? Or have you already trained a prompt generation model that made some progress, and you want to train it more? With more info, I might be able to help you better.
On another note, the RL algorithm we used can also learn reward information from a pre-specified dataset, which may be relevant to what you are looking for. For more info, you can check out the original repo: https://github.com/HanGuo97/soft-Q-learning-for-text-generation.
I hope this helps. Let me know if you have other questions.
Thank you. The specific task is machine translation. I found that training fluctuates sharply and struggles to converge, so I think injecting prior knowledge may be useful.
Prompted generation tasks tend to produce high reward variance. I'll assume you're using GPT-2. Not sure if you have looked into our code for text style transfer, but here are some general strategies we found useful for stabilizing the reward signal:
Let me know if this makes sense. If you could provide more information, I may be able to help you better. You can also email me at mingkaid@cs.cmu.edu if that's more comfortable.
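As an aside, one common way to tame high reward variance (offered here as an illustration, not necessarily one of the strategies referenced above) is to z-score normalize rewards within each batch before computing the policy gradient. The helper below is a minimal sketch; the function name and interface are hypothetical, not from the RLPrompt codebase:

```python
from statistics import mean, stdev

def normalize_rewards(rewards, eps=1e-6):
    # Z-score normalize a batch of rewards so the learning signal
    # is centered at zero and has roughly unit scale. This keeps a
    # few unusually high- or low-reward prompts from dominating
    # the gradient. (Illustrative helper, not from RLPrompt.)
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

normed = normalize_rewards([1.0, 2.0, 3.0])
```

After normalization the batch has (approximately) zero mean, so above-average prompts get positive credit and below-average ones get negative credit regardless of the raw reward scale.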
Hi there,
Is the prompt initialized randomly? I wonder whether the initial prompt tokens can be specified, which may help to speed up convergence.
Thanks for replying.
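For what it's worth, the idea in the question, seeding the prompt with chosen tokens rather than random vectors, could be sketched roughly as below. Everything here is a toy stand-in (made-up vocabulary and embedding table); with a real model you would look up ids with the model's tokenizer and copy rows from its embedding matrix:

```python
import random

random.seed(0)

# Toy vocabulary and embedding table standing in for a real
# tokenizer and a model's input embedding matrix.
vocab = {"translate": 0, "the": 1, "sentence": 2, "<unk>": 3}
embed_dim = 4
embedding_table = [
    [random.random() for _ in range(embed_dim)]
    for _ in range(len(vocab))
]

def init_prompt(seed_words):
    # Map each seed word to its token id (falling back to <unk>),
    # then copy the corresponding embeddings. These copies would
    # become the trainable prompt parameters, so training starts
    # from a meaningful point instead of a random one.
    ids = [vocab.get(w, vocab["<unk>"]) for w in seed_words]
    return [list(embedding_table[i]) for i in ids]

prompt_params = init_prompt(["translate", "the", "sentence"])
```

The initial prompt then matches the chosen words exactly, and the optimizer is free to move the vectors away from that starting point.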