beeevita closed this issue 1 year ago.
Hi, that's a great question. Integrating prior knowledge and reward-based training has long been an open problem in RL research.
Can you tell me more about your use case? For example, do you have some prompts that you wrote down yourself, that you think will be good guesses? Or have you already trained a prompt generation model that made some progress, and you want to train it more? With more info, I might be able to help you better.
On another note, the RL algorithm we used can also learn reward information from a pre-specified dataset, which may be relevant to what you are looking for. For more info, you can check out the original repo: https://github.com/HanGuo97/soft-Q-learning-for-text-generation.
I hope this helps. Let me know if you have other questions.
Thank you. The specific task is machine translation. I found that training fluctuates sharply and struggles to converge, so I think injecting prior knowledge may be useful.
Prompted generation tasks tend to produce high reward variance. I'll assume you're using GPT-2. Not sure if you have looked into our code for text style transfer, but here are some general strategies we found useful for stabilizing the reward signal:
Let me know if this makes sense. If you could provide more information, I may be able to help you better. You can also email me at mingkaid@cs.cmu.edu if that's more comfortable.
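As an aside, one common way to tame high reward variance (offered here as an illustration, not necessarily one of the strategies referenced above) is to z-score normalize rewards within each batch before computing the policy gradient. The helper below is a minimal sketch; the function name and interface are hypothetical, not from the RLPrompt codebase:

```python
from statistics import mean, stdev

def normalize_rewards(rewards, eps=1e-6):
    # Z-score normalize a batch of rewards so the learning signal
    # is centered at zero and has roughly unit scale. This keeps a
    # few unusually high- or low-reward prompts from dominating
    # the gradient. (Illustrative helper, not from RLPrompt.)
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

normed = normalize_rewards([1.0, 2.0, 3.0])
```

After normalization the batch has (approximately) zero mean, so above-average prompts get positive credit and below-average ones get negative credit regardless of the raw reward scale.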
Hi there,
Is the prompt initialized randomly? I wonder whether the initial prompt tokens can be specified, which may help to speed up convergence.
Thanks for replying.
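For what it's worth, the idea in the question, seeding the prompt with chosen tokens rather than random vectors, could be sketched roughly as below. Everything here is a toy stand-in (made-up vocabulary and embedding table); with a real model you would look up ids with the model's tokenizer and copy rows from its embedding matrix:

```python
import random

random.seed(0)

# Toy vocabulary and embedding table standing in for a real
# tokenizer and a model's input embedding matrix.
vocab = {"translate": 0, "the": 1, "sentence": 2, "<unk>": 3}
embed_dim = 4
embedding_table = [
    [random.random() for _ in range(embed_dim)]
    for _ in range(len(vocab))
]

def init_prompt(seed_words):
    # Map each seed word to its token id (falling back to <unk>),
    # then copy the corresponding embeddings. These copies would
    # become the trainable prompt parameters, so training starts
    # from a meaningful point instead of a random one.
    ids = [vocab.get(w, vocab["<unk>"]) for w in seed_words]
    return [list(embedding_table[i]) for i in ids]

prompt_params = init_prompt(["translate", "the", "sentence"])
```

The initial prompt then matches the chosen words exactly, and the optimizer is free to move the vectors away from that starting point.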