Repeating tokens in optimized prompt

Hello there, I am applying your work to a different setting, one unrelated to text style transfer or classification. During evaluation, the model almost always outputs a single repeated token, like ['Private', 'Private', 'Private', 'Private', 'Private', 'Private'] or ['Policy', 'Policy', 'Policy', 'Policy', 'Policy', 'Policy']. How can I improve the model's performance? I'd love to get your expert insights on which hyperparameters I should tune to achieve better results.
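To make "repeating" measurable across runs, here is a minimal sketch of a diagnostic that could be run on each evaluated prompt. `repetition_stats` is a hypothetical helper written for this post, not part of rl-prompt, and it assumes the prompt is available as a plain list of token strings.

```python
from collections import Counter


def repetition_stats(tokens):
    """Return (distinct_ratio, most_common_token, most_common_share).

    distinct_ratio is the number of unique tokens divided by the prompt length;
    most_common_share is the fraction of positions held by the most frequent token.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = Counter(tokens)
    token, freq = counts.most_common(1)[0]
    return len(counts) / len(tokens), token, freq / len(tokens)


# One of the collapsed prompts reported above.
collapsed = ["Private"] * 6
distinct_ratio, top_token, top_share = repetition_stats(collapsed)
print(f"distinct ratio: {distinct_ratio:.2f}, top token: {top_token}, share: {top_share:.2f}")
```

A distinct ratio near 1/length together with a top-token share near 1.0 would indicate the collapse described here; tracking both while changing hyperparameters may show whether a setting actually helps.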

mingkaid / rl-prompt, issue #45