mingkaid / rl-prompt

Accompanying repo for the RLPrompt paper
MIT License
286 stars, 52 forks

Repeating tokens in optimized prompt #45

Open AMJasser opened 1 month ago

AMJasser commented 1 month ago

Hello there, I am working on an application of your work in a setting unrelated to text style transfer or classification. During evaluation, the model almost always outputs the same token repeated, like ['Private', 'Private', 'Private', 'Private', 'Private', 'Private'] or ['Policy', 'Policy', 'Policy', 'Policy', 'Policy', 'Policy']. How can I improve the model's performance? I'd love to get your expert insights on which hyperparameters I can tune to achieve better results.
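
To make the symptom concrete, here is a minimal sketch of how the repetition could be quantified while trying different settings. It is not part of the rl-prompt codebase; the function name `repetition_stats` and the returned keys are hypothetical, and it only assumes the prompt is available as a plain list of token strings.

```python
from collections import Counter


def repetition_stats(prompt_tokens):
    """Summarize how repetitive a prompt is (hypothetical helper, not an rl-prompt API)."""
    if not prompt_tokens:
        raise ValueError("prompt_tokens must not be empty")
    counts = Counter(prompt_tokens)
    # most_common(1) returns a one-element list of (token, count) pairs
    top_token, top_count = counts.most_common(1)[0]
    return {
        # share of positions holding a distinct token; 1.0 means no repeats
        "distinct_ratio": len(counts) / len(prompt_tokens),
        "most_common_token": top_token,
        # share of positions taken by the single most frequent token
        "most_common_share": top_count / len(prompt_tokens),
    }


# The collapsed prompt reported above
collapsed = ["Private"] * 6
stats = repetition_stats(collapsed)
print(stats)
```

Tracking `distinct_ratio` across evaluation runs would show whether a given hyperparameter change actually reduces the collapse, rather than judging it by eye.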