voidful / TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License
545 stars 60 forks

Text generation models generating repeated/duplicate text/sentences. #13

Closed: tontan1998 closed this issue 1 year ago

tontan1998 commented 1 year ago

Hello! Thank you for the awesome project. I found that some models generate repeated or duplicate text and sentences with TextRL.

Can you add code that removes repeated or duplicate text and sentences from the GPT-2 model's output, for example with temperature?

tontan1998 commented 1 year ago

https://github.com/huggingface/transformers/issues/1725

voidful commented 1 year ago

You can set repetition_penalty on the actor, and adjust temperature, top_k, and top_p to improve the sampled results:

actor = TextRLActor(env, model, tokenizer,
                    act_deterministically=False,  # True: always pick the highest-probability token; False: sample
                    temperature=1,                # sampling temperature (higher = more random)
                    compare_sample=2,             # number of samples to rank
                    top_k=0,                      # top-k sampling
                    top_p=1.0,                    # top-p (nucleus) sampling
                    repetition_penalty=2)         # repetition penalty from the CTRL paper (https://arxiv.org/abs/1909.05858)

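
For intuition, here is a minimal sketch of what a repetition penalty does to the next-token scores. It is not TextRL's code. It follows the variant used by Hugging Face transformers' `RepetitionPenaltyLogitsProcessor`, where the score of every already-generated token is divided by the penalty if positive and multiplied by it if negative. The helper names `apply_repetition_penalty` and `softmax` and the toy logits are assumptions made for illustration.

```python
import math


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Return a copy of `logits` with already-generated tokens made less likely.

    Hypothetical helper for illustration; a penalty of 1.0 leaves scores unchanged,
    and values above 1.0 discourage repetition.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


def softmax(logits, temperature=1.0):
    """Convert scores to probabilities, with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of 4 tokens; tokens 0 and 2 were already generated.
logits = [2.0, 1.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2, 0], penalty=2.0)
print(penalized)  # [1.0, 1.0, -2.0, 0.5]

# Token 0 becomes less likely to be sampled again.
print(softmax(logits)[0] > softmax(penalized)[0])  # True
```

A penalty that is too high can push the model away from words it legitimately needs to repeat, so it is usually tuned together with temperature, top_k, and top_p rather than on its own.
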
tontan1998 commented 1 year ago

Thank you!