A bug in the implementation of top-p sampling

lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Open allblueJT opened 1 month ago

allblueJT commented 1 month ago

Using the sorted indices to index into the already-sorted logits does not make sense. I think the return statement should be `return logits.scatter(1, sorted_indices, sorted_logits)`.
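
To make the suggestion concrete, here is a minimal sketch of top-p filtering that scatters the filtered values back into the original token order. It is not the repository's code. The function body, the `thres` convention (cumulative probability mass to keep) and the example tensor are assumptions for illustration. Only the final `logits.scatter(1, sorted_indices, sorted_logits)` line comes from the comment above.

```python
import torch
import torch.nn.functional as F


def top_p(logits: torch.Tensor, thres: float = 0.9) -> torch.Tensor:
    """Illustrative top-p filter for logits of shape (batch, vocab).

    Assumption: `thres` is the cumulative probability mass to keep,
    which may differ from the convention used in the repository.
    """
    # Sort each row in descending order and remember the original positions.
    sorted_logits, sorted_indices = torch.sort(logits, dim=-1, descending=True)
    cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

    # Mark positions whose cumulative probability exceeds the threshold,
    # then shift right by one so the token that crosses it is still kept.
    remove = cum_probs > thres
    remove[:, 1:] = remove[:, :-1].clone()
    remove[:, 0] = False

    sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))

    # Undo the sort: out[i, sorted_indices[i, j]] = sorted_logits[i, j]
    return logits.scatter(1, sorted_indices, sorted_logits)


if __name__ == "__main__":
    example = torch.tensor([[1.0, 3.0, 2.0, 0.0]])
    print(top_p(example, thres=0.9))
```

One thing worth checking when evaluating the proposed change: `sorted_indices` is a full permutation of each row, so the out-of-place `scatter` overwrites every output position. The tensor that `scatter` is called on then supplies only the shape and dtype of the result, not its values.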