lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License
282 stars 29 forks source link

Fix top_p to define threshold similarly to top_k and not garble output. #2

Closed tomweingarten closed 4 years ago

tomweingarten commented 4 years ago

Fixes #1 . This change switches the way threshold is defined so that 0.9 means accept the top 10% of probabilities, rather than the top 90%. It also replaces the second gather with a scatter. The second gather was shuffling the results in a meaningless way.

lucidrains commented 4 years ago

@tomweingarten yes, you are right Tom! thank you for catching this!