Open marksverdhei opened 2 years ago
We should try out an alternative to bert-insert that samples from the softmax of the logits instead of always picking the highest-scoring token. This could lead to some interesting results. I suggest top-k sampling so the output doesn't become too nonsense-prone.
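A rough sketch of the idea, assuming the logits for the masked position are available as a plain list of floats. The function name `top_k_sample` and the `k`, `temperature`, and `rng` parameters are hypothetical, not part of the existing bert-insert code:

```python
import math
import random


def top_k_sample(logits, k=5, temperature=1.0, rng=None):
    """Sample one token index from the k highest logits, weighted by softmax.

    Hypothetical helper for illustration only; the names are assumptions.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    rng = rng or random
    k = min(k, len(logits))

    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Softmax restricted to those k logits; subtracting the max keeps exp() stable.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # random.choices normalizes the weights, so no explicit division is needed.
    return rng.choices(top, weights=weights, k=1)[0]
```

With `k=1` this reduces to the current behavior of always picking the highest logit, so the two variants could probably share one code path. Larger `k` or higher `temperature` should give more varied (and more error-prone) insertions.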
I support this!