Filtering in token sampling process

This is not a problem, but rather a discovery.

I was working on a Japanese version of XLNet recently, due to lack of training data(nothing more than JPNWiki) and complexity of the language itself, the model was never really good in terms of in-sample and heldout perplexity. But anyways I decided to give it a try on language generation.

The discovery is that, my model will frequently try to generate an eod token and skip to another completely irrelevant topic , often end up making up a non-existent wikipedia article. But if I purge the activation of eod token in predicted logits before sampling process, it will never be able to generate that token and end the topic it was given.

I also found purging activation corresponding to bad tokens(those that tend to make anything behind it catastrophic gibberish) will also largely help the quality of the generated text.

Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it was written by someone drunk.

rusiaaman / XLnet-gen

Filtering in token sampling process #9