huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

XLNet text generation ability: inference is slow #789

Closed. astariul closed this issue 4 years ago

astariul commented 5 years ago

I compared the inference time for generating text with the given example script between XLNet & GPT-2, on CPU.

To generate 100 tokens, XLNet takes 3m22s while GPT-2 takes 14s. And the cost grows superlinearly: for 500 tokens, XLNet takes 51m46s while GPT-2 takes 2m52s.
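For reproduction, here is a minimal timing sketch (this assumes a recent transformers version in which both models load through AutoModelForCausalLM and generate accepts max_new_tokens; the original comparison used the repository's example script):

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def time_generation(model_name: str, prompt: str, new_tokens: int = 100) -> float:
    """Time greedy generation of `new_tokens` tokens on CPU."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    start = time.time()
    with torch.no_grad():
        model.generate(input_ids, max_new_tokens=new_tokens)
    return time.time() - start

prompt = "In Seoul, you can do a lot of things ! For example you can"
for name in ("gpt2", "xlnet-base-cased"):
    print(f"{name}: {time_generation(name, prompt):.1f}s")
```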

Due to the bidirectionality of the model, each token's attention has to be computed again at every step so that it can relate to the newly generated token.

To reduce the time needed, we should allow the model to use unidirectional attention over the generated tokens (even if it means that some older tokens will not see some newly generated tokens, i.e. reduced bidirectionality).


According to the original post by Aman Rusia, doing so greatly decreases the quality of the generated text.

However, the post was later updated: the degradation came from a mistake in the code. It seems fine to generate tokens with unidirectional attention. Please refer to this issue

astariul commented 5 years ago

I tried it, but the text quality drops a lot and the inference time does not change at all.

I simply changed perm_mask to be 0 over the initial context and 1 over the generated tokens.
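For concreteness, here is a sketch of the mask the proposal above calls for, following the documented perm_mask convention where perm_mask[b, i, j] = 1.0 means token i may not attend to token j (the helper name and loop are mine, not from the example script):

```python
import torch

def unidirectional_over_generated(context_len: int, total_len: int, batch_size: int = 1):
    # perm_mask[b, i, j] = 1.0 means token i may NOT attend to token j.
    mask = torch.zeros(batch_size, total_len, total_len)
    # Columns for the original context stay 0: every position sees the context.
    # Over the generated part, apply a causal pattern: positions before a
    # generated token j cannot see it, while j itself and later positions can.
    for j in range(context_len, total_len):
        mask[:, :j, j] = 1.0
    return mask
```

Note that this causal pattern is different from setting the mask to 1.0 over all generated columns: the latter hides every generated token from every position, including later generation steps, which by itself could explain samples like the one below.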


Input:

In Seoul, you can do a lot of things ! For example you can

Generated text with full bidirectionality:

buy grocery stores and restaurants, or even buy liquor, tobacco, etc. Then you can go to the mall. Then you can visit shopping mall. Then you can go to the university, then you can visit an outdoor pool. You can visit the cinema. You can visit art galleries. Then you can visit a garden. Etc. etc. etc. After all, if you can buy items and enjoy them, then yes, you can enjoy them in Seoul. It is that simple.

Generated text with bidirectionality over context tokens, and unidirectionality over generated tokens:

buy tons Free do hotel on you whichT Seoul, list and do you coffee non can many of you sit- shopping People you river boatou. and Koreans in long you into graduate train/ by teacher college c people there ho sister formst to in city plain daughtera kayak cat.: years World home. still home later N will plan yearses street his looks a marriage different by tell it too stunning out to what ice by person a, people a bag.

Why is it that bad?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

patrickvonplaten commented 4 years ago

Hi @Colanim,

Thanks for the issue - sorry that we overlooked it!

I will take a closer look into this. GPT-2 uses key-value state caching when doing generation. Not sure whether XLNet does something similar. Will see if it'd be easy to add or not!
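For reference, a minimal sketch of that caching pattern with GPT-2, using greedy decoding (attribute names like past_key_values follow recent transformers versions and differ in older ones):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

generated = tokenizer.encode("In Seoul, you can", return_tensors="pt")
input_ids = generated
past_key_values = None

with torch.no_grad():
    for _ in range(20):
        # Keys/values of earlier tokens are reused from the cache, so
        # after the first step only the newest token is fed to the model.
        out = model(input_ids, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)
        input_ids = next_token

print(tokenizer.decode(generated[0]))
```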

patrickvonplaten commented 4 years ago

Sorry for the late answer. XLNet is known to be rather slow for text generation because of the padding needed to get it started.

XLNet uses mems, which are similar to past, to get a longer memory span. Since the quality seems to degrade a lot when applying your suggestion, I don't think an XLNet enhancement for generation is a high priority at the moment... Sorry! But feel free to open a PR if you have a good solution :-)
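For completeness, a rough sketch of how mems are threaded through successive XLNet forward passes, analogous to past above (this assumes mem_len > 0 and a transformers version whose XLNet forward accepts use_mems; older versions used use_cache):

```python
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
# mem_len > 0 activates the memory: hidden states from earlier forward
# passes are cached in mems instead of being recomputed.
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased", mem_len=512)
model.eval()

first = tokenizer.encode("In Seoul, you can do a lot of things !", return_tensors="pt")
second = tokenizer.encode(" For example you can", return_tensors="pt")

with torch.no_grad():
    out = model(first, use_mems=True)                   # first pass fills the memory
    out = model(second, mems=out.mems, use_mems=True)   # later pass reuses it
```

Unlike GPT-2's cache, though, mems hold hidden states of a past segment at a fixed point; with fully bidirectional attention the current sequence still has to be recomputed for every new token, which is consistent with the timings reported at the top of this issue.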