implement an efficient bpe function using stack

The original bpe function looks for the bigram with the smallest bpe_ranks value in a word then merge this bigram, and repeating this process in a while loop.

We use a stack to implement the process in a linear O(n) complexity, and the code is also much simpler than before.

See the performance comparison below: text is from https://www.reedbeta.com/blog/programmers-intro-to-unicode/ my_enc.encode represents the encode function using the original bpe fucntion, and my_enc.my_encode represents the encode function using the modified bpe function

openai / gpt-2

implement an efficient bpe function using stack #339