yuunnn-w / RWKV_Pytorch

This is an inference framework for the RWKV large language model, implemented purely in native PyTorch. The official implementation is overly complex and lacks extensibility. Come join the flexible PyTorch ecosystem and open-source it together!
GNU General Public License v3.0

Adding the official implementation of the TRIE tokenizer for a 20x speed-up #4

Closed: jiamingkong closed this 3 months ago

jiamingkong commented 3 months ago

Hi, I have migrated the official TRIE_TOKENIZER from RWKV-LM to this repo. The official trie tokenizer encodes text approximately 20x faster than the current implementation. The benchmark code is in tokenizer_benchmark.py.
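For reference, a minimal sketch of that kind of timing comparison; the helper, tokenizer objects, and sample text are assumptions for illustration, not the actual contents of tokenizer_benchmark.py. It prints timings in the same format as the measured results quoted just below.

```python
import time

def time_tokenizer(name, encode_fn, text, repeats=100):
    # Encode the same text repeatedly and report total wall-clock time,
    # in the same format as the results quoted below.
    start = time.time()
    for _ in range(repeats):
        encode_fn(text)
    print(f"{name} took {time.time() - start} seconds")

# Hypothetical usage, assuming both tokenizers expose an encode() method:
# time_tokenizer("test_tokenizer1", old_tokenizer.encode, sample_text)
# time_tokenizer("test_tokenizer2", trie_tokenizer.encode, sample_text)
```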

test_tokenizer1 took 9.673033475875854 seconds
test_tokenizer2 took 0.4087204933166504 seconds
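For readers unfamiliar with the approach: the official tokenizer encodes by greedy longest match over a prefix trie of the vocabulary, so each step walks the trie once instead of re-testing many candidate tokens against the input. Below is a minimal sketch of that technique; the toy vocabulary, class, and function names are illustrative assumptions, not the actual TRIE_TOKENIZER code.

```python
class TrieNode:
    __slots__ = ("children", "token_id")

    def __init__(self):
        self.children = {}      # byte value -> TrieNode
        self.token_id = None    # token id if a token ends at this node


def build_trie(vocab):
    """vocab maps token bytes -> token id."""
    root = TrieNode()
    for token, idx in vocab.items():
        node = root
        for b in token:
            node = node.children.setdefault(b, TrieNode())
        node.token_id = idx
    return root


def encode(root, data: bytes):
    """Greedy longest match: at each position, walk the trie as far as
    the input allows and emit the id of the longest token found."""
    ids = []
    i = 0
    while i < len(data):
        node, best_id, best_len = root, None, 0
        j = i
        while j < len(data) and data[j] in node.children:
            node = node.children[data[j]]
            j += 1
            if node.token_id is not None:
                best_id, best_len = node.token_id, j - i
        if best_id is None:
            raise ValueError(f"no token matches byte at position {i}")
        ids.append(best_id)
        i += best_len
    return ids


# Toy usage with a hypothetical 5-token vocabulary.
vocab = {b"h": 0, b"he": 1, b"hello": 2, b" ": 3, b"world": 4}
trie = build_trie(vocab)
print(encode(trie, b"hello world"))  # -> [2, 3, 4]
```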