taishan1994 / sentencepiece_chinese_bpe

Train a Chinese vocabulary with BPE in sentencepiece and use it in transformers.
97 stars 14 forks

training efficiency #2

Open adogwangwang opened 5 months ago

adogwangwang commented 5 months ago

Hello, thanks for your efforts. I am having problems with training efficiency. How long does it take to train your tokenizer on your data? I have been training a tokenizer for a long time and it still hasn't finished.

taishan1994 commented 5 months ago

I haven't tested it on a large amount of text. You can look this up on the official sentencepiece GitHub repository.