The SentencePiece tokenizer was disabled by default to reduce the binary size of the built library, but this causes errors when users expect the SentencePiece tokenizer to be available.
This PR enables it by default. Anyone who needs a smaller binary will now have to disable it manually.
This PR also bumps 3rdparty/tokenizer-cpp.