google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0
10.07k stars 1.16k forks

Fixing issues with the normalizer.cc (typo, type safety, cast func) #1005

Closed. Cassini-chris closed this 4 months ago

Cassini-chris commented 4 months ago

1x Fixed a typo.
1x Changed uint32 to uint32_t for consistency and type safety.
1x The original code applies const_cast<char *> to blob.data() before reinterpreting the result as uint32. Without the const_cast, the compiler treats blob.data() as a pointer to constant data (data() on a const string returns a pointer to const), and reinterpret_cast alone cannot remove that const.
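
To illustrate the cast point, here is a minimal sketch. It is not the code from normalizer.cc: `ReadLeadingUint32` is a hypothetical helper, and it assumes `blob` is a `const std::string &` holding at least four bytes. The comments show which cast forms compile and which do not. The function body copies the bytes with `std::memcpy`, one common way to read the value without touching constness at all.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not part of sentencepiece): reads the first four
// bytes of `blob` as a host-endian uint32_t.
inline uint32_t ReadLeadingUint32(const std::string &blob) {
  assert(blob.size() >= sizeof(uint32_t));

  // blob.data() has type `const char *` here. The next line would NOT
  // compile, because reinterpret_cast cannot cast away const:
  //   uint32_t *p = reinterpret_cast<uint32_t *>(blob.data());
  //
  // Two forms that do compile:
  //   reinterpret_cast<uint32_t *>(const_cast<char *>(blob.data()));  // drops const
  //   reinterpret_cast<const uint32_t *>(blob.data());                // keeps const
  //
  // Copying the bytes avoids the const question and any alignment concerns.
  uint32_t value = 0;
  std::memcpy(&value, blob.data(), sizeof(value));
  return value;
}
```

If the value is only read, the const-preserving form or the `memcpy` copy is usually preferable. The `const_cast` form is only needed when the buffer is modified in place.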