google / seqio

Task-based datasets, preprocessing, and evaluation for sequence models.
Apache License 2.0
556 stars 58 forks

Implement __eq__ in SentencePieceModel based on __getstate__ #764

Closed copybara-service[bot] closed 4 days ago

copybara-service[bot] commented 1 week ago

Implement __eq__ in SentencePieceModel based on __getstate__

This allows comparing two vocabularies without loading the model. Currently, __eq__ compares the MD5 checksums of the loaded models, which requires both models to be loaded. It also ignores other vocabulary parameters such as extra_ids and reverse_extra_ids.
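
A minimal sketch of the idea, not the actual seqio change: the class `LazyVocabulary`, its fields, and the `load_count` counter are hypothetical stand-ins chosen for illustration. It assumes that `__getstate__` returns only the constructor configuration, so that equality can be decided from that state without ever triggering the lazy model load.

```python
class LazyVocabulary:
    """Hypothetical stand-in for a lazily loaded vocabulary (not seqio's class)."""

    def __init__(self, model_file, extra_ids=0, reverse_extra_ids=True):
        self.model_file = model_file
        self.extra_ids = extra_ids
        self.reverse_extra_ids = reverse_extra_ids
        self._model = None    # populated only on first use
        self.load_count = 0   # illustration only: counts expensive loads

    def _load(self):
        # Placeholder for the expensive step (reading and parsing the model file).
        if self._model is None:
            self.load_count += 1
            self._model = object()
        return self._model

    def __getstate__(self):
        # Only the configuration is part of the state; the loaded model is not.
        return {
            "model_file": self.model_file,
            "extra_ids": self.extra_ids,
            "reverse_extra_ids": self.reverse_extra_ids,
        }

    def __eq__(self, other):
        if not isinstance(other, LazyVocabulary):
            return NotImplemented
        # Compare configurations; neither side needs to load its model.
        return self.__getstate__() == other.__getstate__()

    def __hash__(self):
        # Keep instances hashable, consistent with __eq__.
        return hash(tuple(sorted(self.__getstate__().items())))


a = LazyVocabulary("spm.model", extra_ids=100)
b = LazyVocabulary("spm.model", extra_ids=100)
c = LazyVocabulary("spm.model", extra_ids=0)

print(a == b)  # same configuration
print(a == c)  # differs in extra_ids
print(a.load_count, b.load_count, c.load_count)  # no model was loaded
```

One trade-off of this approach, under the assumptions above: two vocabularies that point at byte-identical model files through different paths would compare unequal, whereas a checksum comparison would treat them as equal.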