google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Evaluate Profile-Guided Optimization (PGO) #961

Open zamazan4ik opened 10 months ago

zamazan4ik commented 10 months ago

Hi!

I have been evaluating how well Profile-Guided Optimization (PGO) applies to different kinds of software - all my results are available in my repo. In my experience, PGO improves performance in many scenarios.

I recently ran PGO tests on the HuggingFace Tokenizers project - the results are located here. Since the results are quite promising (up to 20% performance improvement in some scenarios), I think it would be interesting to run the same PGO benchmarks on SentencePiece. As far as I understand from the technical highlights, performance is one of the goals of the project.

Has anyone tried optimizing SentencePiece performance with PGO before? If so, could you please share the benchmark results? If not, is there an established methodology, make command, or anything else for running such benchmarks?
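
As a starting point, such a comparison could be as simple as a timing harness run once against a regular build and once against a PGO build. The sketch below is only an illustration, not an established SentencePiece benchmark: `benchmark` and `speedup_percent` are hypothetical helper names, and `str.split` stands in for a real tokenizer call so the script runs without a model.

```python
import statistics
import time
from typing import Callable, Iterable, List


def benchmark(encode: Callable[[str], object], lines: Iterable[str], repeats: int = 5) -> dict:
    """Time `encode` over every line, `repeats` times; return summary stats in seconds."""
    corpus: List[str] = list(lines)  # materialize once so every run sees the same input
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for line in corpus:
            encode(line)
        timings.append(time.perf_counter() - start)
    return {
        "runs": len(timings),
        "lines": len(corpus),
        "best_s": min(timings),
        "median_s": statistics.median(timings),
    }


def speedup_percent(baseline_s: float, optimized_s: float) -> float:
    """Relative improvement of the optimized run over the baseline, in percent."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Stand-in tokenizer (assumption): replace str.split with the encode
    # method of the tokenizer build under test for a real measurement.
    result = benchmark(str.split, ["hello world"] * 1000)
    print(result)
```

Running the same script against both builds and passing the two median times to `speedup_percent` gives a single figure that can be compared with the numbers reported for other projects.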

Thanks in advance.