openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License
12.31k stars 833 forks source link

Multithreading (Python) vs Parallelism (Rust) #53

Closed mert-kurttutan closed 1 year ago

mert-kurttutan commented 1 year ago

Hi, I just tried to test your comment on why you went with python threading option. Indeed, I got similar results to your comment.

------------------------------------------------------------------------------ benchmark: 2 tests ------------------------------------------------------------------------------
Name (time in s)         Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_python_batch     1.9514 (1.0)      2.3618 (1.0)      2.1631 (1.0)      0.1870 (1.0)      2.1691 (1.0)      0.3524 (1.29)          2;0  0.4623 (1.0)           5           1
test_rust_batch       2.5516 (1.31)     3.4357 (1.45)     2.7726 (1.28)     0.3729 (1.99)     2.6189 (1.21)     0.2722 (1.0)           1;1  0.3607 (0.78)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Obtained with pytest-benchmark plugin

Fork: https://github.com/mert-kurttutan/tiktoken/tree/main

I am wondering why there is no performance gain with rust in comparison to python? Is this expected despite each encoding operation being a CPU heavy operation? Did you do any analysis/profiling on this issue?

hauntsaninja commented 1 year ago

It's because we drop the GIL. Let me flip the question — why would you expect a performance gain? The comment in source indicates that I did do profiling ;-)

mert-kurttutan commented 1 year ago

Ohh, I see thanks. I guess I thought releasing GIL would not allow multiple threads to run that efficiently.

hauntsaninja commented 1 year ago

Thanks for confirming the benchmark! Always happy to see people investigate potential performance improvements :-)