google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Parameterize lattice node allocator size to optimize chunk allocation performance #1022

Closed PriyankaRanganath closed 3 months ago

PriyankaRanganath commented 3 months ago

The lattice node allocator allocates chunks of 1024 items, which is 49,152 bytes in total. This large chunk size causes performance issues in high-QPS environments. Our internal tests show a significant latency reduction if the default chunk size can be parameterized to fit specific needs.
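To illustrate the idea, here is a minimal sketch of a chunked free-list allocator whose chunk size is a constructor parameter rather than a hard-coded constant such as 1024. The class and member names below are illustrative only and are not sentencepiece's actual API.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: a chunked allocator that hands out elements from
// pre-allocated chunks, with the chunk size supplied at construction time
// instead of being a compile-time constant.
template <class T>
class ChunkedFreeList {
 public:
  explicit ChunkedFreeList(std::size_t chunk_size) : chunk_size_(chunk_size) {}

  ~ChunkedFreeList() {
    for (T *chunk : chunks_) delete[] chunk;
  }

  // Returns the next unused element, allocating a fresh chunk of
  // `chunk_size_` elements only when all existing chunks are full.
  T *Allocate() {
    if (chunk_index_ == chunks_.size()) {
      chunks_.push_back(new T[chunk_size_]);
    }
    T *result = &chunks_[chunk_index_][element_index_];
    if (++element_index_ == chunk_size_) {
      ++chunk_index_;
      element_index_ = 0;
    }
    return result;
  }

  // Marks everything as free again; keeps the chunks allocated for reuse.
  void Reset() {
    chunk_index_ = 0;
    element_index_ = 0;
  }

 private:
  const std::size_t chunk_size_;
  std::vector<T *> chunks_;
  std::size_t chunk_index_ = 0;
  std::size_t element_index_ = 0;
};
```

With a parameter like this, a caller that only ever builds small lattices could pass a much smaller chunk size (for example 64) and avoid the up-front cost of a 49 KB chunk per allocator, which is the latency effect the description above refers to.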

google-cla[bot] commented 3 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

PriyankaRanganath commented 3 months ago

Hi @taku910,

Is it possible to review this PR? Thanks, P

taku910 commented 3 months ago

We are afraid that it cannot be merged because it is only effective in special sampling situations. The lattice is not used in the normal encode function, and we will not be able to maintain this new feature.

We cannot guarantee that this change will be merged, but it would be appreciated if you first create an issue to broadly discuss the performance, the implementation, and the importance of this feature.