triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

gptj: fix abysmally slow postprocessor performance; don't read a file for each new batch #99

Closed: git-bruh closed this 1 year ago

git-bruh commented 1 year ago

This wasn't helped by the fact that each "streaming" response repeats the whole of the previous output along with the new tokens...
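To illustrate the cost of that repetition: if every streamed response carries the full token sequence so far, a postprocessor that decodes each response from scratch redoes all earlier work on every step. A rough sketch of one way to handle only the new suffix is below. The `DeltaTracker` class, its method names, and the `request_id` key are hypothetical names for illustration and are not part of this backend or of the change in this PR.

```python
from typing import Dict, List


class DeltaTracker:
    """Remembers how many tokens of each cumulative stream were already handled.

    Hypothetical helper for illustration only; not part of the backend.
    """

    def __init__(self) -> None:
        # Maps a request identifier to the number of tokens already seen.
        self._seen: Dict[str, int] = {}

    def new_tokens(self, request_id: str, cumulative: List[int]) -> List[int]:
        """Return only the tokens appended since the previous call."""
        already = self._seen.get(request_id, 0)
        # A shorter sequence than before means the stream restarted,
        # so treat everything in it as new.
        if len(cumulative) < already:
            already = 0
        self._seen[request_id] = len(cumulative)
        return cumulative[already:]

    def finish(self, request_id: str) -> None:
        """Drop the bookkeeping for a completed stream."""
        self._seen.pop(request_id, None)
```

With this, a call per streamed response yields only the tokens that still need decoding. One caveat if the new tokens are decoded on their own: with a byte-level BPE vocabulary, a single character can span several tokens, so a decoder may need to hold back an incomplete trailing piece until the next response arrives.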