pkoukk / tiktoken-go

go version of tiktoken
MIT License
601 stars · 67 forks

OOM server error #34

Closed qhenkart closed 11 months ago

qhenkart commented 11 months ago

I have a basic token-counting use case, but because GetEncoding is so expensive, my server with 50 MB of memory hits an OOM error immediately with only about 7 goroutines calling it at the same time. It would be great if this were optimized, or if the tkm could be shared and reused:

```go
func countTokens(messages []openai.ChatCompletionMessage) int {
	// Expensive: rebuilds the full encoding maps on every call.
	tkm, err := tiktoken.GetEncoding(tiktoken.MODEL_CL100K_BASE)
	if err != nil {
		panic(err)
	}

	tokensPerMessage := 3
	var tokenCount int
	for _, message := range messages {
		tokenCount += tokensPerMessage
		tokenCount += len(tkm.Encode(message.Content, nil, nil))
		tokenCount += len(tkm.Encode(message.Role, nil, nil))
	}
	tokenCount += tokensPerMessage // every reply is primed with <|start|>assistant<|message|>

	return tokenCount
}
```
pkoukk commented 11 months ago

To allow multiple encodings to be initialized concurrently, I did not lock getEncoding(); I only used sync.Once to ensure the same instance is returned for a given encoding name. It now seems that getEncoding() is expensive enough that it perhaps should not be allowed to run concurrently, since it allocates too much memory while executing.

qhenkart commented 11 months ago

@pkoukk thank you so much for the quick response and enhancement. Very cool.

Just to clarify: with this update, can I create a single encoding in main.go during server initialization and pass it to my handlers, so that requests (and goroutines spawned within a request) reuse the same encoding maps concurrently?

pkoukk commented 11 months ago

Sure, of course you can. You can also initialize tiktoken inside goroutines. Now 10 goroutines only need to allocate about 10 MB, roughly 1/10 of the original.