tryAGI / Tiktoken

This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo models, specifically using `cl100k_base` encoding.
https://github.com/openai/tiktoken
MIT License

Improve memory usage of `ModelToEncoder` #48

Open · mikethea1 opened 1 month ago

Thanks for creating and maintaining this great library!

What would you like to be added:

Today, `ModelToEncoder` calls `ModelToEncoding`, which statically initializes a dictionary of 7 encodings; 6 of the 7 are duplicates.

When each encoding is constructed, its constructor eagerly loads its encoding data from embedded manifest resources. As far as I can tell, this data is loaded separately for each instance.
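
For illustration, the eager pattern described above might look roughly like this. This is a sketch based on the description, not the library's actual source; the dictionary entries and the use of `Cl100KBase` here are assumptions:

```csharp
using System.Collections.Generic;

// Sketch of the eager-initialization pattern described above (assumed names,
// not the actual library source). Every dictionary entry constructs its own
// Encoding up front, so models that share cl100k_base still pay for separate
// instances, each loading the same data from manifest resources.
public static class ModelToEncoding
{
    private static readonly Dictionary<string, Encoding> Encodings = new()
    {
        ["gpt-4"] = new Cl100KBase(),         // loads embedded encoding data
        ["gpt-3.5-turbo"] = new Cl100KBase(), // loads the same data again
        // ... five more entries, mostly duplicate cl100k_base instances
    };

    public static Encoding For(string modelName) => Encodings[modelName];
}
```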

I would like `ModelToEncoder` to lazily load only the encoding I care about. Furthermore, I'd like it to share `Encoding` instances among models that map to the same encoding.

An example implementation might look like this:

```csharp
public static Encoding? TryFor(string modelName)
{
    switch (modelName)
    {
        case "gpt-4o":
            return O200KCache.Instance;
        case "gpt-4":
        // ... other models that map to cl100k_base ...
        case "text-embedding-3-large":
            return Cl100KCache.Instance;
        default:
            return null;
    }
}

// Nested holder classes: each encoding is constructed at most once,
// and only when its holder type is first referenced.
private static class O200KCache
{
    public static readonly O200KBase Instance = new();
}

private static class Cl100KCache
{
    public static readonly Cl100KBase Instance = new();
}
```
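
A nice property of the nested-holder pattern in this sketch is that the runtime initializes each holder's static field only when the holder type is first referenced (an explicit static constructor would pin this down even more strictly, since `beforefieldinit` gives the runtime some latitude). A hypothetical usage check, assuming the `TryFor` sketch above lives on `ModelToEncoder`:

```csharp
// Both models map to cl100k_base, so they should resolve to one shared
// instance; O200KCache is never touched here, so o200k_base is never built.
var a = ModelToEncoder.TryFor("gpt-4");
var b = ModelToEncoder.TryFor("text-embedding-3-large");
System.Diagnostics.Debug.Assert(ReferenceEquals(a, b));
```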

Why is this needed:

Reduce memory footprint and startup time, especially as more models are added.

Anything else we need to know?

I'd be happy to file a PR for this if you're interested!