yaroslavyaroslav / OpenAI-sublime-text

First class Sublime Text AI assistant with GPT-4o and llama.cpp support!

max_tokens is not enforced correctly #8

Closed paschembri closed 2 months ago

paschembri commented 1 year ago

Token count is currently based on the length of the selection (`if (self.settings.get("max_tokens") + len(self.text)) > 4000:`).

However, OpenAI models use the GPT2Tokenizer, which gives a more accurate token count.

So far I haven't found an easy way to include 3rd-party deps in a Sublime package (the Python package transformers would be needed).

What do you think about adding a setting that lets the user override the 4000 limit? (Happy to provide a PR.)
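
For illustration, something like this is what I have in mind (just a sketch; the setting names below are made up, not necessarily the plugin's actual keys):

```python
import sublime

def exceeds_limit(text):
    # Assuming the plugin's settings file name; adjust if it differs.
    settings = sublime.load_settings("openAI.sublime-settings")
    max_tokens = settings.get("max_tokens", 256)
    # Hypothetical user-facing override for the hard-coded 4000 limit.
    context_limit = settings.get("context_window", 4000)
    # Current behaviour: character count stands in for token count.
    return (max_tokens + len(text)) > context_limit
```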

yaroslavyaroslav commented 1 year ago

Thank you for your proposal. There's an easy way to add a 3rd party dependency actually.

It should be added this way. Unluckily, I found that only after I had implemented it within stdlib boundaries.

Anyway, I'd rather not add the transformers package if it's more than, let's say, 10 MB with all its dependencies. This plugin is pretty light and straightforward so far, and I'd like to keep it that way.

So here's my proposal: if the transformers package isn't that big, you're very welcome to use it in your PR; if it is, please consider making the token count less precise, but without such a package.
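
For instance, a rough, dependency-free approximation could look like this (just a sketch, assuming roughly 4 characters per token):

```python
def approx_token_count(text):
    # Very rough heuristic: ~4 characters per token for English text.
    # No third-party package needed; the error stays small enough for a
    # sanity check against the model's context limit.
    return max(1, len(text) // 4)
```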

paschembri commented 1 year ago

I’ll look into it but I know the transformers package does magic stuff (like downloading vocab files…)

I think extracting a basic rule may be simpler…

Thanks for the link on how to add deps - it will be useful…

yaroslavyaroslav commented 1 year ago

I've found the library that's suggested for counting tokens locally. I have to check its size before including it in the project anyway.

https://github.com/openai/tiktoken
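
For reference, counting tokens with it takes only a few lines (sketch below, assuming the cl100k_base encoding as a fallback); the open question is whether it's small enough to ship as a dependency:

```python
import tiktoken

def count_tokens(text, model="gpt-4"):
    try:
        # Use the encoding registered for the given model name.
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to cl100k_base for model names tiktoken doesn't know.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens("Hello, Sublime!"))
```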

yaroslavyaroslav commented 1 year ago

https://github.com/wbond/package_control_channel/blob/master/repository/dependencies.json

It appears that only packages from this list can be used as dependencies.

paschembri commented 11 months ago

I managed to get a tiktoken server running locally (this is overkill but fits my needs).

Small plugin here: https://github.com/paschembri/sublime-plugin-maker/tree/main/examples/Tiktoken-Counter
Tiktoken server here: https://github.com/paschembri/tiktoken-server
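
For anyone curious, a plugin can query such a server with nothing but the stdlib. The endpoint path, port, and response field below are placeholders; check the tiktoken-server README for its actual API:

```python
import json
import urllib.request

def remote_token_count(text, url="http://127.0.0.1:8000/count"):
    # Placeholder endpoint and payload shape; see the tiktoken-server
    # README for its real API.
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return json.loads(response.read().decode("utf-8"))["count"]
```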

yaroslavyaroslav commented 11 months ago

@paschembri I believe it could be a toggled feature: if a user really wants to install something extra, he or she could have it, but by default it would be disabled.

And just curious, why do you need it that much? Could you please elaborate on your flow?

paschembri commented 11 months ago
  1. Select a text range in Sublime (to be used as a prompt)
  2. Launch a request to the OpenAI completion endpoint

The issue arises from not knowing the prompt size before launching a request.

So being able to know the token count of the text selection is key.

yaroslavyaroslav commented 11 months ago

@paschembri Am I getting you right that you're mostly working with text corpuses at the edge of the scale, i.e. ones pretty close to a model's upper bound?

I mean, from my side I almost never hit such a limit in a single prompt on either gpt-4 or gpt-4-32k, even though I pass pretty big code chunks in my daily workflow. So I'm still trying to find the answer: why is such accuracy necessary? Even taking the estimation error into account, it shouldn't be large enough at a scale under 10k tokens to affect the user flow in a dramatic way, if we'd just presume that each token ~ 4 chars. So what am I missing?

P.S.: I'm not arguing in any way here, just trying to dig deep enough into your case.

paschembri commented 11 months ago

I am using different models (3.5-turbo, 3.5-turbo-16k and 4), but I don't have access to gpt-4-32k, and I find it useful to compare outputs from different models.

Depending on the prompt and the use case, there are 2 edge cases that I often encounter:

Now, you're totally right with "we'd just presume that each token ~ 4 chars".

yaroslavyaroslav commented 11 months ago

> large prompt => small expected output: this is the worst case scenario, where I want to be able to tell how many tokens the prompt is

In the very latest release (2.1.0) this case is covered in a straightforward way: if the sum of the prompt and max_tokens values exceeds a model's limit, the OpenAI gateway throws an error with the exact numbers for both the prompt tokens that have been passed and max_tokens. Before the last release I just silenced that message, but now I pass it through to the user via a Sublime error message popup and the Sublime console logs. Have you tried that already? Does it meet your needs?

Anyway, I still believe it's a good feature to count tokens locally, at least in some approximate way, and maybe it's worth a shot to add your even more precise solution as an optional one. But before doing the latter I have to figure out its use cases to design a proper solution.
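
For context, a minimal sketch of how such a passthrough could look (not the plugin's actual code; the error payload shape is an assumption based on OpenAI's usual error format):

```python
import sublime

def report_openai_error(error_body):
    # Sketch only: surface the gateway's message (which includes the exact
    # prompt token count and max_tokens) in both a popup and the console.
    message = error_body.get("error", {}).get("message", "Unknown OpenAI error")
    print("[OpenAI] " + message)          # shows up in the Sublime console
    sublime.error_message("OpenAI: " + message)
```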

yaroslavyaroslav commented 7 months ago

https://github.com/yaroslavyaroslav/OpenAI-sublime-text/issues/29#issuecomment-1837598655 This news affects this issue as well.