patrickmaub / Function-Call-Extender


thoughts on the 4096 limits coming in #1

Open ShantanuNair opened 7 months ago

ShantanuNair commented 7 months ago

Hey, interesting repo. Unfortunately it looks like OpenAI's gpt-3.5-turbo-1106 model limits output to 4096 tokens, as does the gpt-4-turbo model. Anthropic has also capped every Claude model's output at 4096 tokens.

OpenAI's 0613 models will be deprecated in July this year, and the same goes for Azure. Soon no major provider (outside of OSS LLMs) will offer >4096 generation tokens. IMO this is going to hit a lot of different use cases hard.

Have you had any thoughts on this?

patrickmaub commented 7 months ago

Hey there, appreciate your interest in the repo.

Yes, it was quite unfortunate that right around the time this article was written, models started being capped at 4096 output tokens.

This inevitably means that we have to start making multiple API requests to generate longer-form content. I haven't experimented enough yet, but my general thought process is that we can take advantage of the long input context window in order to produce long outputs. For example, suppose I am trying to write a 10-page paper. The first thing I might do is use ChatGPT to generate a list of writing tasks to achieve this (intro, first body paragraph, second body paragraph, ...). Then I would start a fresh conversation and pass the tasks in one by one. The first user prompt in the conversation would be "write the introduction". The second would be "pick up where you left off with the introduction, and write the first body paragraph", etc. By merging the assistant responses, we can effectively produce a very long output; a rough sketch of this is below.
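Here's a minimal sketch of that chaining approach, assuming the official openai Python client; the outline prompt, task list handling, and model name are illustrative, not something from this repo:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo-1106"  # any chat model with a 4096-token output cap

# Step 1: ask the model for an outline of writing tasks.
outline = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user",
               "content": "List the sections of a 10-page paper on topic X, one per line."}],
).choices[0].message.content
tasks = [line.strip() for line in outline.splitlines() if line.strip()]

# Step 2: fresh conversation; feed the tasks one at a time, keeping the
# history so each response can pick up where the last one left off.
messages = []
sections = []
for i, task in enumerate(tasks):
    prompt = (f"Write the {task}." if i == 0
              else f"Pick up exactly where you left off and write the {task}.")
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model=MODEL, messages=messages)
    text = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": text})
    sections.append(text)

# Step 3: merge the per-section outputs into one long document.
paper = "\n\n".join(sections)
```

Each individual request stays under the 4096-token output cap, but the merged result can be as long as the input context window allows you to keep accumulating history.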

The reason my thought process has veered away from function calling is that the newer function-calling variants (on the 1106 models) seem to be fine-tuned specifically for the narrow use case of essentially making API calls. The models behave quite differently with function calling enabled. If what you particularly liked was having fully JSON-structured inputs and outputs, I would recommend using JSON mode on the API instead, as sketched below.
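For reference, a minimal JSON-mode sketch (again assuming the openai Python client; note that the API requires the word "JSON" to appear somewhere in the messages when json_object mode is on, and the prompt here is just an example):

```python
import json
from openai import OpenAI

client = OpenAI()

# response_format={"type": "json_object"} constrains the model to emit valid JSON,
# without the behavioral shift that comes with enabling function calling.
resp = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[{"role": "user",
               "content": "Return a JSON object with keys 'title' and 'sections' "
                          "outlining a short paper."}],
)
data = json.loads(resp.choices[0].message.content)
```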

Let me know what you think, or if you are wondering about my thoughts in the context of a specific use case, I would be happy to go into more depth.

Thanks
