openai-php / client

⚡️ OpenAI PHP is a supercharged community-maintained PHP API client that allows you to interact with OpenAI API.

OpenAI API 4K tokens limitation #143

Closed: ronlinet closed this issue 1 year ago

ronlinet commented 1 year ago

I am looking for a PHP solution to split a text of more than 10k tokens into blocks of fewer than 4k tokens each.

The usual “hack” for this case is:

  1. Splitting the big chunk of text into semantically coherent sections.
  2. Running these text blocks as individual queries.
  3. Merging all generated outputs into one single text.
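
For reference, steps 2 and 3 above can be sketched in PHP roughly as follows. This is a minimal sketch under assumptions, not something the library provides: `processInChunks` and `$fakeQuery` are hypothetical names, and the query is passed in as a callable so the example runs without an API key.

```php
<?php
// Hypothetical helper (not part of openai-php/client): runs every chunk
// through a caller-supplied query function and merges the outputs.
function processInChunks(array $chunks, callable $query, string $glue = "\n\n"): string
{
    $outputs = [];
    foreach ($chunks as $chunk) {
        // Step 2: each block is sent as its own, independent query.
        $outputs[] = $query($chunk);
    }

    // Step 3: merge all generated outputs into one single text.
    return implode($glue, $outputs);
}

// Stand-in for a real API call, so the sketch is runnable offline.
$fakeQuery = fn (string $chunk): string => strtoupper($chunk);

echo processInChunks(['first section', 'second section'], $fakeQuery), PHP_EOL;
```

In real use the callable would presumably wrap a client call such as `$client->chat()->create([...])` and return the text of the first choice. Step 1, producing coherent chunks, is the hard part and is deliberately left out here.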

All the demos I have seen are written in Python. Is such a solution available in this framework?
Do you have anything similar on your roadmap?

pb30 commented 1 year ago

Nothing in this library, since it is just a mirror of the OpenAI API and they have no APIs for this sort of thing.

I haven't run across any PHP tokenizers either (although I haven't looked hard). Depending on your general use case, a poor man's version such as str_split($str, 1000) may work (going on the assumption that a token is ~4 characters; you may need to decrease the chunk size a bit).
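
The character-budget idea can be sketched like this. It is only an approximation: the ~4 characters per token figure is the rough heuristic mentioned above, not a real tokenizer, and `splitByTokenBudget` is a hypothetical helper name. The sketch prefers `mb_str_split` (PHP 7.4+, mbstring extension) when available, because plain `str_split` counts bytes and can cut a multibyte UTF-8 character in half.

```php
<?php
// Assumption: roughly 4 characters per token (heuristic, English-like text).
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: splits $text into fixed-size pieces whose estimated
// token count stays at or below $maxTokens.
function splitByTokenBudget(string $text, int $maxTokens, int $charsPerToken = CHARS_PER_TOKEN): array
{
    if ($text === '') {
        return [];
    }

    // Convert the token budget into a character budget (at least 1).
    $maxChars = max(1, $maxTokens * $charsPerToken);

    // mb_str_split counts characters; str_split counts bytes.
    return function_exists('mb_str_split')
        ? mb_str_split($text, $maxChars, 'UTF-8')
        : str_split($text, $maxChars);
}

// A budget of 1000 tokens gives chunks of up to 4000 characters.
$chunks = splitByTokenBudget(str_repeat('lorem ipsum ', 1000), 1000);
echo count($chunks), PHP_EOL;
```

Note that this cuts wherever the character count runs out, including mid-word or mid-sentence, so it does nothing for semantic coherence.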

ronlinet commented 1 year ago

Splitting the text into blocks is not an issue. The problem is keeping the sections semantically coherent: sections broken mid-thought will generate confusing summaries. I will have to go with Python NLP libraries, as I am not aware of any PHP equivalent for this case.
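
Short of a real NLP library, a crude middle ground in plain PHP might be to cut only at sentence or paragraph boundaries and pack whole sentences greedily into each block. The sketch below assumes a regex boundary heuristic and the ~4 characters per token estimate; `estimateTokens` and `chunkBySentences` are hypothetical names, and this is not true semantic segmentation.

```php
<?php
// Rough token estimate. Assumption: ~4 bytes per token.
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

// Hypothetical helper: greedily packs whole sentences into chunks whose
// estimated size stays within $maxTokens.
function chunkBySentences(string $text, int $maxTokens): array
{
    // Split after ., ! or ? followed by whitespace, or on blank lines.
    $units = preg_split('/(?<=[.!?])\s+|\R{2,}/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($units === false) {
        return [];
    }

    $chunks = [];
    $current = '';
    foreach ($units as $unit) {
        $candidate = $current === '' ? $unit : $current . ' ' . $unit;
        if ($current !== '' && estimateTokens($candidate) > $maxTokens) {
            // Adding this sentence would exceed the budget: close the chunk.
            $chunks[] = $current;
            $current = $unit;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}

print_r(chunkBySentences('One. Two. Three.', 3));
```

Two limitations of this sketch: a single sentence longer than the budget is kept whole as an oversized chunk, and sentences are rejoined with a single space, so paragraph breaks are lost. Abbreviations such as "e.g." will also be treated as sentence ends.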