Closed · ronlinet closed this issue 1 year ago
Nothing in this library, since it is just a mirror of the OpenAI API and they have no APIs for this sort of thing.
I haven't run across any PHP tokenizers either (although I haven't looked hard). Depending on your general use case, a poor man's version using str_split($str, 1000)
may work (going on the assumption that a token is ~4 characters, so you may need to decrease the chunk size a bit).
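That fixed-size split can be sketched as follows (in Python for illustration, since the rest of the thread leans on Python tooling; `str_split($str, 1000)` is the PHP equivalent). The 4-characters-per-token ratio is a rough heuristic, not a real tokenizer:

```python
def chunk_fixed(text: str, max_tokens: int = 1000, chars_per_token: int = 4) -> list[str]:
    """Split text into fixed-size pieces, like PHP's str_split.

    Assumes roughly 4 characters per token. This is only a heuristic,
    so leave some headroom below the model's actual token limit.
    """
    size = max_tokens * chars_per_token
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Because the cut points are arbitrary, this can split mid-word or mid-sentence, which is exactly the semantic-coherence problem raised below.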
Splitting text into blocks is not an issue. The problem is keeping the sections semantically coherent: half-broken semantics will generate confusing summaries. I will have to go with Python NLP libraries, as I am not aware of any PHP equivalent for this case.
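One way to keep blocks semantically coherent without a full NLP pipeline is to split on sentence boundaries and pack whole sentences into each block. A minimal sketch, assuming ~4 characters per token; the regex splitter here is naive, and libraries like nltk or spaCy do sentence segmentation properly:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    # Real NLP libraries handle abbreviations, quotes, etc. far better.
    return re.split(r'(?<=[.!?])\s+', text.strip())

def chunk_by_sentence(text: str, max_tokens: int = 3000, chars_per_token: int = 4) -> list[str]:
    """Pack whole sentences into blocks that stay under a rough token budget."""
    budget = max_tokens * chars_per_token
    chunks: list[str] = []
    current = ""
    for sent in split_sentences(text):
        # Start a new block if adding this sentence would exceed the budget.
        if current and len(current) + len(sent) + 1 > budget:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than the budget would still overflow a block, so a fallback to fixed-size splitting may be needed for pathological input.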
I am looking for a PHP solution to split a >10k-token text into <4k-token blocks.
The general “hack” for this case is to split the text into <4k-token blocks, summarize each block, and then summarize the combined summaries.
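That chunk-then-summarize pattern can be sketched as below (in Python for illustration). The `summarize()` function here is a hypothetical stand-in; a real implementation would call the completion API:

```python
def summarize(text: str) -> str:
    # Hypothetical stand-in: a real version would call an LLM completion API
    # with a "summarize this" prompt. Here we just truncate for illustration.
    return text[:100]

def summarize_long(text: str, max_tokens: int = 3000, chars_per_token: int = 4) -> str:
    """Summarize text of any length: summarize blocks, then the summaries.

    Recurses until the combined summaries fit within the rough token budget.
    """
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return summarize(text)
    blocks = [text[i:i + budget] for i in range(0, len(text), budget)]
    combined = " ".join(summarize(b) for b in blocks)
    return summarize_long(combined, max_tokens, chars_per_token)
```

The recursion terminates because each pass shrinks the text; in practice each `summarize()` call costs one API request, so block size is a cost/quality trade-off.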
All the demos I have seen are written in Python. Is such a solution available in this framework?
Do you have anything similar on your roadmap?