simonw / ttok

Count and truncate text based on tokens
Apache License 2.0
247 stars 7 forks

Truncate => trim tokens #6

Open jefftriplett opened 1 year ago

jefftriplett commented 1 year ago

My original issue:

I'm not sure if this is out of scope, but a common pattern I have seen and played with involves keeping the last x tokens of a large piece of text rather than just the first x. Why is this useful? If you are writing a story bot or code review bot, you might want to provide extra context so that OpenAI can continue working on a problem with the next prompt.

Update: After giving this more thought, I think what's missing is some idea of trimming from the top or bottom versus truncating. For example, if I'm working on a story, I might want to trim a bio to the first 400 tokens. But if I'm generating a story, I might want the last 1k tokens to keep the context of the story. Not sure if this sounds useful, but it's a use case that I see myself landing on.

Trim values could be [None, top, bottom] and cover all three use cases.
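A minimal sketch of what that option could look like. ttok uses tiktoken for real token counting; here a plain whitespace split stands in as the tokenizer so the example is self-contained, and the `trim_text` function name and the interpretation of the three values (`None` = truncate, keep the first `max_tokens`; `"top"` = drop leading tokens, keep the last `max_tokens`; `"bottom"` = drop trailing tokens) are assumptions, not ttok's actual API.

```python
def trim_text(text, max_tokens, trim=None):
    """Keep at most max_tokens tokens of text.

    trim=None     -> truncate: keep the first max_tokens (current behavior)
    trim="top"    -> drop from the top: keep the last max_tokens
    trim="bottom" -> drop from the bottom: keep the first max_tokens
    """
    tokens = text.split()  # stand-in for tiktoken's encode()
    if len(tokens) <= max_tokens:
        return text
    if trim == "top":
        kept = tokens[-max_tokens:]  # e.g. keep the tail of a story for context
    else:  # None or "bottom": keep the opening tokens, e.g. the start of a bio
        kept = tokens[:max_tokens]
    return " ".join(kept)  # stand-in for tiktoken's decode()

story = "one two three four five six seven eight"
print(trim_text(story, 3))         # one two three
print(trim_text(story, 3, "top"))  # six seven eight
```

With a real tokenizer the shape is the same: encode to a token list, slice from the front or the back, decode the kept slice.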