mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Tokenization endpoint #1649

Open benniekiss opened 9 months ago

benniekiss commented 9 months ago

Is your feature request related to a problem? Please describe.

Many generative models are limited to a maximum number of tokens. In some workflows, prompts are built dynamically to use as much of the context as possible, which means the candidate text has to be tokenized first to ensure it will fit in the context window.

Currently, this requires a local tokenization step, which prevents a fully API-driven workflow.
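As a minimal sketch of the local workaround this currently forces, assuming the transformers library is installed client-side (the model name and context sizes below are illustrative, not specific to any LocalAI deployment):

```python
from transformers import AutoTokenizer

# Illustrative: any model with a tokenizer on the Hub works here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

MAX_CONTEXT = 4096          # model's context window (illustrative)
RESERVED_FOR_OUTPUT = 512   # tokens kept free for the generated response

def fits(prompt: str) -> bool:
    # Count tokens locally before sending the prompt to the API.
    n_tokens = len(tokenizer.encode(prompt))
    return n_tokens <= MAX_CONTEXT - RESERVED_FOR_OUTPUT
```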

Describe the solution you'd like

Backends like transformers and llama.cpp both offer tokenization methods that tokenize text without generating a response. Exposing these methods through a tokenization API endpoint would remove the need for this local processing.
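A rough sketch of how a client might use such an endpoint if it existed; the `/v1/tokenize` path, the request fields, and the response shape are all assumptions for illustration, not an existing LocalAI API:

```python
import requests

# Hypothetical endpoint and payload shape -- nothing here is implemented yet;
# it only illustrates what this feature request is asking for.
resp = requests.post(
    "http://localhost:8080/v1/tokenize",
    json={"model": "some-model", "content": "How many tokens is this prompt?"},
)
resp.raise_for_status()
tokens = resp.json().get("tokens", [])
print(f"prompt uses {len(tokens)} tokens")
```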

Describe alternatives you've considered

Additional context

mudler commented 9 months ago

good point, it should indeed be relatively easy to expose