mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Token stream support for starcoder models #643

Open mads-oestergaard opened 1 year ago

mads-oestergaard commented 1 year ago

Is your feature request related to a problem? Please describe.

Code models tend to generate long responses, but without token streaming the chat interface ends up feeling unresponsive.

Describe the solution you'd like

Token stream support for starcoder models, for use with e.g. chatbot-ui. Cool models are being built on the starcoder backend, such as WizardCoder and StarChat-beta, to name two recent coding chat models.
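
For illustration, here is a rough sketch of what a streaming request against LocalAI's OpenAI-compatible chat endpoint could look like once this is supported. The host, port, and model name are placeholders, not a working configuration.

```python
# Sketch only: streaming tokens from LocalAI's OpenAI-compatible API.
# Assumes LocalAI is listening on localhost:8080 and a starcoder-family
# model is configured under the (hypothetical) name "wizardcoder".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="wizardcoder",  # placeholder model name
    messages=[{"role": "user", "content": "Write a quicksort in Python."}],
    stream=True,  # ask for tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # render each token as it arrives
```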

Describe alternatives you've considered

I thought about posting this on the ggml repo, but it seems like token streaming belongs on this level of abstraction.

Additional context

luoweb commented 1 year ago

Has this issue been solved? How do you configure StarCoder in LocalAI?

mads-oestergaard commented 1 year ago

This is a different issue @luoweb - there is no support for streaming tokens with starcoder models, which just means you'll have to wait until the model has finished generating.

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful :smile_cat: and, generally speaking, I can provide good tips or point to places to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

Thank you for your issue description. I understand that you would like to use LocalAI with starcoder models for chatbot-ui integration, and you are looking for a solution to enable token streaming support for these models. To clarify, token streaming is a feature that allows the model to generate responses in smaller chunks instead of waiting until the entire response is generated before sending it to the user. This can help improve the responsiveness of the chat interface and make it feel more interactive.

I can see that you have already considered posting this on the ggml repo, but I agree that token streaming belongs at this level of abstraction. To enable token streaming support for starcoder models in LocalAI, we would need to modify the code to include the necessary functionality. This may involve modifying the model architecture or adding new code to the LocalAI backend.

As an alternative, you could also consider using a different chatbot-ui integration that supports token streaming out of the box, such as AutoGPT or PrivateGPT. These models are already integrated with LocalAI and have token streaming support built in. I hope this helps! Let me know if you have any further questions or if there is anything else I can assist you with.
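
For reference, a rough sketch of what consuming such a chunked stream could look like at the wire level, i.e. the OpenAI-style server-sent events that a streaming endpoint emits, parsed without the openai client. The host, port, and model name below are assumptions for illustration.

```python
# Sketch only: reading the SSE stream that an OpenAI-compatible
# /v1/chat/completions endpoint emits when "stream": true is set.
# Assumes LocalAI at localhost:8080 and a hypothetical model "starchat-beta".
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "starchat-beta",
        "messages": [{"role": "user", "content": "Explain token streaming."}],
        "stream": True,
    },
    stream=True,  # keep the HTTP connection open and read chunks as they arrive
)

for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue  # skip keep-alives and blank separator lines
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":  # sentinel that ends the stream
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content", "")
    print(delta, end="", flush=True)
```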

Sources: