sgl-project / sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0

Development Roadmap #157

Open Ying1123 opened 4 months ago

Ying1123 commented 4 months ago

Function Calling

High-level Pythonic Interface

Inference Optimizations

Structured Decoding

Compiler

LoRA Support

Model Coverage

AriMKatz commented 4 months ago

Are there still plans for a high level pythonic interface? https://github.com/sgl-project/sglang/issues/39#issuecomment-1899351565

Ying1123 commented 4 months ago

Are there still plans for a high level pythonic interface? #39 (comment)

Hi @AriMKatz, thanks for the reference. This is very important; I have just added it.

nivibilla commented 4 months ago

For the vision models support, is it possible to align with the openai gpt4v API? https://platform.openai.com/docs/guides/vision

aliencaocao commented 4 months ago

Are there plans for loading models in 8bit or 4bit?

Ying1123 commented 4 months ago

For the vision models support, is it possible to align with the openai gpt4v API? https://platform.openai.com/docs/guides/vision

@nivibilla Yes, it is already aligned with the openai gpt4v API, see here. You can also find a runnable example of serving it with Sky Serve here.
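Since the endpoint follows the OpenAI GPT-4V message format, a request can be built the same way as for OpenAI's API. A minimal sketch of the payload shape (the model name and image URL below are illustrative, not from the thread):

```python
def build_vision_request(prompt: str, image_url: str,
                         model: str = "llava-v1.6") -> dict:
    """Build a chat payload in the OpenAI GPT-4V message format:
    the user message content is a list mixing text and image_url parts."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request("Describe this image.",
                               "https://example.com/cat.png")
```

The same dict can then be POSTed to the server's `/v1/chat/completions` route or passed through the official `openai` client pointed at the local base URL.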

Ying1123 commented 4 months ago

Are there plans for loading models in 8bit or 4bit?

@aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype conversion yet. You are welcome to submit a PR for that.
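In practice this means serving a checkpoint that has already been quantized (e.g. an AWQ or GPTQ export from the Hub) rather than asking the server to quantize on the fly. A sketch of the launch invocation, assuming the server auto-detects the quantization scheme from the checkpoint's `quantization_config` (the model name and port below are illustrative):

```python
import shlex

def launch_command(model_path: str, port: int = 30000) -> str:
    """Assemble the sglang server launch command for a
    pre-quantized (AWQ/GPTQ) checkpoint."""
    args = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
    ]
    # shlex.join produces a copy-pasteable shell command line.
    return shlex.join(args)

cmd = launch_command("TheBloke/Llama-2-7B-Chat-AWQ")
```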

aliencaocao commented 4 months ago

Are there plans for loading models in 8bit or 4bit?

@aliencaocao Thanks for the question! The AWQ and GPTQ are already supported. But we do not support an automatic dtype translation yet. You are welcome to submit a PR for that.

I'm looking to load llava 1.6 in 8-bit, but it does not seem that the llava series has AWQ or GPTQ quants, or did I miss something here?

EDIT: I saw that 1.5 has quants but 1.6 does not yet. Perhaps it's just too new and no one has done a calibration yet.

qeternity commented 3 months ago

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Ying1123 commented 3 months ago

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other commitments. You are welcome to contribute, either by continuing on the branch or starting fresh! I'll be happy to review and collaborate.

Bit0r commented 3 months ago

Tool-calling support is very important; it is necessary for many use cases.

omri-sap commented 2 months ago

Is TinyLlama supported? Generation with TinyLlama/TinyLlama-1.1B-Chat-v1.0 seems a bit slow...

wille-x commented 1 month ago

I see llama.cpp integration is on the roadmap. When will this feature be delivered? It would be very nice to have, since it would support running local LLMs, such as Llama models, on Mac computers and experimenting with them through the powerful and expressive SGLang.

Gintasz commented 1 month ago

I'd like to request support for Phi-3-mini.

binarycrayon commented 1 week ago

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old. Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other affairs. You are welcome to contribute, either continue on the branch or start fresh! I'll be happy to review and collaborate.

Hi, which branch is it? It looks like it may be better to start fresh.