sgl-project / sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0

LLM integration with normal programming patterns or, a high level sglang interface #39

Open AriMKatz opened 5 months ago

AriMKatz commented 5 months ago

I posted a similar issue in outlines, but here goes: we're building something complex, and I think it would help to have a marvin-like library that supports normal programming patterns with LLMs while still giving control over generation. It would provide high-level Pythonic abstractions, such as typed functions that dynamically compile grammars for their pydantic return structs, and it would also let you drop down to customize generation either within or around those functions. These could act as high-level, mypy-typed boundaries around sglang programs.
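To make the "typed function that compiles a grammar from its return type" idea concrete, here is a minimal sketch. It is not sglang's, marvin's, or funcchain's API: `typed_llm`, `schema_regex`, and the `generate(prompt, regex)` backend are hypothetical names, stdlib dataclasses stand in for pydantic so the sketch has no dependencies, and the "grammar" is just a regex over a flat JSON object.

```python
import dataclasses
import json
import re
from typing import Callable, get_type_hints

# Regex fragments for a few primitive types (a deliberately tiny JSON subset).
_PRIMITIVES = {
    int: r"-?\d+",
    bool: r"(?:true|false)",
    str: r'"[^"\\]*"',
}


def schema_regex(cls) -> str:
    """Build a regex for a flat JSON object with the dataclass's fields, in order."""
    hints = get_type_hints(cls)
    parts = [
        r'"%s":\s*%s' % (re.escape(f.name), _PRIMITIVES[hints[f.name]])
        for f in dataclasses.fields(cls)
    ]
    return r"\{\s*" + r",\s*".join(parts) + r"\s*\}"


def typed_llm(generate: Callable[[str, str], str]):
    """Decorator: the docstring is the prompt template, the return annotation the schema.

    `generate(prompt, regex)` is a hypothetical backend returning constrained text.
    """
    def wrap(fn):
        out_cls = get_type_hints(fn)["return"]
        pattern = schema_regex(out_cls)

        def call(**kwargs):
            prompt = fn.__doc__.format(**kwargs)
            raw = generate(prompt, pattern)
            if not re.fullmatch(pattern, raw):
                raise ValueError(f"backend output violates schema: {raw!r}")
            return out_cls(**json.loads(raw))

        call.pattern = pattern  # exposed so callers can drop down a level
        return call
    return wrap


@dataclasses.dataclass
class Review:
    sentiment: str
    score: int


def fake_backend(prompt: str, regex: str) -> str:
    # Stand-in for a real constrained-decoding call.
    return '{"sentiment": "positive", "score": 4}'


@typed_llm(fake_backend)
def review(text: str) -> Review:
    """Classify this review: {text}"""


result = review(text="Great library!")
```

The point of exposing `review.pattern` is the "drop down" part: the same constraint could be handed to a lower-level generation call when the high-level wrapper is too rigid.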

Marvin and funcchain cover the high level (sort of), but you give up control. Marvin relies on JSON and/or function calling and is constrained to OpenAI models; funcchain additionally uses dynamically compiled llama.cpp grammars.

The analogy would be PyTorch : Triton :: funcchain (or equivalent) : sglang.

Aside from the funcchain-like feature, for my use case I'd love to see:

  1. Custom unpacking of pydantic structs: looping over or programmatically accessing fields and inserting them into prompts
  2. Customizing generation of pydantic output structs
  3. Mixing and matching regular Python types and pydantic inputs and outputs
  4. Stretch goal: some sort of single-dispatch (class-based) or multiple-dispatch polymorphism (https://github.com/beartype/plum)
  5. Our baseline MVP will use OpenAI models initially. For this to be computationally feasible, I think we'd need function calling, which seems to be planned?
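As a rough illustration of items 1 and 4 in the list above, the sketch below loops over a struct's fields to build a prompt and uses the stdlib `functools.singledispatch` to choose how each value is rendered by type. The names `render`, `fields_to_prompt`, and `Ticket` are made up for this example, and a dataclass stands in for a pydantic model; plum-style multiple dispatch would generalize the same idea to several arguments.

```python
import dataclasses
from functools import singledispatch


@singledispatch
def render(value) -> str:
    """Fallback rendering for any type without a registered handler."""
    return str(value)


@render.register
def _(value: bool) -> str:
    # bool gets its own handler; without it, True would render as "True".
    return "yes" if value else "no"


@render.register
def _(value: list) -> str:
    # Recurse so nested values use their own handlers.
    return ", ".join(render(v) for v in value)


def fields_to_prompt(obj) -> str:
    """Unpack a struct field by field into 'name: value' prompt lines."""
    return "\n".join(
        f"{f.name}: {render(getattr(obj, f.name))}"
        for f in dataclasses.fields(obj)
    )


@dataclasses.dataclass
class Ticket:
    title: str
    urgent: bool
    tags: list


prompt = fields_to_prompt(Ticket("Login fails", True, ["auth", "bug"]))
```

Keeping the per-type rendering in dispatch handlers means a new field type only needs a new `render.register` entry, not a change to the prompt-building loop.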

Anyway, is this something that would align with your vision, or would it be better to have a separate high-level interface library with multiple backends?

DSPy does this in some sense, but it's constrained to a PyTorch-like programming model, whereas this is more like "differentiable Swift" or the Julia "just write your code and backprop through it" vision.

One thing funcchain wants to do is have an "autotune" model where these functions are handed off to DSPy for compilation. I can see that sometimes I'd like more control and sometimes I'd be happy to have DSPy do some of the work for me.

merrymercy commented 5 months ago

Great discussion! You essentially covered the entire spectrum of LLM programming interfaces, from high-level Pythonic abstractions to low-level control. Some aspects align quite well with our vision. Incorporating Pydantic-based input/output fields and function calls is on our roadmap.

Given our background and interests, we adopt a bottom-up approach. We begin by building an efficient inference engine, then develop a low-level prompting control language, and ultimately a high-level Pythonic interface. Stay tuned!

AriMKatz commented 5 months ago

Awesome! I'm really happy to hear that and am looking forward to seeing what you all come up with. That progression makes sense. (also I fixed a bunch of typos and formatting)

@merrymercy the blogpost says

"Given such an SGLang program, we can either execute it eagerly through an interpreter, or we can trace it as a dataflow graph and run it with a graph executor."

Is this for runtime speed, or for task-performance tuning like (or with) DSPy? I believe Omar mentioned that DSPy could be used as such a compiler, and LangChain has the beginnings of that integration. The only issue is that the DSPy runtime doesn't support streaming (and personally we really need streaming!). But should DSPy have its own runtime in principle?
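For readers unfamiliar with the eager-versus-traced distinction in the quoted sentence, here is a toy sketch. It is not how SGLang implements either mode: `Eager`, `Tracer`, `run_graph`, and `fuse_appends` are invented for illustration, the "graph" is a degenerate linear chain of nodes, and `fake_gen` stands in for a model call. It only shows why having the whole program as data allows rewrites before anything is executed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    op: str   # "append" for literal text, "gen" for a model call
    arg: str  # the text to append, or the name of the generated variable


class Eager:
    """Interpreter mode: every step runs against the backend immediately."""
    def __init__(self, gen: Callable[[str], str]):
        self.gen, self.text = gen, ""

    def append(self, s: str) -> None:
        self.text += s

    def generate(self, name: str) -> None:
        self.text += self.gen(self.text)


class Tracer:
    """Tracing mode: steps are recorded as nodes; the backend is not called."""
    def __init__(self):
        self.nodes: List[Node] = []

    def append(self, s: str) -> None:
        self.nodes.append(Node("append", s))

    def generate(self, name: str) -> None:
        self.nodes.append(Node("gen", name))


def fuse_appends(nodes: List[Node]) -> List[Node]:
    """Example graph rewrite: merge adjacent literal appends into one node."""
    fused: List[Node] = []
    for n in nodes:
        if fused and n.op == "append" and fused[-1].op == "append":
            fused[-1] = Node("append", fused[-1].arg + n.arg)
        else:
            fused.append(n)
    return fused


def run_graph(nodes: List[Node], gen: Callable[[str], str]) -> str:
    """Graph executor: walk the recorded nodes in order."""
    text = ""
    for n in nodes:
        text += n.arg if n.op == "append" else gen(text)
    return text


def program(s) -> None:
    # The same user program works with either Eager or Tracer.
    s.append("Q: What is 2+2?\n")
    s.append("A: ")
    s.generate("answer")


def fake_gen(prompt: str) -> str:
    return "4"  # stand-in for a model call


eager = Eager(fake_gen)
program(eager)

tracer = Tracer()
program(tracer)
optimized = fuse_appends(tracer.nodes)
traced_text = run_graph(optimized, fake_gen)
```

In this toy, the rewrite is purely a runtime optimization; a DSPy-style compiler would instead rewrite the prompt nodes themselves to improve task performance, which is the distinction the question is asking about.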