stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Deployment of a compiled program #249

Open sutyum opened 9 months ago

sutyum commented 9 months ago

Currently, compiled programs are not async and hence are not efficient to serve from a Python server. It would be useful to merge the PRs aiming to add async support across the dspy library.

This could also involve adding nurseries (structured-concurrency task groups) in order to await an ensemble of requests simultaneously; a sketch follows below.
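For illustration, a minimal sketch of nursery-style serving as it could look today, assuming the compiled program is a synchronous callable taking keyword arguments (asyncio.TaskGroup requires Python 3.11+):

import asyncio

async def serve_batch(program, questions):
    # Nursery-style structured concurrency: all requests run concurrently,
    # and the TaskGroup waits for every one of them before returning.
    async with asyncio.TaskGroup() as tg:
        tasks = [
            # to_thread() offloads the blocking program call so the event
            # loop stays free to schedule the other requests.
            tg.create_task(asyncio.to_thread(program, question=q))
            for q in questions
        ]
    return [t.result() for t in tasks]

# predictions = asyncio.run(serve_batch(compiled_program, ["q1", "q2"]))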

okhat commented 9 months ago

Thanks @sutyum. Is the main target here serving queries in parallel?

Currently we do this with threading; DSPy is thread-safe. Does async offer additional benefits for you?
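For reference, a sketch of what that threading approach can look like on the caller's side (assuming the compiled program is callable with keyword arguments):

import concurrent.futures

def serve_batch_threaded(program, questions, max_workers=8):
    # Because DSPy is thread-safe, independent queries can simply be
    # fanned out across a thread pool.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(program, question=q) for q in questions]
        return [f.result() for f in futures]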

sutyum commented 9 months ago

Serving programs

Threads vs Asyncio

LM programs spend most of their execution time waiting for responses from other machines: they are IO-heavy rather than compute-heavy. Async IO tends to perform particularly well in scenarios where a large chunk of execution time is spent waiting; rather than busy-waiting, the async executor can carry out other tasks in the meantime. There are also limits on how many OS threads can be created on a given machine, far fewer than the number of requests one could serve if each request only spawned a lightweight coroutine (a "green thread" in asyncio).

Compilation vs Execution

It is also worth considering LM programs that run very long (hours, days, months); such scenarios would benefit from other forms of distributed execution. For instance, the agent orchestration project SuperAGI uses a message broker to break the LM-call DAG into a workflow, with each call executed in a distributed manner.
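As a rough sketch of that broker pattern (a toy stand-in only: Python's built-in queue replaces a real broker such as RabbitMQ or Kafka, and run_lm_call is a hypothetical placeholder for an actual LM request):

import queue
import threading

tasks = queue.Queue()    # stand-in for a broker topic of LM-call DAG nodes
results = queue.Queue()  # stand-in for a results topic

def run_lm_call(node):
    # Hypothetical placeholder for an actual LM request for one DAG node.
    return f"completion for {node!r}"

def worker():
    # Each worker pops one node of the LM-call DAG and executes it,
    # so the DAG can be fanned out across many processes or machines.
    while True:
        node = tasks.get()
        results.put(run_lm_call(node))
        tasks.task_done()

for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()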

Bring your own executor

It seems we are still on the lookout for a flexible execution model for such compiled programs. Just putting my thoughts here to continue the discussion on this open question.

Do we need this? :

flowchart LR

D[DSPy program] --> C[DAG of compiled programs]
C --> E[Bring your own executor]

One idea that was brought up recently was to compile with dspy and execute with a separate executor system (such as langchain). This approach could help keep dspy focused on LM programming primitives and constructs, rather than on the various choices one can make for execution.

CyrusOfEden commented 7 months ago

This sounds super cool, similar to #338. I think the broader question is "how does DSPy fit into the productionization workflow", and it's something we can think more about to come up with an elegant approach.

skrawcz commented 5 months ago

Posting here so I get notified of updates. I'd be interested in getting compilation to run on something like Hamilton.

sutyum commented 5 months ago

@CyrusOfEden Could sglang as the executor be all that we need?

CyrusOfEden commented 5 months ago

@sutyum how do you imagine that working?

skrawcz commented 5 months ago

@CyrusOfEden Could sglang as the executor be all that we need?

That doesn't sound all that useful to me.

Deploying a "compiled" dspy program to me requires publishing a graph comprised of the optimized prompts generated. Then you can take that and convert it into whatever framework you want.

williambrach commented 5 months ago

@CyrusOfEden Could sglang as the executor be all that we need?

That doesn't sound all that useful to me.

Deploying a "compiled" dspy program to me requires publishing a graph comprised of the optimized prompts generated. Then you can take that and convert it into whatever framework you want.

so you just take compiled prompts from dspy and run them via for example openai lib ?

skrawcz commented 5 months ago

@CyrusOfEden Could sglang as the executor be all that we need?

That doesn't sound all that useful to me. Deploying a "compiled" dspy program, to me, requires publishing a graph composed of the generated optimized prompts. Then you can take that and convert it into whatever framework you want.

So you just take the compiled prompts from dspy and run them via, for example, the openai lib?

As a first target that would be great!
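For concreteness, a minimal sketch of that first target, assuming the optimized prompt text has already been extracted from the compiled program (e.g. by saving the program and pulling out its instructions and few-shot demos); the file name and model name here are placeholders:

from openai import OpenAI

client = OpenAI()

# The instructions + few-shot demos produced by the DSPy optimizer,
# exported ahead of time as plain text.
optimized_prompt = open("optimized_prompt.txt").read()

def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": optimized_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content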

jmanhype commented 4 months ago

So langchain is the way forward. Also, when we say a graph, how about using this: https://topoteretes.github.io/cognee/


cognee: Deterministic LLM outputs for AI engineers. An open-source framework for loading and structuring LLM context to create accurate and explainable AI solutions using knowledge graphs and vector stores.

jmanhype commented 4 months ago

[Quoting @sutyum's earlier comment above on Serving programs, Threads vs Asyncio, Compilation vs Execution, and Bring your own executor.]

Here's how LangChainPredict and LangChainModule could be enhanced to support streaming and tracing:

Streaming Support

class LangChainPredict(Predict):
    def forward(self, **kwargs):
        stream_output = kwargs.pop("stream_output", False)

        # `prompt` and `signature` are assembled from kwargs as in the
        # existing forward() implementation (omitted here for brevity).
        if stream_output:
            # LangChain runnables expose .stream() for incremental chunks
            # (invoke() has no streaming flag); StreamedPrediction is a
            # hypothetical wrapper that yields chunks as they arrive.
            chunks = self.langchain_llm.stream(prompt)
            return StreamedPrediction(chunks, signature=signature)
        else:
            output = self.langchain_llm.invoke(prompt)
            return Prediction.from_completions(output, signature=signature)

Changes:

  1. Add stream_output argument to forward()
  2. Call the LangChain runnable's .stream() method when stream_output is set
  3. Return a (hypothetical) StreamedPrediction instead of Prediction for streaming output

Tracing Support

class LangChainPredict(Predict):
    def forward(self, **kwargs):
        enable_tracing = kwargs.pop("enable_tracing", False)

        # LangChain exposes tracing through callback handlers passed in the
        # run config; there is no set_tracing()/get_trace() switch on the LLM.
        config = {}
        if enable_tracing:
            from langchain_core.tracers import ConsoleCallbackHandler
            config["callbacks"] = [ConsoleCallbackHandler()]

        # `prompt` and `signature` are assembled from kwargs as in the
        # existing forward() implementation (omitted here for brevity).
        output = self.langchain_llm.invoke(prompt, config=config)
        return Prediction.from_completions(output, signature=signature)

Changes:

  1. Add enable_tracing argument to forward()
  2. Attach a tracing callback handler (e.g. ConsoleCallbackHandler) to the run config when enable_tracing is set
  3. The handler logs the full LangChain run as the LLM executes

The LangChainModule class can expose these same options and pass them through to its underlying LangChainPredict instances.

With these enhancements, DSPy programs using LangChain components will be able to leverage streaming and tracing capabilities, enabling better observability and interactivity in production deployments.
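A hypothetical usage sketch of the two proposed flags (predictor construction omitted; both flags and StreamedPrediction are proposals here, not an existing API):

# Streaming: iterate over chunks as they arrive.
for chunk in predictor(question="What is DSPy?", stream_output=True):
    print(chunk, end="", flush=True)

# Tracing: the callback-based tracer logs the LangChain run as it executes.
prediction = predictor(question="What is DSPy?", enable_tracing=True)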

skrawcz commented 4 months ago

so langchain is the way forward

I think it's just 'a' way. not the way. ;)

mikeedjones commented 3 months ago

@CyrusOfEden Could sglang as the executor be all that we need?

That doesn't sound all that useful to me. Deploying a "compiled" dspy program, to me, requires publishing a graph composed of the generated optimized prompts. Then you can take that and convert it into whatever framework you want.

So you just take the compiled prompts from dspy and run them via, for example, the openai lib?

DSPy supports several other objects in its graphs, which I think makes this a little trickier. How do you encapsulate a retrieval model in the compiled prompts, for example?

sarora-roivant commented 1 month ago

Posting here so I get notified on updates! I would love it if we could get compilation running on something like Dagster.

CyrusOfEden commented 1 month ago

@sarora-roivant wanna send me a DM on LinkedIn? URL in bio

sarora-roivant commented 1 month ago

@sarora-roivant wanna send me a DM on LinkedIn? URL in bio

Just sent you a connection request