langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.06k stars · 14.65k forks

Implement a way to abort / cancel request #5340

Closed dzmitry-kankalovich closed 1 month ago

dzmitry-kankalovich commented 1 year ago

Feature request

I've been playing around with OpenAI GPT-4 and ran into a situation where response generation might take quite some time - say 5 minutes.

I switched over to streaming, but often I can immediately see that the response is not what I want, and therefore I'd like to cancel the request.

Now here is the part that is unclear to me: is there an official way to cancel a request in the Python version of LangChain? I have found this described in the JS/TS version of the framework; however, scanning the docs, sources, and issues yields nothing for this repo.

For now I simply terminate the process, which works well enough for something like Jupyter notebooks, but is quite problematic for, say, a web application.

Besides termination, it's also unclear whether I may incur unwanted costs for the abandoned request.

Should some sort of feature parity be established with JS LangChain?

Motivation

Provide a documented way to cancel long-running requests

Your contribution

At this point I only have capacity to test out a potential implementation. I may work on the implementation in the coming weeks.

grp06 commented 1 year ago

did you find a way to do this?

dzmitry-kankalovich commented 1 year ago

Nope. What I did to work around it:

  1. I've added real-time output of streamed data to the page - it gives nice UX; at least you see the response being generated instead of waiting at a blank page.
  2. I've restructured my prompts to be faster.
  3. Got access to the company's AI proxy, which works with Azure's OpenAI version - and it is apparently about twice as fast.

So the need for this is less pressing now, but it still seems like a must-have.

dosubot[bot] commented 12 months ago

Hi, @dzmitry-kankalovich! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are requesting a way to cancel requests in the Python version of LangChain. Currently, you are terminating the process to cancel requests, but you mentioned that this is not ideal for web applications. There has been some discussion in the comments, with you mentioning workarounds you have implemented. However, you still believe having a documented way to cancel requests is important.

Before we close this issue, we wanted to check with you if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dzmitry-kankalovich commented 12 months ago

dosu-beta

Seems to still be relevant.

I would be able to improve the UX of my software significantly if I could add a cancellation feature.

dosubot[bot] commented 12 months ago

@baskaryan Could you please help @dzmitry-kankalovich with the cancellation feature they are requesting? They mentioned that it would significantly improve the UX of their software. Thank you!

mdfederici commented 11 months ago

I would also like to be able to cancel streams gracefully from the python client.

lmeyerov commented 10 months ago

Is there any update on the right patterns to cancel chains?

This is an architectural need at multiple levels.

Supporting it everywhere at once is obviously not easy, but providing a pattern for the basics -- batch & streaming chain calls to OpenAI -- seems like a clear start...

rokosamit commented 9 months ago

Agree that this is critical to building a scalable system.

Leo310 commented 7 months ago

Any updates on this?

dosubot[bot] commented 7 months ago

🤖

Thank you for your feature request and for providing your workaround. As of now, the Python version of LangChain does not support request cancellation. However, there are no inherent technical limitations within the Python language or the LangChain framework that would prevent the implementation of a request cancellation feature.

In Python, you can use the asyncio library to cancel tasks. In the context of the LangChain framework, you could use the aiohttp.ClientSession object's cancellation capabilities. However, the current implementation of the Requests class and its methods in the provided context does not include a cancellation feature. To implement such a feature, you would need to modify the methods to handle cancellation, possibly by using asyncio's Task.cancel() method or by implementing a timeout.

Please note that implementing a cancellation feature would require careful handling of exceptions and cleanup to ensure that resources are properly released when a request is cancelled. You can refer to the Requests class for more details.
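
For illustration, a generic timeout wrapper using asyncio might look like the following sketch (invoke_with_timeout is a hypothetical helper, not an existing LangChain API):

import asyncio

async def invoke_with_timeout(chain, prompt, seconds):
    # asyncio.wait_for cancels the wrapped task if it exceeds the timeout
    try:
        return await asyncio.wait_for(chain.ainvoke(prompt), timeout=seconds)
    except asyncio.TimeoutError:
        return None  # the request was cancelled client-side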

As for the costs associated with abandoned requests, it would depend on the specific pricing model of the language model provider. It's recommended to check with the provider for accurate information.

Your suggestion for feature parity with the JS version of LangChain is valid and would be a valuable addition to the Python version. As you've mentioned that you may have the capacity to work on the implementation in the coming weeks, your contribution would be greatly appreciated. In the meantime, I'll make a note of this feature request for future reference.

Please feel free to reach out if you have any further questions or need additional assistance. Dosu


hadfield commented 6 months ago

I have a requirement for this as well, and am looking for an update to langchain to support this generally.

As a short-term workaround I subclassed the client class httpx.AsyncClient, overriding send() (async def send(self, request, *args, **kwargs)), and the response class httpx.Response, and used this client in places like:

model = ChatOpenAI(
    model_name=model_id,
    streaming=True,
    http_client=client,
    callbacks=[callback],
)

The client keeps track of the pending requests, exposing some methods to list the in-progress ones and interrupt them; a minimal sketch follows.
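
A sketch of that approach (class and method names here are illustrative, not part of LangChain or httpx):

import asyncio
import httpx

# An httpx client that tracks in-flight requests so they can be interrupted
class InterruptibleAsyncClient(httpx.AsyncClient):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._pending = set()

    async def send(self, request, *args, **kwargs):
        # Run the real send() as a task so it can be cancelled externally
        task = asyncio.ensure_future(super().send(request, *args, **kwargs))
        self._pending.add(task)
        try:
            return await task
        finally:
            self._pending.discard(task)

    def list_pending(self):
        return list(self._pending)

    def interrupt_all(self):
        # Cancelling a task raises CancelledError at the await point in send(),
        # which tears down the underlying connection
        for task in self._pending:
            task.cancel()

Interrupting a request this way surfaces as an exception from the model call, so the caller still needs to handle cleanup.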

Just mentioning this here in case that helps someone waiting for the more thorough solution, or if someone has a better idea 😄

rokosamit commented 6 months ago

Can you post that as a gist so at least it could be adopted in the meantime?



snopoke commented 6 months ago

I think this should be straightforward if the stream context manager were used. The context manager closes the HTTP connection upon exit, which happens when the generator is terminated, e.g. via a break in a for loop.

Instead of doing this in ChatOpenAI:

for chunk in self.client.create(messages=message_dicts, **params):
    yield chunk

Do this:

with self.client.create(messages=message_dicts, **params) as response:
    for chunk in response:
        yield chunk

gyulavoros commented 5 months ago

Showing interest as well. I don't have much to add beyond what has already been discussed here.

Canceling streaming requests is basic infrastructure for building production systems. Generating tokens that are never used is a waste of compute resources and money.

AbortController has been present in the JS/TS SDK for a while; having parity would be nice.

@snopoke is there anything I can help with, so your PR gets merged?

dzmitry-kankalovich commented 5 months ago

@snopoke would you kindly provide an example of how your change would allow aborting a request?

I was thinking that wrapping ainvoke() in an asyncio.Task and then cancelling the task might do the trick, but I am not sure.

snopoke commented 5 months ago

With either the sync or async methods you can just stop iterating over the stream in order to cancel the generation:

for chunk in model.stream(message):
    if _abort_stream(chunk):
        break
    print(chunk.content)

This works because it exits the internal context manager, which closes the HTTP connection to the API, effectively aborting the stream.

brendanator commented 4 months ago

This would be a great feature.

It'd be really nice if there were an is_cancelled key in RunnableConfig that each chain implementation could check to see if it should cancel. It could be used like:

import threading

# Create the cancellation token
cancellation_token = threading.Event()

# Set the token after 60 seconds (or it could be set by a user action);
# note that the Timer must be started explicitly
threading.Timer(60.0, cancellation_token.set).start()

# Run the chain with the proposed cancellation token
chain.invoke(input, {"is_cancelled": cancellation_token.is_set})

snopoke commented 4 months ago

@brendanator

The PR that makes this possible (https://github.com/langchain-ai/langchain/pull/18013) was released in v0.1.15.

I don't think there is any need to tell the chain to cancel the stream since interrupting the loop before completion already does that.

pechaut78 commented 4 months ago

@brendanator

The PR that makes this possible (#18013) was released in v0.1.15.

I don't think there is any need to tell the chain to cancel the stream since interrupting the loop before completion already does that.

Sorry but if I use:

const stream = await remoteChain.stream("What is your name ?");
let counter = 0;
for await (const chunk of stream) {
    counter++;
    console.log(chunk);
    if (counter > 1) { // or any condition you want to stop the generation
        return;
    }
}

It will not return until the stream is completed, and I cannot remove any await.

brendanator commented 4 months ago

The PR that makes this possible (#18013) was released in v0.1.15.

I don't think there is any need to tell the chain to cancel the stream since interrupting the loop before completion already does that.

@snopoke Although this is a nice solution, it is incomplete. It only works for streaming requests, and I assume it only works for ChatOpenAI.

I was proposing a general solution that can be used by any chain and in any of the Runnable interface methods, not just astream/stream.

Now that I look closer at the JS cancellation docs, I see that they have the same interface I proposed, except that the keyword is named signal.

snopoke commented 4 months ago

@brendanator your example above seems to be JavaScript. In Python you could do:

async for chunk in remoteChain.astream("What is your name ?"):
    print(chunk)
    if condition():
        break

I was proposing a general solution that can be used by any chain and in any of the Runnable interface methods, not just astream/stream

What is the goal here? Is it to cancel the request to prevent excess token usage? If so, I don't think that makes sense outside of the stream APIs unless the API you're calling can detect the closed connection and abort the generation (which I think is unlikely).

Now that I look closer at the JS cancellation docs I see that they have the same interface I proposed except using the name keyword signal

This exists because the JavaScript fetch API supports aborting requests. I'm not sure of the best way to support this in Python.

hadfield commented 4 months ago

The goal as described at the top of this issue is to halt processing a request (whether in a LLM model invocation, chain, or otherwise) perhaps in response to a user hitting a "stop" button in a UI.

The main problem I found was that there was no way to proactively tell LangChain to interrupt a call to a model, such as a call to the OpenAI API (either synchronous or asynchronous). This is the case mentioned at the top of this issue, as the call may take a while.

The solution I found was to pass in an http_client when instantiating the LLM model; this client tracks the requests made with it and is interruptible. When an interrupt is needed, the low-level HTTP request is interrupted, which triggers an exception that LangChain would hopefully catch to clean things up, equivalent to the interruption being initiated server-side.

This just covers the one scenario where the LLM model call is the thing that needs to be interrupted to halt processing; it is not a more general solution that would cover other cases, for instance tool calls being interrupted.

Vert53 commented 4 months ago

Found a way to do this by canceling the asyncio.Task. Am I correct in assuming that token generation would stop, so we will not be charged for unused output tokens?
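
A minimal sketch of that pattern (assuming ChatOpenAI from langchain_openai; whether the provider actually stops generating when the connection drops, and therefore whether you are charged, is not guaranteed):

import asyncio

from langchain_openai import ChatOpenAI

async def main():
    model = ChatOpenAI(model="gpt-4")
    # Wrap the call in a task so it can be cancelled from elsewhere,
    # e.g. by a "stop" button handler
    task = asyncio.create_task(model.ainvoke("Write a very long story."))

    await asyncio.sleep(2)  # simulate the user cancelling after 2 seconds
    task.cancel()
    try:
        print(await task)
    except asyncio.CancelledError:
        print("request cancelled")

asyncio.run(main())

Note the cancellation here is client-side; for non-streaming calls the server may still complete the generation.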