n8n-io / n8n

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
https://n8n.io

LangChain in AI Beta Hangs sometimes / cannot be stopped via UI #7666

Open amenk opened 10 months ago

amenk commented 10 months ago

Describe the bug

When using LangChain on the AI Beta, the workflow hangs, does not finish, and also cannot be stopped.

To Reproduce

Not always reproducible, but it happens more often than not:

  1. Start the workflow
  2. The chain does not run through
  3. Click "Stop workflow"
  4. Everything hangs; see the following GIF:

[GIF: n8n-llm, everything hangs]

Expected behavior

Everything works, or at least an error appears if something is wrong with the ChatGPT API, and the workflow can be stopped.

Environment (please complete the following information):

docker run -it --rm --name n8nai -p 5678:5678 \
  -e N8N_LOG_LEVEL=debug -e N8N_LOG_OUTPUT=console \
  -v ~/.n8nai:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n:ai-beta

Additional context

On the console I see:

2023-11-09T09:39:00.614Z [Rudder] debug: in flush
2023-11-09T09:39:00.615Z [Rudder] debug: cancelling existing timer...
2023-11-09T09:39:00.615Z [Rudder] debug: cancelling existing flushTimer...
2023-11-09T09:39:00.615Z [Rudder] debug: batch size is 3
2023-11-09T09:39:20.865Z [Rudder] debug: in flush
2023-11-09T09:39:20.865Z [Rudder] debug: cancelling existing timer...
2023-11-09T09:39:20.865Z [Rudder] debug: queue is empty, nothing to flush
2023-11-09T09:39:37.470Z | debug    | Wait tracker querying database for waiting executions "{ file: 'WaitTracker.js', function: 'getWaitingExecutions' }"
2023-11-09T09:40:37.471Z | debug    | Wait tracker querying database for waiting executions "{ file: 'WaitTracker.js', function: 'getWaitingExecutions' }"
2023-11-09T09:40:51.250Z | debug    | Proxying request to axios "{ file: 'LoggerProxy.js', function: 'exports.debug' }"
2023-11-09T09:41:37.473Z | debug    | Wait tracker querying database for waiting executions "{ file: 'WaitTracker.js', function: 'getWaitingExecutions' }"
2023-11-09T09:42:37.475Z | debug    | Wait tracker querying database for waiting executions "{ file: 'WaitTracker.js', function: 'getWaitingExecutions' }"

After pressing Ctrl+C

2023-11-09T09:44:09.902Z [Rudder] debug: batch size is 1
Waiting for 1 active executions to finish...
 - Execution ID 393, workflow ID: enkm09FeKlhfSAlB
Waiting for 1 active executions to finish...
 - Execution ID 393, workflow ID: enkm09FeKlhfSAlB
amenk commented 10 months ago

PS: I got an error from the pipeline for this issue: https://github.com/n8n-io/n8n/actions/runs/6810063907

OlegIvaniv commented 10 months ago

@amenk Is it possible you're hitting TPM/RPM/RPD rate limits for your OpenAI org? The model retries 6 times with exponential backoff between attempts, which can look like a hang while it is actually waiting to retry. I'll add a configuration option to the model for this, so you'll be able to disable or lower the retries.
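For illustration, here is a minimal sketch (outside of n8n) of the retry behaviour being described, assuming the AI Beta model node wraps LangChain JS's ChatOpenAI; the option names come from LangChain, the model name and prompt are just placeholders:

// Sketch only: LangChain JS's ChatOpenAI retries failed requests 6 times by
// default, with exponential backoff, before it surfaces an error.
import { ChatOpenAI } from "langchain/chat_models/openai";

const model = new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: "gpt-3.5-turbo",
  maxRetries: 0,    // fail fast instead of silently retrying on 429s
  timeout: 30_000,  // give up after 30 seconds instead of hanging
});

async function main(): Promise<void> {
  const reply = await model.predict("Say hello");
  console.log(reply);
}

main().catch(console.error);

With the defaults, six retries with exponential backoff can add up to a minute or more of silent waiting, which would look exactly like the hang in the GIF above.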

amenk commented 10 months ago

@OlegIvaniv Yes, I also assume that some limit is being hit at OpenAI.

  1. Where can I see this?
  2. Still, shouldn't the workflow be stoppable without killing the container?
OlegIvaniv commented 10 months ago

@amenk You can check your usage in the OpenAI usage dashboard.

amenk commented 10 months ago

@OlegIvaniv Yeah, sure. But I don't think the current rate limits are shown there. I was thinking more of the response headers, which don't even seem to be displayed at the "debug" log level.

Also, it was my first run today, so I should not have hit any limits. But it's all guessing ;-) The second run worked (after restarting the Docker container), but I have seen similar issues sporadically over the last few days.

The main reason I opened the issue is the hard hang; I believe that should never happen, even if an external API is rate limiting.
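As a reference for checking those headers directly, here is a small sketch that calls the OpenAI API and prints its rate-limit response headers; the header names are what OpenAI returns, the model and prompt are placeholders, and an API key in OPENAI_API_KEY is assumed:

// Sketch: inspect OpenAI's rate-limit headers directly, independent of n8n.
async function showRateLimits(): Promise<void> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "ping" }],
    }),
  });

  // Current limits and how much of them remains for the org/key.
  for (const name of [
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-requests",
    "x-ratelimit-reset-tokens",
  ]) {
    console.log(`${name}: ${res.headers.get(name)}`);
  }
}

showRateLimits().catch(console.error);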

amenk commented 10 months ago

I found a new hint in the log:

2023-11-10T11:54:28.721Z | error    | WorkflowOperationError: Only running or waiting executions can be stopped and 408 is currently crashed. "{ file: 'LoggerProxy.js', function: 'exports.error' }"

How can such crashed executions be handled?

Is this considered a bug? Are there any workarounds? Can the exponential back-off somehow be limited? There is a retry-on-fail setting, but the hang also happens when that is off.

Is there any other log that could help here? I think "debug" is already the highest log level.
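For the crashed-execution question above, one possible way to at least inspect and clear stuck execution records is n8n's public REST API, sketched below. This assumes the public API is enabled, an API key has been created in the UI, and that deleting the record is acceptable; it only cleans up the entry and does not fix the underlying hang.

// Sketch only: list recent executions and optionally delete a stuck record
// via n8n's public API (/api/v1). N8N_URL and N8N_API_KEY are assumptions.
const N8N_URL = process.env.N8N_URL ?? "http://localhost:5678";
const API_KEY = process.env.N8N_API_KEY ?? "";

async function api(path: string, method = "GET"): Promise<any> {
  const res = await fetch(`${N8N_URL}/api/v1${path}`, {
    method,
    headers: { "X-N8N-API-KEY": API_KEY },
  });
  if (!res.ok) throw new Error(`${method} ${path} -> ${res.status}`);
  return res.json();
}

async function main(): Promise<void> {
  // List recent executions and look for ones that never finished.
  const { data } = await api("/executions?limit=20");
  for (const exec of data) {
    console.log(exec.id, exec.workflowId, exec.finished, exec.stoppedAt);
  }

  // Example: remove the crashed execution mentioned in the error above.
  // await api("/executions/408", "DELETE");
}

main().catch(console.error);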