n8n-io / n8n

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
https://n8n.io
Other
50.11k stars 8.12k forks source link

Duplicate Executions with No Active Execution Found error #11861

Open nikozila opened 5 days ago

nikozila commented 5 days ago

Bug Description

I’ve encountered a potential issue where some executions are being executed more than once with the error of “No Active Execution Found”. Check out the logs:

debug | Resuming execution 6692658 {"scope":"executions","executionId":"6692658","file":"wait-tracker.js","function":"startExecution"} debug | Found 3 executions. Setting timer for IDs: x, 6692658, y {"scope":"executions","file":"wait-tracker.js"... debug | Resuming execution 6692658 {"scope":"executions","executionId":"6692658","file":"wait-tracker.js","function":"startExecution"} debug | Execution added {"executionId":"6692658","file":"active-executions.js","function":"add"} debug | Execution for workflow [name] id 6692658 debug | Execution added {"executionId":"6692658","file":"active-executions.js","function":"add"} debug | Execution ID 6692658 had Execution data. Running with payload. {"executionId":"6692658","file":"workflow-runner.js","function":"runMainProcess", debug | ActiveExecutions attachWorkflowExecution() called | Execution id 6692658 debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: [object Object] debug | ActiveExections getExecution() called | Execution ID: 6692658 debug | Execution for workflow [name] was assigned id 6692658 debug | Execution ID 6692658 had Execution data. Running with payload. debug | ActiveExecutions attachWorkflowExecution() called | Execution id 6692658 debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: [object Object] debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: [object Object] debug | Setting execution status for 6692658 to "waiting" {"file":"workflow-execute-additional-data.js","function":"setExecutionStatus"} debug | ActiveExecutions setStatus() called | Execution id 6692658 debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: [object Object] debug | Save execution data to database for execution ID 6692658 debug | ActiveExecutions getStatus() called | Execution id 6692658 debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: [object Object] debug| ActiveExections getExecution() called | Execution ID: 6692658, Execution: [object Object] debug | Execution finalized {"executionId":"6692658","file":"active-executions.js","function":"finalizeExecution"} debug | Execution removed {"executionId":"6692658","file":"active-executions.js"}

As you can see we have two “Resuming execution 6692658” logs. That’s why some of the execution functions get called more than once. Eventually one of the executions got finalized and removed from the ActiveExecutions list. But here is fun part:

debug | Setting execution status for 6692658 to "waiting" {"file":"workflow-execute-additional-data.js","function":"setExecutionStatus"} debug | ActiveExecutions setStatus() called | Execution id 6692658 debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: undefined debug | Start external error workflow {"executionId":"6692658",...

Since the first execution is removed from the activeExections list, when n8n tries to call getExecution for the second execution, it throws error

debug | ActiveExections getExecution() called | Execution ID: 6692658, Execution: undefined The execution is undefined. (rather than [object object])

What I realized is that there might be a race condition between getWaitingExecutions, startExecution and workflowrRunner.run methods.

n8n doesn’t check if the execution is running now in waitTracker startExecution function nor in the workflowRunner.run and runMainProcess functions. I think there should be a checking in those functions too to guarantee that an execution is being run only once.

To Reproduce

It happens to a few of my workflows. Honestly I don't know how to reproduce this problem. But here is my case:

  1. Normally we have one execution in 2 seconds. Sometimes we have multiple executions running at the same time.
  2. We have some large workflows (with at least 50 nodes)
  3. These large workflows consist on many wait nodes.
  4. When n8n resumes these wait nodes in those large workflows, this problem occurs.

Expected behavior

You'll see a failed execution with "No Active Execution Found" error and when you open the execution you won's see any data.

Operating System

docker

n8n Version

1.63.4

Node.js Version

20.18.0

Database

PostgreSQL

Execution mode

main (default)

Joffcom commented 5 days ago

Hey @nikozila,

We have created an internal ticket to look into this which we will be tracking as "GHC-486"

nikozila commented 2 days ago

Hi, I found a new thing!

I put two logs in wait-tracker.ts:

  1. One log before the if statement
  2. One log after the if statement image

And I saw this (check the execution id #7027024):

2024-11-27T20:12:15.068Z | debug | wait-tracker - getWaitingExecutions - executionId: 7027024
2024-11-27T20:12:15.068Z | debug | wait-tracker - getWaitingExecutions - no #7027024 in waitingExecutions
2024-11-27T20:13:17.071Z | debug | Resuming execution 7027024
2024-11-27T20:13:17.080Z | debug | Found 4 executions. Setting timer for IDs: 2834745, 7027024, 7042123, 6993420
2024-11-27T20:13:17.081Z | debug | wait-tracker - getWaitingExecutions - executionId: 7027024
2024-11-27T20:13:17.081Z | debug | wait-tracker - getWaitingExecutions - no #7027024 in waitingExecutions
2024-11-27T20:13:17.082Z | debug | Resuming execution 7027024

It ended up with the error: No Active Execution Found because of two running executions at the same time.