executions continually running

craigz commented 1 year ago

Describe the bug: All executions have shown as running since upgrading to 0.217.0. Every job run is displayed as still running. Clicking View only displays the message "Execution is running. It will show here once finished."

In the main Executions view, the most recent job, for example, displays "Running for 1.012s". Clicking View shows the still-running message, and the sidebar shows that it's been running for 4:49m and counting upwards. Looking down the list, sure enough, each job shows a longer and longer running time, back to the first job run after the update, which has now been running for 217:00m and counting. Jobs from before the upgrade all show as completed in just under or just over 1 second.

To Reproduce

Steps to reproduce the behavior:

1. Upgrade to the latest n8n (0.217.0)
2. Launch n8n
3. View All executions
4. See error

Expected behavior: Jobs to queue, launch & complete.

Environment (please complete the following information):

OS: Docker Desktop for Mac 4.16.2 (95914)
n8n Version: 0.217.0
Database system: SQLite
Operation mode: own

pluusla commented 1 year ago

I have the same misbehavior and can confirm the bug. It has occurred since 0.217.0.

Environment:

OS: Linux Docker (Raspbian)
n8n Version: 0.217.0
Database system: Postgres
Operation mode: own

csuermann commented 1 year ago

Thanks for reporting this. We'll look into it right away and are tracking it internally as N8N-6206

krynble commented 1 year ago

@pluusla and @craigz thanks for the report.

Could you provide us with some more details?

What type of trigger are you using?
Is this happening to all executions or just a few specific workflows?

pluusla commented 1 year ago

All my workflows use the cron trigger.

For me, all workflows are affected by it.

In the log (not debug-log) I also see the following error message: Error fetching feature flags [CanceledError: canceled] { code: 'ERR_CANCELED' }

But I suspect that this has nothing to do with this problem.

Joffcom commented 1 year ago

Hey @pluusla,

If you create a new workflow does it have the same issue? Are you also able to share one of the workflows that has the issue?

pluusla commented 1 year ago

Hey @Joffcom

Created a test workflow with cron-trigger. This has worked.

I have attached the test workflow and a loop workflow. workflows.zip

flipswitchingmonkey commented 1 year ago

Hi @craigz and @pluusla, I'm trying to reproduce the issue here as well...

Any chance you remember which version you were running before the upgrade?
Do you feel comfortable looking into the database directly? If so, it would be very interesting to find out what's in the row for one of the forever-running executions (id = executionId), in particular the 'status' column.
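
For anyone wanting to try the database lookup described above, here is a minimal Python sketch for the SQLite case. It is only an illustration: the table name `execution_entity` and the columns `finished` and `stoppedAt` are assumptions about the schema (only the 'status' column and the execution id are mentioned in this thread), and the demo row is made up so the snippet runs on its own.

```python
import sqlite3


def execution_status(conn, execution_id):
    """Return (status, finished, stoppedAt) for one execution, or None if no row matches.

    Table and column names are assumptions about the n8n schema.
    """
    return conn.execute(
        'SELECT status, finished, "stoppedAt" FROM execution_entity WHERE id = ?',
        (execution_id,),
    ).fetchone()


# Self-contained demo: an in-memory database with a made-up row that mimics
# an execution which finished but whose status still says "running".
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE execution_entity ('
    'id INTEGER PRIMARY KEY, finished INTEGER, "stoppedAt" TEXT, status TEXT)'
)
conn.execute(
    "INSERT INTO execution_entity VALUES (1, 1, '2023-01-01 00:00:01', 'running')"
)

print(execution_status(conn, 1))  # ('running', 1, '2023-01-01 00:00:01')
```

Against a real instance you would replace `":memory:"` with the path to a copy of your n8n SQLite file and skip the CREATE/INSERT lines. Working on a copy avoids touching the live database.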

pluusla commented 1 year ago

@flipswitchingmonkey

Pretty sure I updated from 0.216.1 to 0.217.0.

Not exactly comfortable, I simply do not know enough about databases. ;) Maybe if you explain to me step by step how I get exactly to your desired information ;)

flipswitchingmonkey commented 1 year ago

No worries, I think we figured out the issue and you don't really have to do anything. As a first note: your executions ran successfully despite what the status says, so there is no need to worry.

A small patch should also fix them retroactively. The issue here had to do with an... interesting confluence of a particular combination of nodes and a race condition between hooks. Basically, something that was supposed to set the status correctly was overtaken by something that did not.
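
To illustrate the kind of race described above, here is a small, deterministic Python sketch. It is not n8n's code and not the actual fix: the function `apply_updates`, the status names, and the "guard" are all hypothetical, chosen only to show how a late write can leave a finished execution looking like it is still running.

```python
# Statuses after which an execution is considered done (hypothetical names).
TERMINAL = {"success", "error"}


def apply_updates(updates, guard=False):
    """Apply status writes in arrival order and return the final stored status.

    With guard=True, a non-terminal write arriving after a terminal one is ignored.
    """
    status = "new"
    for new_status in updates:
        if guard and status in TERMINAL and new_status not in TERMINAL:
            continue  # drop the late "running" write
        status = new_status
    return status


in_order = ["running", "success"]  # hooks land in the intended order
raced = ["success", "running"]     # the "running" write is overtaken and lands last

print(apply_updates(in_order))           # success
print(apply_updates(raced))              # running
print(apply_updates(raced, guard=True))  # success
```

In the raced case the work itself has finished, which matches what was observed here: the executions completed, and only the stored status was wrong.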

We will have a fix for you soon!

krynble commented 1 year ago

Hey, good news. As @flipswitchingmonkey said, we have built a custom Docker image that you can use.

We'd like to get your feedback before releasing a new version.

You can find the tag here or pull it using docker pull n8nio/n8n:n8n-6206-executions-continually-running

This image is not available for arm64, so I'm afraid newer Macs won't be able to run it.

Could you @pluusla and @craigz give it a shot and let us know so we can release a patch?

pluusla commented 1 year ago

@krynble

thx!

I still had a moment and wanted to try it. However, Portainer does not seem to find the defined image (the latest tag works, though). Edit: the Raspberry Pi 4 needs an arm64 image (no matching manifest for linux/arm64/v8 in the manifest list entries).

I am not available for several hours as of now. Maybe @craigz can try it.

flipswitchingmonkey commented 1 year ago

The arm64 image should now be available as well

pluusla commented 1 year ago

@flipswitchingmonkey @krynble

I have now been able to try out the version. It doesn't seem to loop anymore. The workflows that didn't work before could now be executed without errors. Awesome, thank you!

craigz commented 1 year ago

seems i missed all the activity due to time zone differences. i was able to pull and run the arm64 version of n8n-6206-executions-continually-running however it started up without mounting my data disk, so there were no scripts (or configuration) loaded once it started.

i saw the push on github of the n8n@0.217.1 update, however that image has yet to appear on dockerhub as standalone or as a link to latest. i'll keep an eye on dockerhub and update to that version when it syncs there.

additionally, as 2 of the 3 jobs i have running post webhooks depending on query results (new items on check) i've been aware that the jobs have been working despite seemingly running endlessly due to the webhooks continuing to fire as usual.

thanks for the quick fix, please let me know if i can provide any further information.

krynble commented 1 year ago

Yes, you are correct @craigz: the workflows do run as normal and they're not running forever. Only the status stored in the database was inconsistent; the runtime was fine.

So, although annoying, there was no impact on productivity or on the executions themselves.

The images are being built and should be available soon; CI is running :)

I'll be closing this issue but if you find any remaining problems please do reopen.

n8n-io / n8n

executions continually running #5561