n8n-io / n8n

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
https://n8n.io

N8N Workers die every few hours #11802

Open Foreignpay opened 2 days ago

Foreignpay commented 2 days ago

Bug Description

I have 3 webhooks, 1 master, and 10 workers running. The 10 workers are on a separate server and connected to the master using Docker Swarm. The worker server has 16 cores and 16 GB of RAM allocated. Each container has a RAM limit of 1.5 GB, and the parallelism value is left at the default of 10. Once or twice a day (or night) there is an abnormal spike in resource consumption and a queue builds up that never drains (we have to delete it through the database). A screenshot from Grafana is attached. After restarting all workers, everything returns to normal. What can be done about it? This is a critical bug, and because of it the team is considering moving to Python.
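For reference, the memory budget implied by the figures above works out as follows. This is only back-of-the-envelope arithmetic; the variable names are illustrative, and the even per-execution split is an assumption, since real executions vary and the worker process itself also needs memory.

```python
# Figures taken from the bug description above.
WORKERS = 10
CONTAINER_LIMIT_GB = 1.5
HOST_RAM_GB = 16
PARALLELISM_PER_WORKER = 10

# Memory the worker containers may claim in total, and what is left on the host.
total_limit_gb = WORKERS * CONTAINER_LIMIT_GB
headroom_gb = HOST_RAM_GB - total_limit_gb

# Upper bound on executions running at the same time across all workers.
max_parallel_executions = WORKERS * PARALLELISM_PER_WORKER

# Assumption: an even split of the container limit across parallel executions.
mb_per_execution = CONTAINER_LIMIT_GB * 1024 / PARALLELISM_PER_WORKER

print(f"container limits total: {total_limit_gb} GB, host headroom: {headroom_gb} GB")
print(f"max parallel executions: {max_parallel_executions}")
print(f"memory per execution at full parallelism: {mb_per_execution:.1f} MB")
```

With these numbers the containers can claim 15 of the 16 GB, and a worker running 10 executions at once has roughly 150 MB per execution before it reaches its limit.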

To Reproduce

We don't know.

We don't notice overload on the server or on the database. We don't see any regularity, but it most often happens in the evening and at night.

The external load does not increase when this happens.

Expected behavior

I really want stability and no glitches like this.

Operating System

Ubuntu Linux 22.04

n8n Version

1.67.1 (we tried a lot of different versions since 1.60)

Node.js Version

Docker

Database

PostgreSQL

Execution mode

queue

Joffcom commented 2 days ago

Hey @Foreignpay,

We have created an internal ticket to look into this which we will be tracking as "GHC-463"

Joffcom commented 2 days ago

Hey @Foreignpay

Are you reporting a bug or trying to raise a support issue?

Foreignpay commented 2 days ago
Screenshot 2024-11-19 at 23 51 10

A screenshot from Grafana is attached.

Foreignpay commented 2 days ago

> Hey @Foreignpay
>
> Are you reporting a bug or trying to raise a support issue?

It seems like a bug. We have not found any explanation for this, and only a reboot helps.

netroy commented 2 days ago

This sounds like you have an active workflow that's causing it. Would it be possible for you to try disabling workflows in batches to narrow down the workflow causing this?
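The batch approach suggested above can be sketched as a halving search. This is a minimal sketch, assuming a single workflow is responsible and that the spike reliably shows up within one observation window; `find_culprit` and `spike_with_disabled` are hypothetical names, and the check itself would be a manual step (disable the batch, watch Grafana for a day or so, report the result).

```python
from typing import Callable, List, Sequence


def find_culprit(
    workflows: Sequence[str],
    spike_with_disabled: Callable[[List[str]], bool],
) -> str:
    """Narrow a list of workflows down to the one that triggers the spike.

    spike_with_disabled(batch) is a hypothetical check: it returns True if
    the spike still occurs while every workflow in `batch` is disabled.
    """
    candidates = list(workflows)
    while len(candidates) > 1:
        batch = candidates[: len(candidates) // 2]
        if spike_with_disabled(batch):
            # Spike persists with the batch off: the culprit is still enabled.
            candidates = candidates[len(batch):]
        else:
            # Spike is gone: the culprit is inside the disabled batch.
            candidates = batch
    return candidates[0]


# Simulated run: pretend "wf-5" is the problematic workflow.
all_workflows = [f"wf-{i}" for i in range(8)]
culprit = find_culprit(all_workflows, lambda disabled: "wf-5" not in disabled)
print(culprit)
```

Each round disables only half of the remaining candidates, so 8 workflows need 3 observation windows rather than 8.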

Foreignpay commented 2 days ago

I don't know. We provide payment acquiring services, so disabling workflows is too hard for us.