n8n-io / n8n

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
https://n8n.io

n8n is taking a lot of RAM memory #7939

Open pablorq opened 9 months ago

pablorq commented 9 months ago

Describe the bug The n8n process is taking a lot of RAM, in this case 1.1 GB, plus 41.7 GB of virtual memory.

To Reproduce Steps to reproduce the behavior:

  1. Install n8n using Docker/Portainer
  2. Use it with some workflows managing some data (5000 rows in a Google Sheets, for instance)
  3. Check the System Monitor.
  4. See RAM memory consumption

Expected behavior Low memory consumption when idle. If a lot of memory is used during a workflow run, it should be released when the run ends.

Environment (please complete the following information):

Additional context Here are some screenshots. Process list sorted by memory: Screenshot from 2023-12-05 20-44-08

n8n process details: Screenshot from 2023-12-05 20-46-59

I did a fresh install of the same Docker stack, without any workflows, and the memory consumption seems to be low. This makes me think that one (or several) workflows could be taking a lot of memory, or that successive runs keep increasing the memory without releasing it.

In any case the memory should be freed when no workflow is running.

Joffcom commented 9 months ago

Hey @pablorq,

n8n loads data into memory while working, so I would expect usage to grow, and to be higher the more items you are processing, but the memory itself will drop back down.

Can you check the Portainer memory stats while your workflow is running to see if they show the memory being released? I would expect you to see it grow, then decrease a few seconds after the workflow has finished.

Or can you check the container stats using the `docker stats` command to see what that shows? It could be that 1 GB is reserved but only 300 MB is actually being used.
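For reference, those numbers can also be pulled programmatically rather than from the Portainer UI. Here is a minimal sketch using the Docker SDK for Python; the container name "n8n" is an assumption, so adjust it to your stack:

```python
# Minimal sketch: read a container's memory usage and limit via the Docker API.
# Requires the docker package (pip install docker); the container name is assumed.
import docker

client = docker.from_env()
container = client.containers.get("n8n")
stats = container.stats(stream=False)  # one-shot snapshot instead of a stream

mem = stats.get("memory_stats", {})
usage_mb = mem.get("usage", 0) / 1024 / 1024
limit_mb = mem.get("limit", 0) / 1024 / 1024
print(f"memory usage: {usage_mb:.0f} MB of {limit_mb:.0f} MB limit")
```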

pablorq commented 9 months ago

Hi @Joffcom !

Maybe I didn't express myself correctly: this data was taken when no workflow was running.

I was just speculating about what could be happening when I mentioned the workflow that uses Google Sheets to move a lot of data, because it is the only one I can relate to high memory use. But maybe it's something else.

To give you more detail, I have 7 workflows that are triggered by time (RSS trigger or schedule trigger), most of them every 3 hours and 2 of them every hour. Most of the time it's a sequence like "run, check everything is OK, end", which takes a minute or so.

It's when no workflow is running that I see this memory consumption. In fact, right now it's 1.5 GB.

This is the container stats: Screenshot 2023-12-07 at 10-37-40 Portainer local

This is more info from inside the container:

```
/home/node # top
Mem: 21443940K used, 2516304K free, 1060796K shrd, 307824K buff, 5260444K cached
CPU:   7% usr  13% sys   0% nic  79% idle   0% io   0% irq   0% sirq
Load average: 2.14 2.43 2.29 2/2239 65
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   23    21 node     S    42.1g 169%   5   0% node /usr/local/bin/n8n
   52     0 root     S     1672   0%   0   0% sh
    1     0 root     S     1604   0%   2   0% sh -c whoami pwd npm install -g @mozilla/readability jsdom echo "done install" su node -c "tini -s -- /docker-entrypoint.sh &" sl
   65    52 root     R     1600   0%   0   0% top
   22     1 root     S     1588   0%   2   0% sleep infinity
   21     1 node     S      816   0%   6   0% tini -s -- /docker-entrypoint.sh
/home/node # free -h
              total        used        free      shared  buff/cache   available
Mem:          22.8G       13.5G        2.4G        1.0G        7.0G        7.7G
Swap:         32.0G        7.5G       24.5G
```

Important info

Now I've noticed that I have 3 "forever running" executions (issue #7754).

I don't know if that is related; I didn't check it when I first opened this issue.

I'm still on v1.16.0. I'm going to upgrade and see how everything works.

Joffcom commented 9 months ago

Hey @pablorq,

I was just looking at my own instance, which has about 40 active workflows, and I have a very different graph. This is after running 2948 workflows since the last restart, which I think was yesterday.

image

This is n8n running in Portainer using a Postgres database as well. Out of interest, if you start up a new instance that doesn't add the extra node package and just runs normally, do you also see the same usage?

Can you also share the environment variables and Portainer settings you have configured? As an example, in Portainer have you set any of the Runtime & Resources options or Capabilities?

pablorq commented 9 months ago

Hi @Joffcom !

I've upgraded today to v1.18.2 and so far (a couple of hours in) it seems pretty good.

Look at this: Screenshot 2023-12-07 at 14-52-59 Portainer local

This is my Portainer stack definition (docker-compose style). This is also n8n + Postgres (I would prefer MariaDB, by the way):

```yaml
# n8n with persistent volume Postgres
# https://raw.githubusercontent.com/n8n-io/n8n/master/docker/compose/withPostgres/docker-compose.yml

version: '3.8'

volumes:
  db_data:
  main_data:

services:
  postgres:
    image: postgres:11
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_NON_ROOT_USER=${POSTGRES_NON_ROOT_USER}
      - POSTGRES_NON_ROOT_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
    volumes:
      - db_data:/var/lib/postgresql/data
      - ${N8N_LOCAL_FILES}/n8n-config/init-data.sh:/docker-entrypoint-initdb.d/init-data.sh
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -h localhost -U ${POSTGRES_USER} -d ${POSTGRES_DB}']
      interval: 5s
      timeout: 5s
      retries: 10

  main:
    image: docker.n8n.io/n8nio/n8n:${N8N_VERSION}
#    image: docker.n8n.io/n8nio/n8n
    restart: always
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=${POSTGRES_DB}
      - DB_POSTGRESDB_USER=${POSTGRES_NON_ROOT_USER}
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_NON_ROOT_PASSWORD}
      - GENERIC_TIMEZONE=Europe/Madrid
      - TZ=Europe/Madrid
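      # Allows Code nodes to load external npm modules, e.g. the jsdom package installed below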
      - NODE_FUNCTION_ALLOW_EXTERNAL=*
    ports:
      - 5678:5678
    links:
      - postgres
    volumes:
      - main_data:/home/node/.n8n
      - ${N8N_LOCAL_FILES}:/files
    depends_on:
      postgres:
        condition: service_healthy
    user: root
    entrypoint: sh -c
    command:
      - |
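        # Custom startup: install jsdom/readability globally, start n8n in the background, keep the container alive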
        whoami
        pwd
        npm install -g @mozilla/readability jsdom
        echo "done install"
        su node -c "tini -s -- /docker-entrypoint.sh &"
        sleep infinity

    extra_hosts:
    # Access host localhost
      - "local.host:host-gateway"

One question: what do you mean by Runtime & Resources options or Capabilities? I think the only thing configured is the timezone.

Joffcom commented 9 months ago

Hey @pablorq,

Runtime & Resources and Capabilities are options you would set in the Portainer UI. We do support MariaDB at the moment as well but we will be removing support for it in the future.

pablorq commented 9 months ago

I see those options in the Portainer documentation, but I don't have them in the container configuration. (Using latest available Portainer v2.19).

In the host setup, I have the following options disabled and greyed out:

It seems that these options are only for agent-based hosts, but I'm using Portainer without an agent, connecting directly to the Docker socket (Linux).

In summary, we can assume those options are at their defaults.

pablorq commented 9 months ago

Now I have 2 forever running executions and this is the container status:

```
# uptime
 09:19:43 up 7 days, 22:06,  0 users,  load average: 3.99, 3.27, 3.31
```

Screenshot 2023-12-10 at 09-18-21 Portainer local

After 7 days it is now using more than 600 MB, and it keeps increasing.

I don't really know if this has anything to do with the forever running executions. Maybe it's something else.

Any ideas or tests to try?

Joffcom commented 9 months ago

Hey @pablorq,

I am still not seeing this, so maybe it is linked to the forever running executions 🤔 Did you try using a Docker image without the extra run commands, so it is exactly the same as what we ship?

One of the things I noticed this morning when taking a quick look was the memory leak reports against the jsdom package going back to 2013. If you are using jsdom a lot, there is a chance that some of your issues could be coming from that.

pablorq commented 9 months ago

Hi @Joffcom!

I'm doing more tests.

As you point out in your last message, maybe the jsdom library and the extra commands to install it have some effect.

So I've duplicated the stack (copying the DB):

The first snapshot is good: n8n at 200 MB vs. n8n-jsdom at 340 MB. No FRE (Forever Running Execution) in n8n-jsdom.

View the screenshots n8n: ![Screenshot 2023-12-12 at 22-34-10 Portainer local](https://github.com/n8n-io/n8n/assets/774839/8338b4d2-7d87-4fd5-aa25-eb891ac5cb66) n8n-jsdom: ![Screenshot 2023-12-12 at 22-34-20 Portainer local](https://github.com/n8n-io/n8n/assets/774839/68da3166-58d8-4fb3-b5c6-bc3c66484620)

After 24 hours (more or less) the picture is different: n8n with 380MB and n8n-jsdom with 240MB. No FRE in n8n-jsdom.

View the screenshots n8n: ![Screenshot 2023-12-13 at 15-20-50 Portainer local](https://github.com/n8n-io/n8n/assets/774839/dc39b1be-5b36-4c75-8f21-e49349673445) n8n-jsdom: ![Screenshot 2023-12-13 at 15-20-58 Portainer local](https://github.com/n8n-io/n8n/assets/774839/c5ec0d02-5210-4fcb-b518-57a37ade84d0)

After 24 more hours, the picture is completely different: n8n at 580 MB and n8n-jsdom at 240 MB. 2 FRE workflows in n8n-jsdom.

View the screenshots n8n: ![Screenshot 2023-12-14 at 08-34-24 Portainer local](https://github.com/n8n-io/n8n/assets/774839/c588e752-ef0d-4e0d-a0f8-3d0a12600f2e) n8n-jsdom: ![Screenshot 2023-12-14 at 08-34-11 Portainer local](https://github.com/n8n-io/n8n/assets/774839/1d4c0347-68be-4f11-bb80-58e68138545c)

More info

The n8n stack has only the standard n8n nodes plus the n8n-nodes-rss-feed-trigger community node (I have to update this, I know), and I'm using code nodes in JavaScript and Python.

In the n8n-jsdom stack only 3 workflows are active, and all of them call the workflow that uses a code node with the jsdom library.

This is the log file from the n8n stack: n8n-mem-231214-1429.log

In the log file I can see some errors and repeated queries for waiting executions, but I don't know if this is the cause of the memory increase.

Any ideas?

Joffcom commented 9 months ago

That is interesting, as I am still not seeing the same issue, and there are not a lot of other reports of it. I was sure the issue was likely to be down to the problems in those other packages.

I guess to dig into this further we will need to look at your workflows in more detail to see what is going on. How frequently do your workflows run, and how many are there?

pablorq commented 9 months ago

Well, in fact the n8n-jsdom stack handles memory better (it keeps memory usage low) than the n8n stack running the "unmodified" n8n Docker image.

Some info about the workflows in n8n stack:

The active workflows run on schedule triggers:

As you can see, this is a very simple setup with 4 actively used workflows and 2 sub-workflows called by another workflow that runs on a long activation interval. I think there's plenty of time to clean up and reduce the memory usage.

Today I've added another sub-workflow that is called at the end of all 6 used workflows to log the memory usage, to try to find out whether this is related to a specific workflow.

I'll let you know when I have more info.

pablorq commented 9 months ago

Memory log check

Well, this was very fast!

I've restarted the stack at 11:21 (2023-12-16T10:21:53.728Z) and some of the workflows ran at 12:00.

Memory log data:

```
2023-12-16T12:00:00.681+01:00,"Cloud Wake Up",284176384
2023-12-16T12:00:03.021+01:00,"RecaLinux - Wednesday Activation",285360128
2023-12-16T12:00:05.056+01:00,"RecaLinux - Episode - Changer",323584000
2023-12-16T12:00:17.863+01:00,"Intelief - Scrape Post",415526912
2023-12-16T12:00:18.432+01:00,"Intelief - RSS Google Alert - IA",468660224
```

Between n1 and n2 there's not a lot of difference. (I'm using "n" instead of the hash symbol to refer to the order of the entries in the log.)

Between n2 and n3 there is 50 MB more. But between n3 and n4 there is 90 MB more! And it keeps increasing, with another 50 MB between n4 and n5.
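(For anyone repeating this comparison, a small sketch like the one below computes these per-run deltas from such a log. The file name is made up; the line format is the one shown above.)

```python
# Compute memory deltas between consecutive entries of the memory log.
# Expected line format: timestamp,"Workflow name",memory_in_bytes
import csv

previous = None
with open("n8n-memory-log.csv") as f:  # hypothetical file name
    for row in csv.reader(f):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        timestamp, workflow, rss = row[0], row[1], int(row[2])
        if previous is not None:
            print(f"{timestamp} {workflow}: {(rss - previous) / 1024 / 1024:+.1f} MB")
        previous = rss
```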

It seems like all this memory is not released after the workflows finish:

Screenshot 2023-12-16 at 12-37-26 Portainer local

This is the n8n log: n8n-mem-231216-1229.log

And this is an image of the execution that increased memory by 90 MB:

Screenshot 2023-12-16 at 12-45-40 n8n - Workflow Automation

All code nodes run Python code. There is a Google Sheets node that fetches 7000+ rows, but then only 6 items are selected to be used in the rest of the workflow.
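(To give an idea of the kind of Python code node involved, here is a minimal sketch of such a filter step; the field name and criteria are invented for illustration, not the real workflow's code.)

```python
# Hypothetical Python code node: keep only a handful of rows out of the
# thousands coming from the Google Sheets node. Field names are made up.
selected = []
for item in _input.all():
    if item.json.status == "pending":
        selected.append(item)

# Only these few items continue through the rest of the workflow.
return selected[:6]
```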

More info

The n1 and n2 workflows from the memory log had no code nodes, and don't seem to increase the memory.

The n3, n4 and n5 had code nodes with Python code.

Could it be related to the Python implementation?

Comparing with n8n-jsdom

The duplicated stack, called n8n-jsdom, has the same workflows, but only 3 of them are active. The ones active in n8n-jsdom are disabled in the n8n stack.

In the n8n-jsdom stack the running workflows have code nodes, but only 2 of them use Python (only 2 Python code nodes in total). Moreover, those 3 workflows are activated by an RSS trigger rather than a schedule trigger, whereas in the n8n stack the workflows run on a regular hourly (or 3-hourly, for one workflow) schedule.

If each Python code node leaves some memory behind, the higher number of Python code nodes in the n8n stack and their more frequent runs could explain the difference in memory usage between the two twin stacks. Don't you think?

Joffcom commented 9 months ago

Hey @pablorq,

It could be related to the Python implementation. We know the code node can cause a memory increase, but typically it is freed up, as seen on my own system. I don't use the Python option, however, so that is a good find that we maybe could have discovered sooner if we had the workflows.

When I am back in January I will do some testing with the code node using Python to see if this can be reproduced.

pablorq commented 8 months ago

Hi @Joffcom,

I was doing some more tests and I think this is the main issue: it is the code node using Python.

To test it, I've created this workflow that runs 10 code nodes with the default Python code, the one that adds a column with a number: Memory_eater.json
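For reference, the default Python snippet in the Code node is roughly the following; each of the 10 nodes in the test workflow runs this same loop:

```python
# Default Code node example (Python): loop over the input items and add a
# new field to each one. The top-level return is handled by the Code node wrapper.
for item in _input.all():
    item.json.myNewField = 1

return _input.all()
```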

This is the result: Screenshot 2023-12-20 at 08-38-39 Portainer local

Memory log:

```
2023-12-20T08:08:29.341+01:00,351313920,Memory eater
2023-12-20T08:08:59.379+01:00,375570432,Memory eater
2023-12-20T08:09:29.266+01:00,395194368,Memory eater
2023-12-20T08:09:59.006+01:00,446697472,Memory eater
2023-12-20T08:10:29.533+01:00,443576320,Memory eater
2023-12-20T08:10:59.198+01:00,471187456,Memory eater
2023-12-20T08:11:29.437+01:00,493342720,Memory eater
2023-12-20T08:11:59.001+01:00,530960384,Memory eater
2023-12-20T08:12:29.136+01:00,575094784,Memory eater
2023-12-20T08:12:59.012+01:00,575557632,Memory eater
2023-12-20T08:13:30.061+01:00,613093376,Memory eater
2023-12-20T08:13:59.031+01:00,605822976,Memory eater
2023-12-20T08:14:29.993+01:00,644788224,Memory eater
2023-12-20T08:14:59.558+01:00,665673728,Memory eater
2023-12-20T08:15:29.402+01:00,698855424,Memory eater
2023-12-20T08:15:59.521+01:00,705019904,Memory eater
2023-12-20T08:16:29.132+01:00,740290560,Memory eater
2023-12-20T08:16:59.398+01:00,760074240,Memory eater
2023-12-20T08:17:29.442+01:00,775995392,Memory eater
2023-12-20T08:17:59.742+01:00,795758592,Memory eater
2023-12-20T08:18:29.197+01:00,812089344,Memory eater
2023-12-20T08:18:59.656+01:00,831111168,Memory eater
2023-12-20T08:19:29.447+01:00,847880192,Memory eater
2023-12-20T08:19:59.933+01:00,887824384,Memory eater
2023-12-20T08:20:30.253+01:00,902078464,Memory eater
2023-12-20T08:20:59.161+01:00,922808320,Memory eater
2023-12-20T08:21:29.258+01:00,953184256,Memory eater
2023-12-20T08:22:00.793+01:00,979132416,Memory eater
2023-12-20T08:22:29.412+01:00,990216192,Memory eater
2023-12-20T08:22:59.608+01:00,1005944832,Memory eater
```
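For a rough sense of scale, the first and last samples above correspond to about 43 MB of growth per minute while the test workflow was looping:

```python
# Back-of-the-envelope growth rate from the first and last log samples above.
first_bytes, last_bytes = 351_313_920, 1_005_944_832  # 08:08:29 and 08:22:59
elapsed_minutes = 14.5                                # 08:08:29 -> 08:22:59

growth_mb = (last_bytes - first_bytes) / 1024 / 1024
print(f"~{growth_mb / elapsed_minutes:.0f} MB per minute")  # ~43 MB per minute
```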

Can you duplicate this behavior?

(By the way, it would be nice to have a real Python node, instead of one running on top of JavaScript.)

Joffcom commented 8 months ago

Hey @pablorq,

I will give it a test. We already know that using the code node increases memory because it creates a sandbox each time, but running your workflow I can see the increase I would expect, followed by the decrease in memory usage.

image

It looks like some improvements can still be made there, as it doesn't clean up as nicely as I would have expected, but it does appear to be doing it. I will get a dev ticket created to see if we can change this.

It did drop down again during the standard garbage collection process as well, so generally it is working as it should, but it just isn't ideal.

image

Joffcom commented 8 months ago

Our internal ticket for this is ADO-1629

pablorq commented 8 months ago

Yes, I can see the clean-up, but it seems that something remains, and that cumulative remainder keeps the memory increasing and increasing.

This is a report of the memory usage over the last 4 days, where the clean-ups can be seen, but also how, little by little, the memory keeps increasing:

n8n-cumulative-mem-increase

At first, the memory was about 100 MB, which is ideal. Then it starts to grow and clean up, but not by the same amount, leaving some memory as "used". Then, this morning, something strange happened: at 07:00 the memory was at 770 MB, at 08:00 at 550 MB, and at 09:00 at 280 MB, all while running workflows without Python code nodes. And then a workflow with some Python code nodes ran and the memory went from that 280 MB to 730 MB, following the previous memory "growth" line.

I have no idea what that was. Maybe the clean-up uses some kind of buffer and a new Python run takes that buffer back? I don't know.

This is the memory usage log:

n8n-memory-usage.log

Joffcom commented 8 months ago

Hey @pablorq,

That matches what I said in my reply earlier today. Hopefully this will be picked up soon and worked on.

stevenjunio commented 6 months ago

Seeing this same issue here. Python code node related.