matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

Massive memory usage with 0.6 until RAM maxes #2135

Closed: ElDifinitivo closed this issue 2 years ago

ElDifinitivo commented 2 years ago

Background information

Description

Recently updated to 0.6 from 0.5.1 and noticed memory usage slowly increasing until it overloads the server dendrite is running on. At that point kswapd0 spikes CPU usage, rendering the VPS entirely unusable.

There are only 2 users on this homeserver, and I never saw memory usage on 0.5.1 going past 400MB. The VPS running dendrite has 2GB, and dendrite will slowly rise from 200MB to over 1GB (without any activity on the homeserver, both users using Element).

All users on the homeserver. The VPS will overload, causing constant connection issues for any client.

Immediately upon starting the homeserver (docker-compose -f docker-compose.monolith.yml up). It usually takes around 10 minutes before memory usage is maxed, swap fails to mitigate it, and CPU usage spikes to 100%.

Upon updating to 0.6, everything ran smoothly and no issues were visible on the client end. After a short time of ever-increasing memory usage, dendrite maxes out the system resources.

Steps to reproduce

The output of starting the homeserver. It took just under 11 minutes to max the system resources before Ctrl+C. No clients were being actively used at the time.

> docker-compose -f docker-compose.monolith.yml up

WARNING: Found orphan containers (dendrite_postgres_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Starting dendrite_monolith_1 ... done
Attaching to dendrite_monolith_1
monolith_1  | time="2022-01-31T17:39:31.854968327Z" level=info msg="Dendrite version 0.6.0"
monolith_1  | [1] [INF] Starting nats-server
monolith_1  | [1] [INF]   Version:  2.6.6
monolith_1  | [1] [INF]   Git:      [not set]
monolith_1  | [1] [INF]   Name:     monolith
monolith_1  | [1] [INF]   Node:     2VDOOQDB
monolith_1  | [1] [INF]   ID:       NC4YKGCUVBJBKLPHF7ZKGEI24N4P3YGJB2MTMTQBGAPLIQ4JWXNVUKRA
monolith_1  | [1] [WRN] Maximum payloads over 8.00 MB are generally discouraged and could lead to poor performance
monolith_1  | [1] [INF] Starting JetStream
monolith_1  | [1] [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
monolith_1  | [1] [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
monolith_1  | [1] [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
monolith_1  | [1] [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
monolith_1  | [1] [INF]
monolith_1  | [1] [INF]          https://docs.nats.io/jetstream
monolith_1  | [1] [INF]
monolith_1  | [1] [INF] ---------------- JETSTREAM ----------------
monolith_1  | [1] [INF]   Max Memory:      1.45 GB
monolith_1  | [1] [INF]   Max Storage:     26.41 GB
monolith_1  | [1] [INF]   Store Directory: "jetstream"
monolith_1  | [1] [INF] -------------------------------------------
monolith_1  | [1] [INF]   Restored 0 messages for stream "DendriteInputRoomEvent"
monolith_1  | [1] [INF]   Restored 0 messages for stream "DendriteOutputClientData"
monolith_1  | [1] [INF]   Restored 0 messages for stream "DendriteOutputKeyChangeEvent"
monolith_1  | [1] [INF]   Restored 0 messages for stream "DendriteOutputReceiptEvent"
monolith_1  | [1] [INF]   Restored 0 messages for stream "DendriteOutputRoomEvent"
monolith_1  | [1] [INF]   Restored 0 messages for stream "DendriteOutputSendToDeviceEvent"
monolith_1  | [1] [INF]   Recovering 1 consumers for stream - "DendriteInputRoomEvent"
monolith_1  | [1] [INF]   Recovering 1 consumers for stream - "DendriteOutputClientData"
monolith_1  | [1] [INF]   Recovering 3 consumers for stream - "DendriteOutputKeyChangeEvent"
monolith_1  | [1] [INF]   Recovering 2 consumers for stream - "DendriteOutputReceiptEvent"
monolith_1  | [1] [INF]   Recovering 2 consumers for stream - "DendriteOutputRoomEvent"
monolith_1  | [1] [INF]   Recovering 2 consumers for stream - "DendriteOutputSendToDeviceEvent"
monolith_1  | [1] [INF] Server is ready
monolith_1  | time="2022-01-31T17:39:31.987071486Z" level=info msg="Enabled perspective key fetcher" num_public_keys=2 server_name=matrix.org
monolith_1  | time="2022-01-31T17:39:32.010274480Z" level=info msg="Enabling shared secret registration at /_synapse/admin/v1/register"
monolith_1  | time="2022-01-31T17:39:32.016351868Z" level=info msg="Setting m.server as <DOMAIN>:443 at /.well-known/matrix/server"
monolith_1  | time="2022-01-31T17:39:32.061768303Z" level=info msg="Enabling MSC" context=missing msc=msc2836
monolith_1  | time="2022-01-31T17:39:32.064742265Z" level=info msg="Enabling MSC" context=missing msc=msc2946
monolith_1  | time="2022-01-31T17:39:32.066790308Z" level=info msg="Starting external Monolith listener on :8448"
monolith_1  | time="2022-01-31T17:39:32.068752346Z" level=info msg="Starting external Monolith listener on :8008"

bones-was-here commented 2 years ago

Older versions of golang (before 1.16) have a "feature" of not returning freed memory to the OS until there's memory pressure (bad perf). You could set this env var as a temporary workaround, as per docs/systemd/monolith-example.service (comment added by me):

# Use less memory; this became redundant in golang 1.16
Environment=GODEBUG=madvdontneed=1

But I would recommend upgrading to a newer golang, because 1.16 will be the minimum for future versions of dendrite.
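
To see what the Go runtime itself thinks it is holding on to, a small standalone program can print a few `runtime.MemStats` fields before and after releasing a large buffer. This is only a sketch of the runtime's own accounting, not anything taken from dendrite; the `heapStats` type and its helper names are made up for illustration. Note that `HeapReleased` counts memory the runtime has handed back, but when the runtime releases lazily (the pre-1.16 Linux default, which `GODEBUG=madvdontneed=1` overrides) the kernel may keep counting those pages in the process RSS until it comes under memory pressure, so tools like `top` can show a much higher figure than this program does.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapStats is a hypothetical helper type holding the MemStats fields of interest.
type heapStats struct {
	InUse    uint64 // bytes in in-use heap spans
	Idle     uint64 // bytes in idle (unused) heap spans
	Released uint64 // bytes of idle heap the runtime has returned to the OS
}

// readHeapStats takes a snapshot of the runtime's heap accounting.
func readHeapStats() heapStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapStats{InUse: m.HeapInuse, Idle: m.HeapIdle, Released: m.HeapReleased}
}

// retained estimates idle heap that the runtime still holds and could give back.
func (h heapStats) retained() uint64 {
	if h.Released > h.Idle {
		return 0 // guard against unsigned underflow
	}
	return h.Idle - h.Released
}

func report(label string) {
	s := readHeapStats()
	fmt.Printf("%-14s inuse=%d idle=%d released=%d retained=%d\n",
		label, s.InUse, s.Idle, s.Released, s.retained())
}

func main() {
	// Allocate 64 MiB and touch every page so it is really backed by memory.
	buf := make([]byte, 64<<20)
	for i := range buf {
		buf[i] = 1
	}
	report("after alloc")
	runtime.KeepAlive(buf) // keep the buffer live until the first report is done

	// Drop the only reference and run a collection; the spans become idle.
	buf = nil
	runtime.GC()
	report("after GC")

	// Ask the runtime to return as much idle memory to the OS as it can.
	debug.FreeOSMemory()
	report("after release")
}
```

Comparing the "retained" column with the RSS reported by the OS gives a rough idea of whether the memory is still held by the Go heap or is simply waiting for the kernel to reclaim it.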

ElDifinitivo commented 2 years ago

So I tried with go version go1.17.6 linux/amd64 from bullseye-backports and was still getting the issues, but only after 15+ minutes. Messages like monolith_1 | [1] [WRN] pipe - cid:49 - "v1.13.0:go" - Readloop processing time: 5.674480815s began to pop up when left running, and I wondered why v1.13.0 was being used in some way.

I restarted the VPS and it seemed to have cleared any issues (potentially cache? idk). Mem usage for dendrite rises and peaks around 300/400MB again.

bones-was-here commented 2 years ago

I ensure everything is built fresh (with Debian's 1.17.6) by running git clean -fdx and deleting ~/go and ~/.cache in dendrite's system account. I don't use any container stuff, just a system account with all dendrite-related files in its $HOME and my /etc/systemd/system/dendrite.service
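
One way to double-check which Go toolchain a given binary was actually built with is to read the build information embedded in it. The sketch below uses the standard `debug/buildinfo` package, which needs Go 1.18 or newer to compile (newer than the 1.17.6 mentioned in this thread); the function names are hypothetical. As an assumption, not something confirmed here: the `"v1.13.0:go"` tag in the NATS warning looks like a client library identifier rather than a Go toolchain version, and a check like this would help tell the two apart.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// goVersionOf returns the Go toolchain version recorded in the binary at path.
func goVersionOf(path string) (string, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return "", err
	}
	return info.GoVersion, nil
}

// selfGoVersion reports the toolchain version of the currently running binary.
func selfGoVersion() (string, error) {
	path, err := os.Executable()
	if err != nil {
		return "", err
	}
	return goVersionOf(path)
}

func main() {
	// With no arguments, inspect this program itself.
	if len(os.Args) < 2 {
		v, err := selfGoVersion()
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		fmt.Println("this binary was built with", v)
		return
	}

	// Otherwise treat each argument as a path to a Go binary to inspect.
	for _, path := range os.Args[1:] {
		v, err := goVersionOf(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			continue
		}
		fmt.Printf("%s was built with %s\n", path, v)
	}
}
```

Run it with the path to the built server binary as the argument. The `go version <file>` command prints the same information without any extra code.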