spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

VM running playbook high CPU usage #2815

Closed: gouthamravee closed this issue 1 year ago

gouthamravee commented 1 year ago

Playbook Configuration:

My vars.yml file looks like this:

---
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ("matrix.<matrix-domain>").
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com
matrix_domain: my.domain.com

# The Matrix homeserver software to install.
# See:
#  - `roles/custom/matrix-base/defaults/main.yml` for valid options
# - the `docs/configuring-playbook-IMPLEMENTATION_NAME.md` documentation page, if one is available for your implementation choice
matrix_homeserver_implementation: synapse

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: SECRET

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: someone@example.com
matrix_ssl_lets_encrypt_support_email: email@mail.com

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
devture_postgres_connection_password: SECRET

#matrix_docker_installation_enabled: false
matrix_playbook_docker_installation_enabled: false
# Do not retrieve SSL certificates. This shall be managed by another webserver or other means.
#matrix_ssl_retrieval_method: none

# Do not try to serve HTTPS, since we have no SSL certificates.
# Disabling this also means services will be served on the HTTP port
# (`matrix_nginx_proxy_container_http_host_bind_port`).
#matrix_nginx_proxy_https_enabled: false

#matrix_nginx_proxy_enabled: false

# Do not listen for HTTP on port 80 globally (default), listen on the loopback interface.
# If you'd like, you can make it use the local network as well and reverse-proxy from another local machine.
#matrix_nginx_proxy_container_http_host_bind_port: '80'

# Likewise, expose the Matrix Federation port on the loopback interface.
# Since `matrix_nginx_proxy_https_enabled` is set to `false`, this federation port will serve HTTP traffic.
# If you'd like, you can make it use the local network as well and reverse-proxy from another local machine.
#
# You'd most likely need to expose it publicly on port 8448 (8449 was chosen for the local port to prevent overlap).
#matrix_nginx_proxy_container_federation_host_bind_port: '8448'

# Coturn relies on SSL certificates that have already been obtained.
# Since we don't obtain any certificates (`matrix_ssl_retrieval_method: none` above), it won't work by default.
# An alternative is to tweak some of: `matrix_coturn_tls_enabled`, `matrix_coturn_tls_cert_path` and `matrix_coturn_tls_key_path`.
matrix_coturn_enabled: false

# Trust the reverse proxy to send the correct `X-Forwarded-Proto` header as it is handling the SSL connection.
#matrix_nginx_proxy_trust_forwarded_proto: true

# Trust and use the other reverse proxy's `X-Forwarded-For` header.
#matrix_nginx_proxy_x_forwarded_for: '$proxy_add_x_forwarded_for'

matrix_synapse_admin_enabled: true
matrix_synapse_ext_password_provider_shared_secret_auth_enabled: true
matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret: 
#SECRET

# The easy way. The specified Matrix user ID will be made an admin of all bridges
matrix_admin: "@admin:{{ matrix_domain }}"

matrix_mautrix_telegram_enabled: true
matrix_mautrix_telegram_api_id: SECRET
matrix_mautrix_telegram_api_hash: SECRET
matrix_mautrix_telegram_bridge_encryption_allow: true

matrix_bot_maubot_enabled: true
matrix_bot_maubot_admins:
  - admin: SECRET

matrix_registration_enabled: true
matrix_registration_admin_secret: SECRET

matrix_mautrix_signal_enabled: true
matrix_mautrix_signal_bridge_encryption_allow: true

matrix_mautrix_whatsapp_enabled: true
matrix_mautrix_whatsapp_bridge_encryption_allow: true

matrix_mautrix_discord_enabled: true
matrix_mautrix_discord_bridge_encryption_allow: true

matrix_mx_puppet_steam_enabled: true

matrix_synapse_configuration_extension_yaml: |
  limit_remote_rooms:
    enabled: true
    complexity: 1.0

matrix_synapse_media_retention_local_media_lifetime: 7d
matrix_synapse_media_retention_remote_media_lifetime: 3d

matrix_synapse_max_upload_size_mb: 500

matrix_synapse_workers_enabled: true
matrix_synapse_workers_preset: one-of-each

matrix_dimension_enabled: false

matrix_dimension_access_token: SECRET

matrix_mailer_sender_address: "synapse@domain"
matrix_mailer_relay_use: true
matrix_mailer_relay_host_name: "smtp.mailgun.org"
matrix_mailer_relay_host_port: 587
matrix_mailer_relay_auth: true
matrix_mailer_relay_auth_username: "synapse@domain"
matrix_mailer_relay_auth_password: "SECRET"

matrix_mautrix_instagram_enabled: true
matrix_mautrix_instagram_configuration_extension_yaml: |
  bridge:
    encryption:
      allow: true
      default: true

matrix_client_element_themes_enabled: true

#https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-playbook-synapse-auto-compressor.md
matrix_synapse_auto_compressor_enabled: true
#https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-playbook-sliding-sync-proxy.md
matrix_sliding_sync_enabled: true

matrix_playbook_reverse_proxy_type: playbook-managed-traefik

# Ensure that public urls use https
matrix_playbook_ssl_enabled: true

# Disable the web-secure (port 443) endpoint, which also disables SSL certificate retrieval
devture_traefik_config_entrypoint_web_secure_enabled: false

# If your reverse-proxy runs on another machine, consider using `0.0.0.0:81`, just `81` or `SOME_IP_ADDRESS_OF_THIS_MACHINE:81`
devture_traefik_container_web_host_bind_port: '0.0.0.0:81'

# We bind to `127.0.0.1` by default (see above), so trusting `X-Forwarded-*` headers from
# a reverse-proxy running on the local machine is safe enough.
devture_traefik_config_entrypoint_web_forwardedHeaders_insecure: true

# Or, if you're publishing the port (`devture_traefik_container_web_host_bind_port` above) to a public network interface:
# - remove the `devture_traefik_config_entrypoint_web_forwardedHeaders_insecure` variable definition above
# - uncomment and adjust the line below
# devture_traefik_config_entrypoint_web_forwardedHeaders_trustedIPs: ['IP-ADDRESS-OF-YOUR-REVERSE-PROXY']

# Likewise (to `devture_traefik_container_web_host_bind_port` above),
# if your reverse-proxy runs on another machine, consider changing the `host_bind_port` setting below.
#devture_traefik_additional_entrypoints_auto:
#  - name: matrix-federation
#    port: 8449
#    host_bind_port: '127.0.0.1:8449'
#    config: {}
# If your reverse-proxy runs on another machine, remove the config above and use this config instead:
config:
  forwardedHeaders:
    insecure: true
    trustedIPs: ['x.x.x.x.']

Problem description:

The VM for the playbook runs in Proxmox and has 8 cores and 6 GB of RAM. For the past week the VM has been locking up at 100% CPU usage for many minutes at a time; sometimes it won't come back online until I restart the VM.

I can't see what is causing the high CPU usage. It seems to be Docker, but I can't pinpoint which specific container from the playbook is responsible.
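One way to narrow this down is to take a one-shot snapshot of per-container CPU usage with `docker stats` and sort it. Below is a minimal Python sketch, assuming the `docker` CLI is on the PATH and the current user is allowed to run it. The helper names `parse_docker_stats` and `top_cpu_containers` are made up for illustration and are not part of the playbook.

```python
import subprocess

# Go-template format passed to `docker stats`: container name, a tab, CPU percentage.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}"


def parse_docker_stats(output: str) -> list[tuple[str, float]]:
    """Parse `docker stats` output in STATS_FORMAT into (name, cpu_percent) pairs, busiest first."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, cpu = line.partition("\t")
        try:
            rows.append((name.strip(), float(cpu.strip().rstrip("%"))))
        except ValueError:
            # Skip lines without a numeric CPU value.
            continue
    return sorted(rows, key=lambda row: row[1], reverse=True)


def top_cpu_containers(limit: int = 5) -> list[tuple[str, float]]:
    """Take one snapshot of all running containers and return the busiest ones."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_stats(result.stdout)[:limit]
```

Calling `top_cpu_containers()` a few times while the VM is pegged should show whether one container consistently sits at the top. On a multi-core host, a single container can report more than 100%.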

Other VMs on the same Proxmox host are working fine.

I'm seeing the messages below in the systemd journal:

Aug 04 11:39:50 synapse matrix-synapse-reverse-proxy-companion[40066]: 172.18.0.20 - - [04/Aug/2023:15:39:50 +0000] "GET /_matrix/client/v3/sync?timeout=30000&since=s122978_2361400_216_67248_81395_51_20076_50_0_1&filter=0&set_presence=online HTTP/1.0" 200 213 "-" "mautrix-telegram/0.14.1 mautrix-python/0.20.0 aiohttp/3.8.4 Python/3.11.4" "-"
Aug 04 11:39:50 synapse matrix-nginx-proxy[41140]: 172.18.0.24 - - [04/Aug/2023:15:39:50 +0000] "GET /_matrix/client/v3/sync?timeout=30000&since=s122978_2361400_216_67248_81395_51_20076_50_0_1&filter=0&set_presence=online HTTP/1.1" 200 178 "-" "mautrix-telegram/0.14.1 mautrix-python/0.20.0 aiohttp/3.8.4 Python/3.11.4" "-"
Aug 04 11:39:50 synapse matrix-postgres[36184]: 2023-08-04 15:39:50.830 UTC [11033] FATAL:  remaining connection slots are reserved for non-replication superuser connections
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]: [2023-08-04 15:39:50,831] [WARNING@mau.bridge.e2ee.client] Failed to store next batch
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]: Traceback (most recent call last):
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/mautrix/client/syncer.py", line 460, in _start
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     await self.sync_store.put_next_batch(next_batch)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/mautrix/crypto/store/asyncpg/store.py", line 102, in put_next_batch
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     await self.db.execute(q, self._sync_token, self.account_id)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/mautrix/util/async_db/database.py", line 135, in execute
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     async with self.acquire() as conn:
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/contextlib.py", line 204, in __aenter__
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     return await anext(self.gen)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:            ^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/mautrix/util/async_db/asyncpg.py", line 100, in acquire
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     async with self.pool.acquire() as conn:
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 998, in __aenter__
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     self.connection = await self.pool._acquire(self.timeout)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 838, in _acquire
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     return await _acquire_impl()
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:            ^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 823, in _acquire_impl
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     proxy = await ch.acquire()  # type: PoolConnectionProxy
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:             ^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 137, in acquire
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     await self.connect()
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 129, in connect
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     self._con = await self._pool._get_new_connection()
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/pool.py", line 521, in _get_new_connection
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     con = await connect_utils._connect_addr(
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 773, in _connect_addr
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     return await __connect_addr(params, timeout, True, *args)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 831, in __connect_addr
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     await compat.wait_for(connected, timeout=timeout)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/site-packages/asyncpg/compat.py", line 56, in wait_for
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     return await asyncio.wait_for(fut, timeout)
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:   File "/usr/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:     return fut.result()
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]:            ^^^^^^^^^^^^
Aug 04 11:39:50 synapse matrix-mautrix-telegram[42282]: asyncpg.exceptions.TooManyConnectionsError: remaining connection slots are reserved for non-replication superuser connections
Aug 04 11:39:56 synapse matrix-synapse-reverse-proxy-companion[40066]: 172.18.0.20 - - [04/Aug/2023:15:39:56 +0000] "GET /_matrix/client/v3/sync?timeout=30000&since=s122978_2361400_216_67248_81395_51_20076_50_0_1&filter=0&set_presence=online HTTP/1.0" 200 213 "-" "mautrix-instagram/0.3.0 mautrix-python/0.19.16 aiohttp/3.8.3 Python/3.10.11" "-"
Aug 04 11:39:56 synapse matrix-nginx-proxy[41140]: 172.18.0.22 - - [04/Aug/2023:15:39:56 +0000] "GET /_matrix/client/v3/sync?timeout=30000&since=s122978_2361400_216_67248_81395_51_20076_50_0_1&filter=0&set_presence=online HTTP/1.1" 200 178 "-" "mautrix-instagram/0.3.0 mautrix-python/0.19.16 aiohttp/3.8.3 Python/3.10.11" "-"
Aug 04 11:39:57 synapse sudo[868196]:   dietpi : TTY=pts/0 ; PWD=/opt/matrix/matrix-docker-ansible-deploy ; USER=root ; COMMAND=/usr/bin/journalctl -xe
Aug 04 11:39:57 synapse sudo[868196]: pam_unix(sudo:session): session opened for user root(uid=0) by dietpi(uid=1000)
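
The excerpt above mixes several units, so a quick tally per unit can show which one is logging the most and how often Postgres is refusing connections. This is a minimal sketch, assuming the default short journal format (`Mon DD HH:MM:SS host unit[pid]: message`) seen in the excerpt; `summarize_journal` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the default short journalctl format, e.g.
# "Aug 04 11:39:50 synapse matrix-postgres[36184]: message"
LINE_RE = re.compile(
    r"^\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ (?P<unit>[^\s\[:]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Postgres emits this when all connection slots not reserved for superusers are in use.
SLOTS_EXHAUSTED = "remaining connection slots are reserved"


def summarize_journal(lines):
    """Return (lines per unit, slot-exhaustion messages per unit) as two Counters."""
    per_unit = Counter()
    exhausted = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not in the expected journal format
        unit = match["unit"]
        per_unit[unit] += 1
        if SLOTS_EXHAUSTED in match["msg"]:
            exhausted[unit] += 1
    return per_unit, exhausted
```

It could be fed a saved journal, for example `summarize_journal(open("journal.txt"))`, where `journal.txt` is a placeholder file name. Units that appear in the second counter are the ones hitting the Postgres connection limit.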

altsalt commented 1 year ago

This was discussed in the matrix room last night:

The Ghost of Riccarton: is there some sort of memory leak in current synapse
The Ghost of Riccarton: yes there must be
The Ghost of Riccarton: im watching my ram usage go up quicker than it ever should doing basically nothing
The Ghost of Riccarton: while more homeserver processes are spawned
zbrown: Anyone else's synapse using a lot more CPU since the upgrade to synapse 1.89? I haven’t federated any new rooms and I went from <15% cpu to around 50-70%
AkDk7: Maybe it that problem: There is a bug in 1.89 with the presence feature (https://github.com/matrix-org/synapse/issues/16057). If you have presence activated this can lead to massive calls from Element to the server. Workaround is to deactivate presence until the bug has been resolved.
Iruwen: tl;dr a recent Element update broke the presence feature and everything's drowning in sync requests, both clients and servers. If you have presence enabled, disable that.
Iruwen: https://github.com/matrix-org/synapse/issues/16039
Iruwen: https://github.com/matrix-org/synapse/issues/16057
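
To check whether a server is seeing the flood of sync requests described in that chat, one rough approach is to count `/sync` requests per client in the reverse-proxy access log. This is a minimal sketch, assuming nginx-style access-log lines carried in the systemd journal, like the `matrix-nginx-proxy` lines in the excerpt earlier in this issue. `count_sync_requests` and its `unit` parameter are hypothetical names.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of an
# nginx-style access-log line.
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \d+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_sync_requests(lines, unit="matrix-nginx-proxy"):
    """Count Matrix /sync requests per user agent, using only lines logged by `unit`."""
    counts = Counter()
    for line in lines:
        if f" {unit}[" not in line:
            continue  # the same request is logged by each proxy layer; count one layer only
        match = ACCESS_RE.search(line)
        if match is None:
            continue
        path = match["target"].split("?", 1)[0]
        if path.endswith("/sync"):
            counts[match["agent"]] += 1
    return counts
```

Some `/sync` traffic is normal because clients long-poll, so the interesting signal is one user agent with a count far above the rest over the same time window.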

gouthamravee commented 1 year ago

@altsalt Thank you! I stumbled onto that solution too: I disabled presence after creating this post, and so far things seem stable.

That's the solution for now.

gill6151 commented 1 year ago

For the record, for anyone reading this in the future: my problem there (I am The Ghost of Riccarton in the chat log quoted above) has nothing to do with what this issue is about.