Worker hung or process was killed (Allowed memory size exhausted)

tiltroom commented 7 months ago

Shlink version

shlinkio/shlink:4.0-roadrunner

PHP version

shlinkio/shlink:4.0-roadrunner

How do you serve Shlink

Docker image

Database engine

MariaDB

Database version

mariadb:10.8.3-jammy

Current behavior

When my Shlink instance is under load, it often crashes and requests fail with "Error 500, internal server error". The logs show entries along these lines:

2024-04-05T09:10:49+0000 WARN server RoadRunner can't communicate with the worker
{"reason": "worker hung or process was killed", "pid": 112, "internal_event_name": "EventWorkerError", "error": "sync_worker_receive_frame: Network:\n\tgoridge_frame_receive: validation failed on the message sent to STDOUT, see: https://roadrunner.dev/docs/known-issues-stdout-crc/current/en, invalid message: \nFatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 20480 bytes) in /etc/shlink/vendor/symfony/cache/Marshaller/DefaultMarshaller.php on line 74\n"}

I get several of these with different PIDs, so I assume the web server is routing requests to dead processes. This is running in Docker on a VM with 16 cores and 32 GB of RAM. At any given point there are 20+ GB of free RAM and plenty of CPU.

Expected behavior

I think this is not supposed to happen.

Minimum steps to reproduce

This is my docker config.

shlink:
  image: shlinkio/shlink:4.0-roadrunner
  restart: always
  environment:
    - DEFAULT_DOMAIN=****
    - IS_HTTPS_ENABLED='true'
    - DB_DRIVER=maria
    - DB_USER=shlink
    - DB_PASSWORD=*****
    - DB_HOST=docker_mariadb_1
    - TIMEZONE=Europe/Rome
    - ENABLE_PERIODIC_VISIT_LOCATE='true'
    - REDIS_SERVERS=tcp://docker_redis_1:6379

acelaya commented 7 months ago

Looks like the 512 MB of memory Shlink reserves is not enough.

I'm almost sure this is memory reserved per worker, not shared between workers, but I need to verify this.

I should probably look for ways this value can be configured, as it is currently hardcoded. I remember increasing it the last time an error like this was reported some years ago.

If anything, maybe you could try defining a smaller number of workers via the WEB_WORKER_NUM env var. By default it creates one per available core.

acelaya commented 7 months ago

I'm almost sure this is memory reserved per worker, not shared between workers, but I need to verify this.

I can confirm this is correct. In fact, it would be better to set a higher number of workers, not a lower one, as requests will then be spread more evenly, and each worker will be able to consume up to 512 MB of RAM.

The next Shlink release will allow this value to be customized via env vars.
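
For anyone wanting to try this, a minimal sketch of raising the worker count in a compose file might look like the following. The value 32 is only an illustrative assumption, not a recommendation; the right number depends on the host. Since the limit is per worker, the worst case is roughly workers × 512 MB.

```yaml
services:
  shlink:
    image: shlinkio/shlink:4.0-roadrunner
    restart: always
    environment:
      # Default is one worker per available core (16 on the VM described above).
      # 32 is a hypothetical value: 32 workers x 512 MB = 16 GB worst case.
      - WEB_WORKER_NUM=32
```

Other environment variables from the original config are left out here for brevity.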

acelaya commented 6 months ago

Shlink 4.1.0 has just been released, which allows the memory limit to be customized via the MEMORY_LIMIT env var.
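
As a rough sketch, the reporter's compose file could be adjusted along these lines. The image tag and the value 1G are assumptions for illustration: the tag simply follows the naming pattern used earlier in this thread, and the value assumes the variable accepts PHP-style shorthand such as 512M or 1G. Check the docs for the exact accepted format.

```yaml
services:
  shlink:
    # Assumed tag, following the 4.0-roadrunner pattern; MEMORY_LIMIT needs 4.1.0 or later
    image: shlinkio/shlink:4.1-roadrunner
    restart: always
    environment:
      # Hypothetical value; the limit applies per worker, not to the container as a whole
      - MEMORY_LIMIT=1G
```

Because the limit is applied per worker, the total memory the container may use grows with the number of workers.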

More info in the docs.
