tiltroom closed this issue 7 months ago
It looks like the 512MB of memory Shlink reserves is not enough.
I'm almost sure this memory is reserved per worker, not shared between workers, but I need to verify this.
I should probably look for ways this value can be configured, as it is currently hardcoded. I remember increasing it the last time an error like this was reported some years ago.
In the meantime, you could try defining a smaller number of workers via the WEB_WORKER_NUM env var. By default, it creates one worker per available core.
> I'm almost sure this memory is reserved per worker, not shared between workers, but I need to verify this.
I can confirm this is correct. In fact, it would be better to set a higher number of workers, not a lower one, so that requests are spread more evenly across them, with each worker able to consume up to 512MB of RAM.
The next Shlink release will allow this value to be customized via env vars.
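As a rough sketch of the suggestion above, the worker count could be raised in the compose file through the WEB_WORKER_NUM env var. The value 32 is only an illustrative assumption, not a recommendation, and the top-level `services:` key is assumed from a standard compose layout. With a 512MB limit per worker, 32 workers would use at most about 16GB in the worst case, which still fits the 32GB host described in this issue.

```yaml
services:
  shlink:
    image: shlinkio/shlink:4.0-roadrunner
    restart: always
    environment:
      # Illustrative value only; the default is one worker per available core.
      # Worst case memory is roughly WEB_WORKER_NUM x 512MB (the per-worker limit).
      WEB_WORKER_NUM: "32"
```

The right number depends on the traffic pattern and on how much memory the host can spare, so treat this as a starting point to tune rather than a fixed setting.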
Shlink version
shlinkio/shlink:4.0-roadrunner
PHP version
shlinkio/shlink:4.0-roadrunner
How do you serve Shlink
Docker image
Database engine
MariaDB
Database version
mariadb:10.8.3-jammy
Current behavior
When my Shlink instance is under load, it often crashes and requests result in "Error 500, internal server error". The logs show something along these lines:
2024-04-05T09:10:49+0000 WARN server RoadRunner can't communicate with the worker
{"reason": "worker hung or process was killed", "pid": 112, "internal_event_name": "EventWorkerError", "error": "sync_worker_receive_frame: Network:\n\tgoridge_frame_receive: validation failed on the message sent to STDOUT, see: https://roadrunner.dev/docs/known-issues-stdout-crc/current/en, invalid message: \nFatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 20480 bytes) in /etc/shlink/vendor/symfony/cache/Marshaller/DefaultMarshaller.php on line 74\n"}
I get several of these with different PIDs, so I assume the web server is routing requests to dead processes. This is running in Docker on a VM with 16 cores and 32 GB of RAM. At any given point there are 20+ GB of free RAM and plenty of CPU.
Expected behavior
I think this is not supposed to happen.
Minimum steps to reproduce
This is my docker config.
shlink:
  image: shlinkio/shlink:4.0-roadrunner
  restart: always
  environment: