sjacorg / bayanat

Open source data management solution for human rights documentation.
https://bayanat.org/
GNU Affero General Public License v3.0

Out of Memory Kills on celery service in fresh install #41

Closed: zhabiba24 closed this 1 month ago

zhabiba24 commented 1 month ago

Describe the bug

I receive OOM kills after starting celery on a fresh install set up as per the docs. Are there minimum memory requirements for running all bayanat services on the same server? Alternatively, could there be memory leaks, in which case it would be useful to set --max-memory-per-child or --max-tasks-per-child as detailed here?

The issue appears to have gone away when I changed the ExecStart command in the systemd service definition at /etc/systemd/system/bayanat-celery.service to /bayanat/env/bin/celery -A enferno.tasks worker -B, i.e. by removing the --autoscale 2,5 flag.

For context, I am running NGINX, celery and the flask application on a server with 1GB RAM.
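As a rough sketch of the recycling options mentioned above, the worker command could be assembled like this. The two limit values are hypothetical placeholders rather than recommendations from the bayanat docs; Celery reads --max-memory-per-child in KiB.

```shell
# Hypothetical limits; tune them for your hardware.
MAX_TASKS_PER_CHILD=50        # replace a worker process after it has run 50 tasks
MAX_MEMORY_PER_CHILD=150000   # in KiB (roughly 150 MB resident) per worker process

# Same binary, app and -B flag as in the unit file, plus the two limits.
CELERY_CMD="/bayanat/env/bin/celery -A enferno.tasks worker -B \
--max-tasks-per-child ${MAX_TASKS_PER_CHILD} \
--max-memory-per-child ${MAX_MEMORY_PER_CHILD}"

# Print the line that would go into the [Service] section of the unit file.
echo "ExecStart=${CELERY_CMD}"
```

This only prints the line; it would still need to be copied into the unit file, followed by sudo systemctl daemon-reload and a restart of the service.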

To Reproduce

Steps to reproduce the behavior:

  1. Run sudo systemctl enable --now bayanat-celery.service on a fresh install
  2. Wait a few minutes
  3. Run sudo systemctl status bayanat-celery.service.

Expected behavior

Celery should run without crashing.

Logs

celery[47335]: billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 0.
celery[47335]: """
systemd[1]: bayanat-celery.service: A process of this unit has been killed by the OOM killer.
celery[47335]: worker: Warm shutdown (MainProcess)
celery[47335]: [2024-10-05 14:16:35,323: ERROR/MainProcess] Process 'ForkPoolWorker-4' pid:47371 exited with 'signal 9 (SIGKILL)'
systemd[1]: bayanat-celery.service: Failed with result 'oom-kill'.
systemd[1]: bayanat-celery.service: Consumed 17.139s CPU time, 526.1M memory peak, 0B memory swap peak.
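
For anyone wanting to check their own journal for the same symptom, here is a small sketch that counts OOM-kill lines. The function name count_oom_kills is made up for illustration, and the patterns are simply taken from the log lines above.

```shell
# Count log lines that report an OOM kill. Reads log text from stdin.
# The patterns match the two systemd messages shown in the logs above.
count_oom_kills() {
  # grep -c exits non-zero when the count is 0, so keep the exit status clean.
  grep -c -i -E 'oom-kill|killed by the OOM killer' || true
}

# Usage on a systemd host (not run here):
#   journalctl -u bayanat-celery.service --no-pager | count_oom_kills
```

A non-zero count means the kernel OOM killer hit the unit at least once in the period covered by the journal output.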

sjacgit commented 1 month ago

Removing this flag makes celery default to the number of CPU cores, which may create a lot more than 5 workers. You're advised to use autoscale and choose sensible numbers for your hardware.

It seems that in this case you simply ran out of memory, as celery alone was using half of your system memory. It also depends on what you were doing. It's up to you how to customize your setup, but you probably want to cap the number of workers at half of your CPU count, and autoscale does a good job in that case.

At the moment, a server running bayanat needs around 800-1000 MB when idle (flask+celery+redis+postgres+nginx). You'll probably need 2GB as a minimum for things to run smoothly, or more if you want to use celery more heavily (e.g. bulk-importing media files).
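
A minimal sketch of the "half of your CPU count" rule, for illustration only. The function name autoscale_for_cpus and the minimum of 1 worker are assumptions; the output is in the max,min order that Celery documents for --autoscale.

```shell
# Print an --autoscale value (max,min) with max capped at half the CPU count.
# Assumption: keep at least 1 worker, and use 1 as the minimum pool size.
autoscale_for_cpus() {
  cpus="${1:-1}"
  max=$(( cpus / 2 ))
  if [ "$max" -lt 1 ]; then
    max=1
  fi
  min=1
  echo "${max},${min}"
}

# Use the number of online CPUs on this machine.
autoscale_for_cpus "$(getconf _NPROCESSORS_ONLN)"
```

On an 8-core server this would suggest --autoscale 4,1; on a single-core machine it falls back to 1,1.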
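
A quick way to compare a Linux host against that 2GB figure might look like the sketch below. The function name meets_min_memory is hypothetical, and the 1900 MB default is an assumption, chosen because a nominal 2GB machine usually reports slightly less than 2048 MB in /proc/meminfo.

```shell
# Succeeds if total memory (MB) is at least the given minimum (MB).
# Assumption: 1900 MB stands in for "2GB", since part of the RAM is reserved.
meets_min_memory() {
  total_mb="$1"
  min_mb="${2:-1900}"
  [ "$total_mb" -ge "$min_mb" ]
}

# Linux only: MemTotal is reported in kB in /proc/meminfo.
if [ -r /proc/meminfo ]; then
  total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  if meets_min_memory "$total_mb"; then
    echo "OK: ${total_mb} MB total"
  else
    echo "Low memory: ${total_mb} MB total, about 2GB suggested"
  fi
fi
```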

zhabiba24 commented 3 weeks ago

Thanks, there's probably not enough memory on the machine then. By the way, I noticed the autoscale args may be the wrong way round: I think the first arg is max concurrency and the second is min concurrency, so it should be --autoscale 5,2 rather than --autoscale 2,5?
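
For reference, the Celery worker docs describe --autoscale as max_concurrency,min_concurrency (their example is --autoscale=10,3). A small sketch that puts a pair into that order, with normalize_autoscale being a made-up helper name:

```shell
# Reorder an "a,b" pair so the larger number comes first (max,min),
# which is the order the Celery docs give for --autoscale.
normalize_autoscale() {
  a="${1%,*}"   # part before the comma
  b="${1#*,}"   # part after the comma
  if [ "$a" -ge "$b" ]; then
    echo "${a},${b}"
  else
    echo "${b},${a}"
  fi
}

normalize_autoscale "2,5"   # prints 5,2
```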