processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

too many child processes from `eimp` #4100

Closed cfrepak closed 9 months ago

cfrepak commented 9 months ago

Environment

Bug description

A few days ago we upgraded our Jabber LXC instance from Debian 11 to 12, and since then we have noticed a high number of child processes from eimp. Under Debian 11 there were about 70 eimp child processes out of a total of 100; under Debian 12 there are 255. Is this the way it is supposed to be, and can it be stopped?

As a reference, I set up a new LXC container with Debian 12 and installed ejabberd without any configuration, and I see the same issue.

Here is the status output of the systemd process from Debian 11. The Debian 12 output looks similar, but way longer...

● ejabberd.service - robust, scalable and extensible realtime platform (XMPP server + MQTT broker + SIP service)
     Loaded: loaded (/lib/systemd/system/ejabberd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-10-16 10:49:30 CEST; 30min ago
       Docs: https://www.process-one.net/en/ejabberd/docs/
   Main PID: 122 (sh)
      Tasks: 214 (limit: 154378)
     Memory: 137.7M
        CPU: 8.195s
     CGroup: /system.slice/ejabberd.service
             ├─122 /bin/sh -c /usr/sbin/ejabberdctl foreground
             ├─146 /bin/bash /usr/sbin/ejabberdctl foreground
             ├─158 /usr/lib/erlang/erts-11.1.8/bin/beam.smp -K true -P 250000 -- -root /usr/lib/erlang -progname erl -- -home /var/lib/ejabberd -- -sname ejabberd@localhost -mnesia dir "/var/lib/ejabberd" -ejabberd log_rotate_count 0 -s ejabberd -noshell -noinput
             ├─182 erl_child_setup 65536
             ├─462 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─463 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─464 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─465 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─466 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─468 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─469 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─470 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─471 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─472 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─473 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─474 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─475 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─476 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─478 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─479 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─480 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─481 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─482 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─483 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─484 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─485 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─486 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─487 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─488 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─490 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─491 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─492 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─493 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─494 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─495 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─496 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─497 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─498 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─499 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─500 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─501 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─502 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─503 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─504 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─505 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─507 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─508 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─509 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─510 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─511 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─512 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─513 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─514 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─515 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─517 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─518 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─519 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─520 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─521 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─522 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─523 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─524 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─525 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─526 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─527 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─528 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─530 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─531 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─989 inet_gethost 4
             ├─990 inet_gethost 4
             └─991 /usr/lib/erlang/lib/os_mon-2.6.1/priv/bin/memsup

[log messages...]
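
A quick way to compare a listing like the one above against the core count is to count its eimp entries. The following is a minimal sketch and not part of ejabberd: the function name `count_eimp_workers` and the regular expression are assumptions made for illustration, and the sample is a shortened excerpt of the status output above.

```python
import re

# Matches one line of the systemd CGroup tree whose command is an eimp binary,
# e.g. "├─462 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp".
# The pattern is an assumption based on the output format shown in this thread.
EIMP_LINE = re.compile(r"^\s*[├└]─(\d+)\s+\S*/priv/bin/eimp\s*$")


def count_eimp_workers(status_text: str) -> int:
    """Count the eimp child processes listed in `systemctl status` output."""
    return sum(1 for line in status_text.splitlines() if EIMP_LINE.match(line))


# Shortened excerpt of the status output above.
sample = """\
             ├─182 erl_child_setup 65536
             ├─462 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─463 /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp
             ├─989 inet_gethost 4
"""

print(count_eimp_workers(sample))
```

The full Debian 11 listing above contains 64 eimp entries, which would be consistent with the 64 host threads mentioned later in the thread.
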
licaon-kter commented 9 months ago

Those are used by mod_avatar, I guess. Is the number of eimp processes by any chance equal to the number of cores?

Do you have thumbnail: true in mod_http_upload? Try setting it to false.

Did you set special mod_avatar conversion rules?

cfrepak commented 9 months ago

We are using a basic setup without any special configuration or modules, except LDAP. We are just using XMPP. I also tried a plain setup in a fresh Debian 12 container, with the same result.

We are not using mod_http_upload.

licaon-kter commented 9 months ago

Please put a cleaned-up ejabberd.yml on https://gist.github.com and post the link here.

cfrepak commented 9 months ago

I uploaded the ejabberd.yml from the fresh install. The file is unmodified:

https://gist.github.com/frashman123/eedda75008aa7218db275d2a8c654a49

prefiks commented 9 months ago

I believe eimp starts one process per core. How many cores does this machine have? (If you open an ejabberd shell with ejabberdctl debug, you can check the number we use by executing erlang:system_info(logical_processors).)

cfrepak commented 9 months ago

Ejabberd runs on a dedicated LXC with 1 core. The host machine has two AMD EPYC 7282 CPUs (16 cores / 32 threads each, 64 threads in total). I just migrated to another host machine with only 12 cores / 24 threads, and it seems you are right: I am now down to 24 processes.

This is clearly a bug. Any chance for a workaround? Can I manually set the max count of child processes?

licaon-kter commented 9 months ago

Ejabberd runs on a dedicated LXC with 1 core.

...that exposes the real number of cores to the VM?

cfrepak commented 9 months ago

Well, yes. As far as I know this is normal.

licaon-kter commented 9 months ago

So an app might start 24 threads because it is told that 24 cores exist, while only one core is actually available? That does not sound OK to me...

cfrepak commented 9 months ago

We are running a Proxmox cluster on 3 nodes. LXC containers are not KVM virtual machines. They are more like Docker containers, but with a base OS such as Debian. They share the host's kernel and, if configured, devices like PCI cards or storage. So yes, if I type cat /proc/cpuinfo I can see the host resources. I don't think this is okay, but as far as I know this is normal.

The question is why eimp spawns child processes depending on how many cores are available.

prefiks commented 9 months ago

We don't have an option to set that manually, and I also don't see a way to tell the Erlang system to use a different value than what it detects from the OS. You could try using lxcfs to make the system report values that take the limits set in LXC into account.
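
To see how far the reported CPU count and the container limits diverge, one can compare a few views of the CPU count from inside the container. This is a minimal sketch under stated assumptions: it assumes Linux, and the default path `/sys/fs/cgroup/cpu.max` assumes a cgroup v2 hierarchy. The helper names `parse_cpu_max` and `cpu_views` are made up for illustration, and nothing in this thread says that eimp or Erlang consults the affinity mask or the cgroup quota.

```python
import os
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max value: "<quota> <period>" or "max <period>".

    Returns the CPU limit in cores, or None when no quota is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cpu_views(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> dict:
    """Collect the different CPU counts a process can observe."""
    # Logical CPUs reported by the OS; in a container this may be the host count.
    views = {"os_cpu_count": os.cpu_count()}
    # CPUs this process may actually be scheduled on (Linux only).
    if hasattr(os, "sched_getaffinity"):
        views["affinity"] = len(os.sched_getaffinity(0))
    try:
        with open(cpu_max_path) as f:
            views["cgroup_quota_cores"] = parse_cpu_max(f.read())
    except (OSError, ValueError):
        # File missing (cgroup v1 or non-Linux) or in an unexpected format.
        views["cgroup_quota_cores"] = None
    return views


print(cpu_views())
```

If `os_cpu_count` is much larger than `affinity` or `cgroup_quota_cores`, the container is seeing the host's CPU count rather than its own limit, which is the situation described in this thread.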

cfrepak commented 9 months ago

Hmm, okay, that's fine. I will enable lxcfs for our internal services. But I think you should consider changing this behavior, because in my opinion it makes no sense to create as many image manipulator processes as there are cores available. Or at least provide an option to set the maximum number of child processes.

Thank you for your help.

newrokor commented 3 months ago

It seems that I am suffering from the same problem: far too many eimp subprocesses for a small family-level XMPP server. It runs on a small KVM machine (not 100% sure, I would have to re-check) at my service provider, but /proc/cpuinfo reports one core. However, the debug command suggested above does not seem to produce a reasonable result: two dots sit there until I stop the shell. The conditions here are:

If I start the ejabberd service, at a certain point the OOM killer comes into play and makes a mess of my small server. So, sadly, it is unusable right now, which is especially unfortunate since it ran very stably for years.

What does erl_child_setup 65536 mean? Would this be a measure to limit the subprocesses?

badlop commented 3 months ago

Deactivation of mod_avatar and mod_http_upload has no influence

During startup, the Erlang VM starts all the Erlang applications listed in this file, including eimp: https://github.com/processone/ejabberd/blob/426e33d3a67302518adb9b778dca448ff71166c9/src/ejabberd.app.src.script#L38-L41

If you have no plan to use this library at all (you already disabled mod_avatar and mod_http_upload), then you can try removing the library name from that file.

cfrepak commented 3 months ago

In the end, I simply removed the "executable" flag from the binary. It's still a bug in Erlang/ejabberd... The lxcfs settings suggested above had already been applied; I had forgotten about that at the time. It's normal for LXC to see some host parameters (like the CPU count) since it uses the host kernel.

chmod -x /usr/lib/erlang/lib/p1_eimp-1.0.22/priv/bin/eimp