Closed 519790441 closed 1 year ago
I am not fluent enough in crash dumps to get any clue from this one.
And you have already tried most of the ideas that I would have suggested. Here are some other ideas; let's hope one of them suits your use case and gives positive results.
Compile your custom ejabberd source code with the same Erlang version that was included in the installer (it seems to be Erlang/OTP 21.3).
Then, in your stable ejabberd 19.05 server, copy the files that you customized (probably a few *.beam files) over the installed ones.
That way, the server keeps running on the Erlang virtual machine that is known to be stable.
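A rough sketch of that overlay step, in case it helps. The function names and both arguments are placeholders I made up, not something shipped with ejabberd; the only assumption is that your compiled *.beam files and the installed ebin directory are both reachable from the shell. Note that `otp_release` prints only the major release (for example 21), so it is a coarse check.

```shell
#!/bin/sh
# Sketch only: names and paths below are hypothetical.

# Print the major OTP release of the erl found in PATH, to compare
# against the release bundled with the installer before compiling.
otp_release() {
    erl -noshell -eval 'io:format("~s~n", [erlang:system_info(otp_release)]), halt().'
}

# overlay_beams DST_EBIN FILE.beam...
# Overwrites installed modules with the given custom-built ones.
# A module that is not already installed is skipped, and the first
# overwrite keeps a .orig copy so the change can be reverted.
overlay_beams() {
    dst=$1
    shift
    for f in "$@"; do
        name=$(basename "$f")
        if [ ! -f "$dst/$name" ]; then
            echo "skip $name: not present in $dst" >&2
            continue
        fi
        [ -f "$dst/$name.orig" ] || cp -p "$dst/$name" "$dst/$name.orig"
        cp -p "$f" "$dst/$name"
        echo "replaced $name"
    done
}
```

You would call it with the installed ebin directory first and then only the modules you changed, and restart ejabberd afterwards so the new code is loaded.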
I started running the official binary version of ejabberd 19.05 in the official docker container of Centos 7.4.1708.
You can try the ecs container image: https://hub.docker.com/r/ejabberd/ecs/tags?page=1&name=19.05
If that is stable, then you can rebuild the image with your custom ejabberd source code, changing only which ejabberd source code the Dockerfile uses, and that should be stable too.
The docker-ejabberd repository doesn't have a 19.05 tag, but the image was quite probably built from this commit: https://github.com/processone/docker-ejabberd/commit/97dc39d9be16b9ba3617a23d1293d82235ca0af9
Or you can check out the master branch and try to build with:
./build.sh 19.05
That way, if the original container image is stable, you are generating a custom container image using exactly the same method (only the source code changes).
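To compare the published image and your rebuilt one under the same conditions, something like the sketch below might help. The function names, the container name, and the DRY_RUN switch are my own placeholders; only `docker run` with `-d`, `--name` and `-p` is real. I am assuming that 5222 (the standard XMPP client port) is the port you need to publish, and that you know which tag your local build produced.

```shell
#!/bin/sh
# Sketch: start a given ejabberd image the same way each time, so the
# stock and the custom image are compared under identical conditions.
# Set DRY_RUN=1 to print the docker command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_image IMAGE NAME
# IMAGE: image reference, e.g. ejabberd/ecs:19.05 from Docker Hub.
# NAME:  container name (hypothetical, pick your own).
start_image() {
    run docker run -d --name "$2" -p 5222:5222 "$1"
}
```

For example, `start_image ejabberd/ecs:19.05 ejabberd-stock` for the published image, and later the same call with the tag of your own build once the stock one has proven stable.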
@badlop Last week I switched to OTP 24.3 and ejabberd 22.10 (both compiled from source) and stopped adjusting the time zone inside the docker container. So far I have not seen any random crashes; before this change, it crashed at least once every day or two. I'll report back later. The original docker containers running the official ejabberd 19.05 installation package did not adjust the time zone, while the several containers with the random crashes reported earlier did adjust it.
Thanks, it seems solved then. Please comment back if you still have issues.
Environment
Erlang/OTP 25 [erts-13.2] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [jit:ns]
Configuration (only if needed): grep -Ev '^$|^\s*#' ejabberd.yml
Errors from error.log/crash.log
No crash.log file was generated. The error.log file only contains some 'ejabberd_acme:issue_request/7:246 Failed to request certificate for XXXXXX' entries.
Bug description
I have been struggling with this issue for the past few months.
A few years ago, I started running the official binary version of ejabberd 19.05 in the official docker container of Centos 7.4.1708. Using ejabberdctl foreground, it has been working well, with around 2000 ejabberd users on average.
Recently, due to business needs, it became necessary to change the ejabberd 19.05 source code. So I compiled Erlang/OTP 21 and ejabberd 19.05 from source and, to check stability first, tested that build without any modification to the ejabberd 19.05 source code.
At this point, the problem of random crashes was discovered, and a crash dump was generated for the first time, starting with the following content.
Several random crashes occurred later, without generating a crash dump file or related logs. So I switched to Erlang/OTP 25 and ejabberd 23.04, both compiled from source, which had the same issue (no crash dump file generated).
Today, I switched to the official binary installation package of ejabberd 23.04, which also has the same issue (no crash dump file generated), and the crashes are occurring more frequently. dmesg -T shows no OOM killer activity.
Can you suggest the next direction for investigation? As long as I switch back to the official binary version of ejabberd 19.05, everything works fine. But business needs require changes, so I have to modify the ejabberd source code.