nandlab opened this issue 6 months ago.
Is this with loglevel: info?
Can you try this fix https://github.com/processone/ejabberd/issues/4109#issuecomment-1779209127 ?
Yes, I had loglevel: info. I have now changed it to loglevel: debug.
I am closing this as duplicate for now. If this issue still persists after the new commits, I will reopen it.
I set EJABBERD_OPTS="" in ejabberdctl.cfg, but ejabberd still crashed on startup.
I noticed this happens only occasionally, and only with ejabberdctl restart. ejabberd starts fine with service ejabberd restart, which internally runs ejabberdctl stop and then ejabberdctl start.
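Since service ejabberd restart reportedly works because it runs ejabberdctl stop followed by ejabberdctl start, the same two-step sequence can serve as a workaround. A minimal sketch, assuming ejabberdctl is on the PATH; the function name and the EJABBERDCTL override variable are hypothetical, added only for illustration:

```shell
# Assumption: ejabberdctl is on PATH. EJABBERDCTL is a hypothetical override
# variable so a different binary (or a test stub) can be substituted.
EJABBERDCTL="${EJABBERDCTL:-ejabberdctl}"

# Hypothetical helper: restart ejabberd as two separate steps, the sequence
# the report says `service ejabberd restart` performs internally.
# If the stop step fails, the start step is not attempted and 1 is returned.
restart_via_stop_start() {
    "$EJABBERDCTL" stop || return 1
    "$EJABBERDCTL" start
}
```

Run restart_via_stop_start in place of ejabberdctl restart. If stop returns before the node has fully shut down, the start step may fail; ejabberdctl is believed to provide a stopped command that waits for shutdown, but confirm with ejabberdctl help before relying on it.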
I can reproduce the problem (or at least what I think is the same problem) in one of every two or three restarts.
The problem appears when using any of the binary installers from the master branch, 23.10, 23.04, or 22.05.
When running ejabberdctl restart, the functions called are:
ejabberd_app:start
ejabberd_hooks:run(ejabberd_started, [])
ejabberd_pkix:ejabberd_started()
fast_tls:clear_cache()
fast_tls:clear_cache_nif()
It is very easy to reproduce:
ejabberdctl restart
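Because the failure shows up in only one of every two or three restarts, a loop makes it easier to reproduce on demand. A sketch, assuming ejabberdctl status exits with a nonzero code when the node is not running; the function name, the EJABBERDCTL and SETTLE_SECONDS variables, and the default 10-second wait are hypothetical choices:

```shell
# Assumptions: ejabberdctl is on PATH (override with the hypothetical
# EJABBERDCTL variable), and `ejabberdctl status` exits nonzero when the
# node is not running.
EJABBERDCTL="${EJABBERDCTL:-ejabberdctl}"
# Hypothetical setting: seconds to wait after each restart before checking.
SETTLE_SECONDS="${SETTLE_SECONDS:-10}"

# Run `ejabberdctl restart` up to $1 times. After each run, check the node.
# Prints the number of the first run after which the node is not running and
# returns 1; prints 0 and returns 0 if every run succeeded.
first_failed_restart() {
    runs="$1"
    i=1
    while [ "$i" -le "$runs" ]; do
        # The restart result itself is ignored; the status check decides.
        "$EJABBERDCTL" restart >/dev/null 2>&1
        sleep "$SETTLE_SECONDS"
        if ! "$EJABBERDCTL" status >/dev/null 2>&1; then
            echo "$i"
            return 1
        fi
        i=$((i + 1))
    done
    echo 0
}
```

For example, first_failed_restart 10 prints the run after which the node stopped responding, or 0 if all ten restarts succeeded. If ejabberd hangs inside ejabberdctl restart itself, the loop hangs as well, which is also a useful signal.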
The problem does not appear when using the container, or ejabberd compiled from source with Erlang 26.1 (the version used in the installers). In those cases, ejabberdctl restart successfully restarts ejabberd every time.
Is this seen with OpenSSL 3, and when no dhfile is configured for listeners in ejabberd.yml? If so, this may be fixed by https://github.com/processone/fast_tls/pull/63
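To answer whether a dhfile is configured, a text search of ejabberd.yml is usually enough. This sketch is a plain grep, not a YAML parser, so it reports matching lines without saying which listener they belong to; the function name is hypothetical:

```shell
# Hypothetical helper: print the uncommented `dhfile:` lines (with line
# numbers) from the given ejabberd.yml and return nonzero if there are none.
# Matches `dhfile:` at the start of a line, optionally after a YAML list
# dash. This is a text match only; it does not parse the YAML structure.
dhfile_lines() {
    grep -nE '^[[:space:]]*(-[[:space:]]+)?dhfile:' "$1"
}
```

For example, dhfile_lines /opt/ejabberd/conf/ejabberd.yml (the path is an assumption; use the location of your own config). No output and a nonzero exit status mean no uncommented dhfile option was found.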
The problem is reproducible when using the binary installers (23.10, older releases, and also master). Those installers are built by tools/make-binaries, which downloads and compiles OpenSSL 1.1.1w.
As you mentioned, fast_tls has some fixes in git. They are now tagged and used by ejabberd, but installers built with the latest fast_tls still show the same problem.
I guess it's worth trying to update the binary installers to OpenSSL 3, but unfortunately I get compilation errors when I do.
@badlop: The OpenSSL 1.1.1 branch is now EOL, so yes, please look into OpenSSL 3.2.x.
The newest ejabberd installers use OpenSSL 3.2.1 (since 1962fc88d62ede7676bf7f91f72094ef3f105839) and fast_tls 1.1.18, which includes the fix https://github.com/processone/fast_tls/commit/da16622da621eb4d74f047e8ac376d701782007b. Those installers can be downloaded from https://github.com/processone/ejabberd/actions/runs/7844499079
However, the problem mentioned in this issue is still present.
Happy New Year!
Environment
Errors from error.log/crash.log
No errors
ejabberd.log
Bug description
After running sudo ejabberdctl restart, ejabberd wrote the log seen above and crashed. The ejabberd server was not available that day. There were no error messages, and I could not find any information about what went wrong. Later, after a manual sudo service ejabberd start, ejabberd started normally and wrote the following lines to the log:
Why could sudo ejabberdctl restart bring the server down? What could make ejabberd get stuck or crash at "Building MQTT cache"?
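To check whether a failed restart stopped at the same point, the log can be tested for "Building MQTT cache" being its last entry. A minimal sketch; the function name is hypothetical, and since it only inspects the log text it cannot distinguish a hang from a crash:

```shell
# Hypothetical helper: succeed if the last non-blank line of the given log
# mentions "Building MQTT cache", i.e. startup appears to have stopped at
# that step. Fails if the file is missing or the last line is anything else.
stalled_at_mqtt_cache() {
    grep -v '^[[:space:]]*$' "$1" | tail -n 1 | grep -q 'Building MQTT cache'
}
```

For example, stalled_at_mqtt_cache /opt/ejabberd/logs/ejabberd.log && echo "startup stopped at MQTT cache" (the log path is an assumption; adjust it to your installation). Running this after each failed restart shows whether every failure stops at the same step.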