processone / docker-ejabberd

Set of ejabberd Docker images
94 stars 77 forks source link

Feature: Let ejabberd startup fail if either CTL_ON_CREATE or CTL_ON_START fail #97

Closed sando38 closed 1 year ago

sando38 commented 1 year ago

Some configurations rely on the success of the two configurations CTL_ON_CREATE or CTL_ON_START.

Obviously some ejabberdctl cmds like register or status do not harm, if they aren't successful, but still a configuration may rely on them to be successful.

Others like join_cluster may be more problematic, as a configuration may rely on a successful cluster join, but ejabberd would only report an error, if it fails and continue being up. In this case a startup failure due to CTL_ON_CREATE or CTL_ON_START failure would at least trigger the attention on logging tools in regard of continuous restarts or whatever.

badlop commented 1 year ago

Ok, I wrote a small patch that seems to do the trick, at least in my local testing:

$ CTL_ON_START="list_cluster ; register admin4 localhost asd ; status" `pwd`/ejabberdctl foreground
...
:> ejabberdctl list_cluster
ejabberd@localhost
:> ejabberdctl register admin4 localhost asd
Error: conflict: User admin4@localhost already registered
:> FAILURE in command 'register admin4 localhost asd' !!! Stopping ejabberd...
[os_mon] memory supervisor port (memsup): Erlang has closed

Now it should be checked in real circumstances, in a container.

diff --git a/.github/container/ejabberdctl.template b/.github/container/ejabberdctl.template
index 1f4d902ec..84969f7ee 100755
--- a/.github/container/ejabberdctl.template
+++ b/.github/container/ejabberdctl.template
@@ -283,6 +283,12 @@ post_waiter_loop()
     TAIL=${LIST#* ; }
     echo ":> ejabberdctl $HEAD"
     $0 $HEAD
+    ctlstatus=$?
+    if [ $ctlstatus -ne 0 ] ; then
+        echo ":> FAILURE in command '$HEAD' !!! Stopping ejabberd..."
+        $0 halt > /dev/null
+        exit $ctlstatus
+    fi
     [ "$HEAD" = "$TAIL" ] || post_waiter_loop $TAIL
 }

diff --git a/src/ejabberd_admin.erl b/src/ejabberd_admin.erl
index 443d8164e..c84648532 100644
--- a/src/ejabberd_admin.erl
+++ b/src/ejabberd_admin.erl
@@ -116,6 +116,10 @@ get_commands_spec() ->
            desc = "Stop ejabberd gracefully",
            module = ?MODULE, function = stop,
            args = [], result = {res, rescode}},
+     #ejabberd_commands{name = halt, tags = [server],
+           desc = "Halt ejabberd with status code 1",
+           module = ejabberd, function = halt,
+           args = [], result = {res, rescode}},
      #ejabberd_commands{name = restart, tags = [server],
            desc = "Restart ejabberd gracefully",
            module = ?MODULE, function = restart,

The patch adds a new command 'halt' that should return status code 1, I hope that helps to raise awareness that something went wrong. Alternatively, instead of

$0 halt > /dev/null

it could simply say

$0 stop

and the new halt command wouldn't be needed.

sando38 commented 1 year ago

yes, that works well!

badlop commented 1 year ago

Also merged in ejabberd upstream repository for the ejabberd container image: https://github.com/processone/ejabberd/commit/ffbcf19156b5808e0640193ee8439a9a01354c87