Open etranger7 opened 2 months ago
Update: While the main node is running on Server A as ej1@ej1container, I tried to add Server B to it to form a cluster and ran into these issues:
When I use an FQDN, I get
ej3con | :> ejabberdctl join_cluster ej1@subdomain.domain.com
ej3con |
ej3con | 21:31:47.574 [error] ** System NOT running to use fully qualified hostnames **
ej3con | ** Hostname subdomain.domain.com is illegal **
ej3con |
ej3con | Error: error
ej3con | Error: "This node cannot reach that node."
ej3con | :> FAILURE in command 'join_cluster ej1@subdomain.domain.com' !!! Stopping ejabberd...
When I use an IP instead, I get
ej3con | :> ejabberdctl join_cluster ej1@xxx.xxx.xxx.xx
ej3con |
ej3con | 20:17:39.761 [error] ** System NOT running to use fully qualified hostnames **
ej3con | ** Hostname xxx.xxx.xxx.xx is illegal **
ej3con |
ej3con | Error: error
ej3con | Error: "This node cannot reach that node."
ej3con | :> FAILURE in command 'join_cluster ej1@xxx.xxx.xxx.xx' !!! Stopping ejabberd...
ej3con | [os_mon] memory supervisor port (memsup): Erlang has closed
- ERLANG_NODE_ARG=ej1@subdomain.domain.com
That environment variable is read by the ejabberdctl script and passed to the erl virtual machine as the argument -sname (or -name when the value contains a dot, as a name with subdomains does). As a result, the Erlang virtual machine names itself ej1@subdomain.domain.com.
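The selection rule described above can be sketched in shell. This is only an illustration of the rule as stated in this comment, not the actual ejabberdctl source; the function name `name_flag` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from ejabberdctl): choose the erl naming
# flag for a node name, following the rule described in this thread.
name_flag() {
    case "$1" in
        *.*) echo "-name" ;;   # value contains a dot: long (fully qualified) name
        *)   echo "-sname" ;;  # no dot: short name
    esac
}

# Usage with the node names mentioned in this thread:
#   name_flag ej1@ej1container          (short name)
#   name_flag ej1@subdomain.domain.com  (long name)
```

If the two nodes in a join_cluster call would get different flags under this rule, that matches the situation in which the "System NOT running to use fully qualified hostnames" error appears in the logs above.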
docker exec ej1container ejabberdctl status
Failed RPC connection to the node 'ej1@subdomain.domain.com': nodedown
I get that same problem with a similar compose file:
The solution in my case is to add subdomain.domain.com to /etc/hosts inside the container. That way ejabberdctl is able to connect correctly to the running node and get the status.
ERLANG_NODE_ARG=ej1@ej1container ejabberdctl join_cluster ej1@subdomain.domain.com
System NOT running to use fully qualified hostnames
Right, you started the node with the Erlang short node name ej1@ej1container, so it cannot later connect to a long node name (one with dots, such as ej1@subdomain.domain.com).
Either use:
ERLANG_NODE_ARG=ej1@ej1container ejabberdctl join_cluster ej1@ej1container
If you use this on different machines, make sure the second one knows where to find ej1container (by adding it to /etc/hosts, for example).
Or use:
ERLANG_NODE_ARG=ej1@ej1container.domain.com ejabberdctl join_cluster ej1@ej1container.domain.com
In that case, make sure Erlang can resolve ej1container.domain.com to the right address.
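As a rough sketch of the /etc/hosts suggestion, the helper below checks whether a hosts file already has an entry for a given name before one is added. The function name `has_hosts_entry` and the address 203.0.113.10 are placeholders for illustration only; the real address of Server A is not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the hosts file (default /etc/hosts)
# lists the given name as a hostname or alias on a non-comment line.
has_hosts_entry() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }                                    # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # skip the address field
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Usage on the second machine, as root (203.0.113.10 is a placeholder
# for the address of the machine running ej1container):
#   has_hosts_entry ej1container || echo "203.0.113.10 ej1container" >> /etc/hosts
```

The same check applies to the long-name variant by passing ej1container.domain.com instead.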
Thank you for your reply @badlop . Here is what worked for me to move past the "Failed RPC connection to the node 'ej1@subdomain.domain.com': nodedown" message and get a positive STATUS message. In the docker compose file, I used
services:
  ejabberd:
    image: ejabberd/ecs:24.07
    container_name: ejabberd
    hostname: subdomain.domain.com
    environment:
      - CTL_ON_START=status
      - ERLANG_COOKIE=[removed]
      - ERLANG_NODE_ARG=ejabberd@subdomain.domain.com
However, when I try to connect to ejabberd@subdomain.domain.com that's on Server A, from Server B, I get
Error: error
Error: "This node cannot reach that node."
When I
docker exec ejabberd bin/ejabberdctl ping ejabberd@subdomain.domain.com
from Server B, I get pang.
When I ping Server A from Server B, I can reach it with no issues.
When I
docker exec -u root ejabberd ping subdomain.domain.com
from server B to Server A, again Server A is reachable.
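One possible explanation for this combination (ICMP ping succeeds, Erlang ping returns pang) is that the ports used by Erlang distribution are not reachable from Server B. Erlang nodes find each other through epmd, which listens on TCP port 4369 by default, and then connect on a separate per-node distribution port; both would need to be published by Docker and allowed by any firewall, and both nodes need the same cookie. The sketch below only tests whether a TCP port accepts connections. The function name `port_open` is hypothetical, and it assumes a netcat build that supports the -z and -w options.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds. Assumes an nc that supports -z and -w.
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Usage from the Server B host (hostname taken from this thread,
# 4369 is the default epmd port):
#   port_open subdomain.domain.com 4369 && echo "epmd reachable" || echo "epmd not reachable"
```

If the epmd port is reachable but the ping still returns pang, the node's own distribution port or a cookie mismatch would be the next things to check.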
I feel like I'm missing something here. Again, your help is much appreciated.
Hi @badlop, should I resubmit this issue at https://github.com/processone/ejabberd/? I'm wondering whether that tracker is more closely monitored and whether container issues should also be submitted there. Thanks.
This is a problem with that container image, so here seems a good place for the issue.
On the other hand, it may be a problem related to Docker and Erlang clustering in general, not only ejabberd, so it may be worth searching for related questions outside ejabberd-specific places.
I'm using this docker image and trying to cluster 2 nodes that are on different servers, therefore 2 different public IPs. Just for testing, I successfully clustered 2 docker containers that are on the same machine.
However, when I try to define a FQDN in ERLANG_NODE_ARG, I get an error that I don't know how to overcome.
This container starts without errors (I'm skipping unrelated lines):
This setup gives me an error
It looks like the container starts normally but when I do
I get
I already pointed the A record of subdomain.domain.com to the public IP of the VPS where this is running.
There was a similar issue https://github.com/processone/docker-ejabberd/issues/106 but I don't see how the FQDN was integrated and what the solution was.
Any help would be much appreciated.