processone / docker-ejabberd

Set of ejabberd Docker images

Accessing the ejabberd server using GCP Kubernetes deployment #101

Closed NavinVinayagam closed 1 year ago

NavinVinayagam commented 1 year ago

As a first step in using ejabberd cluster in GCP, I tried to change the node name using the environment variable "ERLANG_NODE_ARG=ejabberd@main" as mentioned in the readme file.

But I am not able to access the ejabberd server through the service. When I check with ejabberdctl, the start command reports that the node is already running, while the status command reports that the node is down.

I want to create an ejabberd cluster. Below is my deployment file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: main
spec:
  replicas: 1
  selector:
    matchLabels:
      app: main
  template:
    metadata:
      labels:
        app: main
    spec:
      containers:
        - name: main
          image: ejabberd/ecs
          env:
            - name: ERLANG_NODE_ARG
              value: ejabberd@main
          #   - name: ERLANG_COOKIE
          #     value: dummycookie123
            # - name: CTL_ON_CREATE
            #   value: "register admin localhost asd"
          ports:
            - containerPort: 5222
            - containerPort: 5269
            - containerPort: 5280
            - containerPort: 5443
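A possible direction (this is an assumption on my part, not something from the repo README): the host part after "@" in the Erlang node name must be resolvable from inside the pod. One way to guarantee that in Kubernetes is to give the pod a stable DNS name with a headless Service plus `hostname`/`subdomain` in the pod spec; Kubernetes also adds the pod's own `hostname` to its /etc/hosts, so a matching node name resolves locally. All names below (main, ejabberd-nodes) are illustrative.

```yaml
# Hedged sketch, not from the repo README: make the host part of
# ERLANG_NODE_ARG resolvable inside the pod.
apiVersion: v1
kind: Service
metadata:
  name: ejabberd-nodes
spec:
  clusterIP: None          # headless: gives matching pods DNS records
  selector:
    app: main
  ports:
    - port: 5222
---
# In the Deployment's pod template spec, additionally set:
#   hostname: main
#   subdomain: ejabberd-nodes
# so the pod resolves as main.ejabberd-nodes.<namespace>.svc.cluster.local
# (and "main" resolves inside the pod itself), matching ERLANG_NODE_ARG=ejabberd@main.
```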

I am trying to access the above deployment by defining a service. I am able to access the service if I remove the environment variable that changes the node name, but it fails when I include the variable in the YAML file.

I checked the ejabberd.log and error.log files inside the container using Cloud Shell: there are no entries in error.log, and the entries in ejabberd.log match the log of ejabberd tested on my local machine. I couldn't find why this fails in GCP. Can you help me identify the cause of this issue and suggest guidelines for deploying ejabberd in a GCP cluster?

badlop commented 1 year ago

Maybe there is some error in the documentation. Let's solve your problem first so we learn what the problem is, and later fix the documentation.

I tried this:

$ ERLANG_NODE_ARG=aa@main ./bin/ejabberdctl start

$ ERLANG_NODE_ARG=aa@main ./bin/ejabberdctl status
Failed RPC connection to the node aa@main: nodedown

$ epmd -names
epmd: up and running on port 4369 with data:
name aa at port 35265

Then I edited the system file /etc/hosts to add this line:

127.0.0.1       main

Now this works:

$ ERLANG_NODE_ARG=aa@main ./bin/ejabberdctl status
The node aa@main is started with status: started
ejabberd 23.04.34 is running in that node
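The /etc/hosts fix works because ejabberdctl treats everything after the "@" in the node name as a host name that must resolve (via /etc/hosts or DNS) before it can reach the running node. A minimal sketch of that split, with example names:

```shell
# Sketch: the part after "@" in an Erlang node name is a host name
# that must resolve; "aa@main" is the example node from this thread.
NODE="aa@main"
HOST_PART="${NODE#*@}"    # strip everything up to and including "@"
echo "$HOST_PART"         # prints: main
```

If that host part does not resolve, you get exactly the "nodedown" error shown above even though epmd reports the node as registered.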

value: ejabberd@main

In your case, instead of main, you can try setting the machine name, or localhost.

When you solve the problem, please comment how you solved it. Thanks!

NavinVinayagam commented 1 year ago

Thanks for the reply. It seems the appropriate format for the node name is "name@(host_name/machine_name/container_name)".

Then I edited the system file /etc/hosts to add this line:

127.0.0.1 main

I followed your reply, edited the "hosts" file, and mapped dev.example.com to 127.0.0.1, then started the ejabberd service in my Ubuntu WSL with node names in that format.

In the case of the example docker-compose file, the container name is defined as "main" and "replica" hence the corresponding node names "ejabberd@main" and "ejabberd@replica" worked. In GCP Kubernetes, the node name doesn't match with the pod name and hence failed.
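Given that mismatch, one workaround (an assumption on my part, not a documented feature of the image) is to derive the node name from the pod's own hostname, which Kubernetes always makes resolvable inside the pod. Here "main-0" stands in for a pod's hostname:

```shell
# Hedged sketch: build the Erlang node name from the pod's own hostname
# so the host part always matches and resolves. In a real pod you would
# use "$HOSTNAME"; "main-0" is a stand-in for the pod hostname here.
POD_HOSTNAME="main-0"
ERLANG_NODE_ARG="ejabberd@${POD_HOSTNAME}"
echo "$ERLANG_NODE_ARG"   # prints: ejabberd@main-0
```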

In your case, instead of main, you can try to set the machine name, or localhost.

Then I checked in GCP Kubernetes using the environment variable "ERLANG_NODE_ARG=ejabberdmain@localhost". Now ejabberd was accessible through the service with the provided node name. (I tried with a single pod.)

Then I added 2 deployments with single pods, modified the node names, and tried to connect them with the initial commands mentioned in the clustering example. The 2 pods are running, but there is no s2s connection.
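One thing worth noting here: ejabberd clustering does not use s2s at all. Nodes join a cluster over Erlang distribution, which requires the same Erlang cookie on both nodes and mutually resolvable host parts in the node names. The join itself uses ejabberd's `join_cluster` command; a sketch with illustrative node names:

```shell
# Hedged sketch: a replica node joins the main node over Erlang
# distribution (same ERLANG_COOKIE, resolvable host parts), not s2s.
# On the replica, the real invocation would be:
#   ejabberdctl join_cluster ejabberd@main
MAIN_NODE="ejabberd@main"
echo "ejabberdctl join_cluster ${MAIN_NODE}"
```

So with two separate Deployments, both pods must be able to resolve each other's node host parts, which plain Deployment pod names do not provide by default.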

Now I followed the example exactly as in the readme and added 2 containers in the GCP Kubernetes deployment, but that doesn't work either.

I am trying this in GCP Kubernetes so that my ejabberd server nodes can autoscale inside Kubernetes. Can you provide some guidance here?