rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0

Users and queues not persisted #581

Closed tessierp closed 3 years ago

tessierp commented 3 years ago

Hi,

I managed to get an instance of RabbitMQ running in Kubernetes and everything seems to work fine, with one exception: whenever I create users or queues, they only remain until I reboot, and then everything is gone. It seems the configuration is not persisted. What am I missing?

This is my instance configuration:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq
  namespace: cpiir
spec:
  service:
    type: NodePort
  persistence:
    storageClassName: fast
    storage: 2Gi
  override:
    service:
      spec:
        ports:

mkuratczyk commented 3 years ago

How do you create users and queues? How do you reboot?

tessierp commented 3 years ago

Hi,

First I used this procedure https://www.rabbitmq.com/kubernetes/operator/using-operator.html#find to find the admin user and password. I logged in and created the users and the queue I needed. I later rebooted the Ubuntu Server VM, and when I restarted I noticed that my RabbitMQ instance's configuration was gone. My guess is that the data folder gets wiped and is not persisted.

mkuratczyk commented 3 years ago

What is that Ubuntu server? Is it a Kubernetes node? Is the whole Kubernetes cluster on it? And what is the definition of the fast storageClass?

RabbitMQ stores such data in Mnesia (a database), which has its files on the persistent volume that we mount to the pod. Please check whether you get the same persistent volume attached after a reboot. This sounds more like an issue with how you run Kubernetes than with the Operator - we just request a volume and store data on it. If the data gets lost, it's most likely an issue with your Kubernetes storage layer.

tessierp commented 3 years ago

We are just running a development server; this is not a cluster that uses kubeadm. We are using minikube to spawn a single node, creating what we need on that node, and running it with driver=none, so no Docker is involved. The part I think I am missing is the definition of the storage class. I was using Bitnami's implementation before this one, which made use of volumeMounts linked to volumeClaimTemplates, but that doesn't work here, so what is stored in /etc/rabbitmq and /var/lib/rabbitmq is lost every time I stop and restart minikube.

mkuratczyk commented 3 years ago

Well, the Operator doesn't work that way either - again, we request a persistent volume and it's Kubernetes' job to make it persistent. Just look at your deployment after RabbitMQ has started - you'll see a StatefulSet that references a PVC that references a PV. My guess is that after a restart, you either get a completely new deployment or a new PV.
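
For illustration, the claim in that chain would look roughly like this for the manifest you posted - a sketch only, with the name assumed from the usual StatefulSet convention of <claim-template>-<pod> (the operator's claim template is called persistence) and the values taken from your spec:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: persistence-rabbitmq-server-0   # assumed name: "persistence" template, pod 0
  namespace: cpiir
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast                # from spec.persistence.storageClassName
  resources:
    requests:
      storage: 2Gi                      # from spec.persistence.storage

After a reboot this claim should still exist and still be bound to the same PV; if it comes back unbound or bound to a fresh volume, the problem is in the storage layer rather than the Operator.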

I don't use minikube but here is an example of a similar scenario with kind:

# deploy a Kubernetes cluster

$ kind create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.20.2) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community πŸ™‚

# deploy RabbitMQ operator

$ kubectl rabbitmq install-cluster-operator
namespace/rabbitmq-system created
customresourcedefinition.apiextensions.k8s.io/rabbitmqclusters.rabbitmq.com created
serviceaccount/rabbitmq-cluster-operator created
role.rbac.authorization.k8s.io/rabbitmq-cluster-leader-election-role created
clusterrole.rbac.authorization.k8s.io/rabbitmq-cluster-operator-role created
rolebinding.rbac.authorization.k8s.io/rabbitmq-cluster-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/rabbitmq-cluster-operator-rolebinding created
deployment.apps/rabbitmq-cluster-operator created

# deploy a RabbitMQ cluster

$ kubectl rabbitmq create foo
rabbitmqcluster.rabbitmq.com/foo created

# create a user

$ kubectl exec -ti foo-server-0 -- rabbitmqadmin declare user name=foo password=bar tags=administrator
user declared

# restart my Kubernetes cluster

$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS                       NAMES
9b1a7cb8f0ea   kindest/node:v1.20.2   "/usr/local/bin/entr…"   9 minutes ago   Up 9 minutes   127.0.0.1:60921->6443/tcp   kind-control-plane

$ docker restart 9b1a7cb8f0ea
9b1a7cb8f0ea

# wait a minute or so - kind needs to start first and then RabbitMQ needs to start

# check that the user `foo` is still there

$ kubectl exec -ti foo-server-0 -- rabbitmqadmin list users
+----------------------------------+--------------------------------+--------------------------------------------------+---------------+
|               name               |       hashing_algorithm        |                  password_hash                   |     tags      |
+----------------------------------+--------------------------------+--------------------------------------------------+---------------+
| Gs06h87f1wfe-cotQ9ZJaBsM-6P9txga | rabbit_password_hashing_sha256 | 3CDZreZ5dGlndpfRa+gdTy6PBI+Ern6AMMnGqOfNAZWedY5Q | administrator |
| foo                              | rabbit_password_hashing_sha256 | J1aRuxhgXWKxqzMIO7vvLL+4b6hqQjUXogIoRfDXYkpu7aEd | administrator |
+----------------------------------+--------------------------------+--------------------------------------------------+---------------+

tessierp commented 3 years ago

You are correct, the error here is that I am not assigning persistent storage to the instance. Whatever I do in the instance will be lost since there is no external data volume to mount. I'll have to investigate how to get this done.

tessierp commented 3 years ago

Hi,

Sorry to reopen this. Just wondering if there is an example somewhere of what we need to do in order to define the storage, as per what is in the documentation:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmqcluster-sample
spec:
  persistence:
    storageClassName: fast
    storage: 20Gi

mkuratczyk commented 3 years ago

I don't understand your question. Your example shows how to tell the RabbitMQ Operator to provision a 20Gi persistent disk of storageClass "fast" and use it for persistence (though the YAML formatting was broken as pasted). If you are asking how to actually set up your Kubernetes to offer a "fast" storageClass, then I'm afraid this is not the right forum. We do most of our testing by simply using the default storageClass provided by a given Kubernetes offering (GKE, TKG, EKS, etc.).
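
That said, for a single-node minikube like the one described above, a "fast" storageClass could be defined roughly like this - a sketch only, assuming minikube's bundled hostpath provisioner is available, where "fast" is simply whatever name you reference from spec.persistence.storageClassName:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                              # matches spec.persistence.storageClassName
provisioner: k8s.io/minikube-hostpath     # minikube's built-in hostpath provisioner (assumed available)
reclaimPolicy: Retain                     # keep the volume and its data if the claim is deleted
volumeBindingMode: Immediate

With dynamic provisioning like this there is no need to create PersistentVolume objects by hand; the provisioner creates one per claim.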

tessierp commented 3 years ago

Hi. That is not my example but what comes from the online documentation. And I guess what I was asking is whether there was a more elaborate example somewhere, but it doesn't seem to be the case. :)

mkuratczyk commented 3 years ago

Regarding:

Error from server (BadRequest): container "rabbitmq" in pod "rabbitmq-server-0" is waiting to start: CreateContainerError
I think it is missing the image? Not sure.

Please provide the YAML you apply, commands you execute, their output and other applicable logs. Otherwise we can't know whether that's a bad image or something else. kubectl describe pod rabbitmq-server-0 should tell you why the pod can't start.

tessierp commented 3 years ago

Hi,

Sorry for the delay. I have been very busy.

Here are my two YAML files.

Persistent Volume:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-pv
  namespace: dev-env
  labels:
    type: local
spec:
  storageClassName: rabbitmq-storage-class
  capacity:
    storage: 3Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/kubernetes/rabbitmq"

RabbitmqCluster (with a StatefulSet override):

---
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  namespace: dev-env
  name: rabbitmq
spec:
  replicas: 1
  service:
    type: NodePort  
  override:
    service:
      spec:
        ports:
          - name: http
            protocol: TCP
            port: 15672
            targetPort: 15672
            nodePort: 31672
          - name: amqp
            protocol: TCP
            port: 5672
            targetPort: 5672
            nodePort: 30672
    statefulSet:
      spec:
        template:
          spec:
            containers:
              - name: rabbitmq
                volumeMounts:
                  - name: rabbitmq-storage
                    mountPath: /var/lib/rabbitmq
                  # - name: config-volume
                  #   mountPath: /etc/rabbitmq
        volumeClaimTemplates:
        - metadata:
            name: rabbitmq-storage
          spec:
            storageClassName: rabbitmq-storage-class
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 3Gi
            selector:
              matchLabels:
                type: local

And I get the following error message for the StatefulSet:

Warning FailedCreate 2s (x9 over 3s) statefulset-controller create Pod rabbitmq-server-0 in StatefulSet rabbitmq-server failed error: Pod "rabbitmq-server-0" is invalid: [spec.containers[0].volumeMounts[1].name: Not found: "persistence", spec.initContainers[0].volumeMounts[4].name: Not found: "persistence"]

So yes, this kind of persistence works, and I have it working for MongoDB and PostgreSQL. Given the error message, I looked in the cluster-operator.yml file and I do see a persistence definition there, but I'm not sure what to configure for it or how it would fit in my StatefulSet. Can you advise?

mkuratczyk commented 3 years ago

A volume called persistence is attached automatically in most cases. However, if you specify spec.override.statefulSet.spec.volumeClaimTemplates, you need to (re)define it, because the volumeClaimTemplates array is not additive - when you override it, the original value (which included persistence) is no longer there. Check out the multiple-disk example.
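
Applied to your manifest, the simplest form is to drop the volumeClaimTemplates/volumeMounts override entirely and let spec.persistence do the work - roughly like this, a sketch only, reusing the names and sizes from your files and assuming the rabbitmq-storage-class you defined exists:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  namespace: dev-env
  name: rabbitmq
spec:
  replicas: 1
  service:
    type: NodePort
  # The operator creates and mounts its own "persistence" claim from this class,
  # so no statefulSet volumeClaimTemplates or volumeMounts override is needed for storage.
  persistence:
    storageClassName: rabbitmq-storage-class
    storage: 3Gi

If you do keep a volumeClaimTemplates override (for example to add a selector or an extra disk), it must include an entry named persistence, as the multiple-disk example shows.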

Lastin commented 3 years ago

Hi everyone. Just wondering if you got it to work in the end? I tried the above approach and keep getting an error saying that the selector cannot be used for dynamic PVs. Is there an alternative way of re-binding a PV to a PVC, or are selectors the most logical one?

tessierp commented 3 years ago

Unfortunately I had to move on and went back to Docker with a package I knew worked. Hopefully someone else got it to work and can help you.