rabbitmq / rabbitmq-autocluster

RabbitMQ peer discovery and cluster formation plugin, supports RabbitMQ 3.6.x
BSD 3-Clause "New" or "Revised" License

[error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404 #75

Closed. corrtia closed this issue 3 years ago

corrtia commented 3 years ago

Describe the bug: I used this configuration in minikube and ran into the following problem:

2021-02-18 08:14:52.827 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:52.831 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 9 retries left...
2021-02-18 08:14:53.337 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:53.340 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 8 retries left...
2021-02-18 08:14:53.847 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:53.851 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 7 retries left...
2021-02-18 08:14:54.357 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:54.360 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 6 retries left...
2021-02-18 08:14:54.867 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:54.870 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 5 retries left...
2021-02-18 08:14:55.375 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:55.378 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 4 retries left...
2021-02-18 08:14:55.885 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:55.888 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 3 retries left...
2021-02-18 08:14:56.395 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:56.398 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 2 retries left...
2021-02-18 08:14:56.905 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:56.907 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 1 retries left...
2021-02-18 08:14:57.414 [error] <0.272.0> Failed to fetch a list of nodes from Kubernetes API: 404
2021-02-18 08:14:57.416 [error] <0.272.0> Peer discovery returned an error: "404". Will retry after a delay of 500 ms, 0 retries left...

BOOT FAILED
===========
Exception during startup:

    rabbit_boot_steps:run_boot_steps/1 line 20
    rabbit_boot_steps:'-run_boot_steps/1-lc$^0/1-0-'/1 line 19
    rabbit_boot_steps:run_step/2 line 46
    rabbit_boot_steps:'-run_step/2-lc$^0/1-0-'/2 line 41
    rabbit_mnesia:init/0 line 76
    rabbit_mnesia:init_with_lock/3 line 111
    rabbit_mnesia:run_peer_discovery_with_retries/2 line 145
    rabbit_mnesia:run_peer_discovery_with_retries/2 line 138
error:{badmatch,ok}

2021-02-18 08:14:57.920 [info] <0.44.0> Application mnesia exited with reason: stopped
2021-02-18 08:14:57.921 [error] <0.272.0> 
2021-02-18 08:14:57.921 [info] <0.44.0> Application mnesia exited with reason: stopped
2021-02-18 08:14:57.921 [error] <0.272.0> BOOT FAILED
2021-02-18 08:14:57.921 [error] <0.272.0> ===========
2021-02-18 08:14:57.921 [error] <0.272.0> Exception during startup:
2021-02-18 08:14:57.922 [error] <0.272.0> 
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:run_boot_steps/1 line 20
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:'-run_boot_steps/1-lc$^0/1-0-'/1 line 19
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:run_step/2 line 46
2021-02-18 08:14:57.922 [error] <0.272.0>     rabbit_boot_steps:'-run_step/2-lc$^0/1-0-'/2 line 41
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:init/0 line 76
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:init_with_lock/3 line 111
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:run_peer_discovery_with_retries/2 line 145
2021-02-18 08:14:57.923 [error] <0.272.0>     rabbit_mnesia:run_peer_discovery_with_retries/2 line 138
2021-02-18 08:14:57.923 [error] <0.272.0> error:{badmatch,ok}
2021-02-18 08:14:57.923 [error] <0.272.0> 
2021-02-18 08:14:58.925 [info] <0.271.0> [{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{pid,<0.271.0>},{registered_name,[]},{error_info,{exit,{{badmatch,ok},{rabbit,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}},{ancestors,[<0.270.0>]},{message_queue_len,1},{messages,[{'EXIT',<0.272.0>,normal}]},{links,[<0.270.0>,<0.44.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,376},{stack_size,28},{reductions,354}], []
2021-02-18 08:14:58.925 [error] <0.271.0> CRASH REPORT Process <0.271.0> with 0 neighbours exited with reason: {{badmatch,ok},{rabbit,start,[normal,[]]}} in application_master:init/4 line 138
2021-02-18 08:14:58.926 [info] <0.44.0> Application rabbit exited with reason: {{badmatch,ok},{rabbit,start,[normal,[]]}}
2021-02-18 08:14:58.926 [info] <0.44.0> Application rabbit exited with reason: {{badmatch,ok},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,ok},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{badmatch,ok},{rabbit,start,[normal,[]]}}})

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

Gsantomaggio commented 3 years ago

Hi @CORRTAIN, can you please provide more context? Are you using the latest Operator version? Can you post the YAML definition file you are using to deploy the RabbitMQ cluster?

Also, are you sure you are using the right tool? This is rabbitmq-autocluster; please have a look at https://github.com/rabbitmq/cluster-operator

corrtia commented 3 years ago

configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
data:
  enabled_plugins: |
      [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      cluster_formation.k8s.address_type = hostname
      cluster_formation.node_cleanup.interval = 30
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      queue_master_locator=min-masters
      loopback_users.guest = false

rbac.yaml

---
# RabbitMQ ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
---
# RabbitMQ Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoint-reader
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get"]
---
# RabbitMQ RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpoint-reader
subjects:
  - kind: ServiceAccount
    name: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader

svc.yaml

---
# RabbitMQ Service
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-cluster
  labels:
    app: rabbitmq-cluster
    type: LoadBalancer
spec:
  selector:
    app: rabbitmq-cluster
  ports:
    - name: amqp-port
      port: 5672
      targetPort: 5672
      protocol: TCP
    - name: mgmt-port
      port: 15672
      targetPort: 15672
      protocol: TCP
---
# RabbitMQ NodePort Service
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-nodeport
  labels:
    app: rabbitmq-nodeport
    type: LoadBalancer
spec:
  type: NodePort
  selector:
    app: rabbitmq-cluster
  ports:
    - name: amqp-port
      nodePort: 30001
      port: 5672
      targetPort: 5672
      protocol: TCP
    - name: mgmt-port
      nodePort: 30002
      port: 15672
      targetPort: 15672
      protocol: TCP

statefulset.yaml

# RabbitMQ-Cluster StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq-cluster
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq-cluster
  serviceName: rabbitmq-internal
  template:
    metadata:
      labels:
        app: rabbitmq-cluster
    spec:
      serviceAccountName: rabbitmq
      containers:
        - name: rabbitmq
          image: rabbitmq:3
          livenessProbe:
            exec:
              # Stage 2 check, more detail at https://www.rabbitmq.com/monitoring.html#health-checks
              command: ["rabbitmq-diagnostics", "status"]
            initialDelaySeconds: 60
            periodSeconds: 60
            timeoutSeconds: 10
          readinessProbe:
            exec:
              # Stage 2 check, more detail at https://www.rabbitmq.com/monitoring.html#health-checks
              command: ["rabbitmq-diagnostics", "status"]
            initialDelaySeconds: 60
            periodSeconds: 60
            timeoutSeconds: 10
          ports:
            - containerPort: 5672
              protocol: TCP
            - containerPort: 15672
              protocol: TCP
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name  # get pod.metadata.name, e.g. rabbitmq-cluster-0
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace  # get pod.metadata.namespace
            - name: RABBITMQ_DEFAULT_USER
              value: "admin"
            - name: RABBITMQ_DEFAULT_PASS
              value: "admin"
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            - name: K8S_SERVICE_NAME
              value: "rabbitmq-internal"
            - name: RABBITMQ_NODENAME
              value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"
            - name: K8S_HOSTNAME_SUFFIX
              value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
            - name: RABBITMQ_ERLANG_COOKIE
              value: "SWvCP0Hrqv43NG7GybHC95ntCJKoW8UyNFWnBEWG8TY="    # generated by: echo $(openssl rand -base64 32)
          volumeMounts:
            - name: config-volume
              mountPath: /etc/rabbitmq
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: enabled_plugins
                path: enabled_plugins

Kubernetes version: 1.16.2, Docker version: 19.03.8
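
One possible cause of the 404, offered as an assumption rather than a confirmed diagnosis: statefulset.yaml refers to a Service named rabbitmq-internal (through serviceName and K8S_SERVICE_NAME), but svc.yaml defines only rabbitmq-cluster and rabbitmq-nodeport, so a lookup of the endpoints for rabbitmq-internal would find nothing. A minimal sketch of a headless Service with that name might look like the following; the manifest is hypothetical and was not part of the original report.

```yaml
# Hypothetical headless Service; not part of the manifests posted above.
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-internal          # matches serviceName and K8S_SERVICE_NAME in statefulset.yaml
spec:
  clusterIP: None                  # headless: each pod gets a stable DNS record
  publishNotReadyAddresses: true   # assumption: lets peers resolve each other before readiness passes
  selector:
    app: rabbitmq-cluster          # same pod label used by the StatefulSet template
  ports:
    - name: amqp-port
      port: 5672
      protocol: TCP
```

If this is the cause, the endpoints object for rabbitmq-internal should exist in the same namespace as the pods once the Service is created.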

Gsantomaggio commented 3 years ago

@CORRTAIN I suggest using our RabbitMQ Operator for Kubernetes.

We don't maintain this kind of deployment anymore (as you can see from our old RabbitMQ k8s examples repo).

I will close this issue; let us know if you have problems with the Operator.
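
For reference, a deployment through the Cluster Operator is usually described by a single RabbitmqCluster resource. The sketch below is a minimal example and assumes the Operator and its CRDs are already installed in the cluster; the resource name is hypothetical, and the replica count simply mirrors the StatefulSet from this thread.

```yaml
# Minimal sketch; assumes the RabbitMQ Cluster Operator is already installed.
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-cluster   # hypothetical name
spec:
  replicas: 3              # mirrors the replica count of the StatefulSet in this thread
```

With this approach the Operator creates the StatefulSet, Services, and peer discovery configuration itself, so the hand-written manifests from this thread would no longer be needed.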

corrtia commented 3 years ago

Thanks