rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0

When I used this project to deploy rabbitmq in k8s, the pod startup reported an error: error:{badmatch,{error,eacces}} #1413

Closed charlienss closed 1 year ago

charlienss commented 1 year ago

Describe the bug

When I used this project to deploy rabbitmq in k8s, the pod startup reported an error: error:{badmatch,{error,eacces}}

To Reproduce

When I used this project to deploy rabbitmq in k8s, the pod startup reported an error:

2023-07-26 01:41:06.195575+00:00 [info] <0.230.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@hello-world-server-0.hello-world-nodes.rabbitmq-system is empty. Assuming we need to join an existing cluster or initialise from scratch...
2023-07-26 01:41:06.195678+00:00 [info] <0.230.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2023-07-26 01:41:06.195969+00:00 [info] <0.230.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2023-07-26 01:41:06.198799+00:00 [notice] <0.44.0> Application mnesia exited with reason: stopped
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0>
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> BOOT FAILED
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> ===========

BOOT FAILED

Exception during startup:

error:{badmatch,{error,eacces}}

rabbit_peer_discovery_k8s:make_request/0, line 121
rabbit_peer_discovery_k8s:list_nodes/0, line 41

2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> Exception during startup:
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0>
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> error:{badmatch,{error,eacces}}
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0>
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_peer_discovery_k8s:make_request/0, line 121
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_peer_discovery_k8s:list_nodes/0, line 41
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_peer_discovery_k8s:lock/1, line 76
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_peer_discovery:lock/0, line 190
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_mnesia:init_with_lock/3, line 105
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_mnesia:init/0, line 77
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0> rabbit_boot_steps:run_step/2, line 48
2023-07-26 01:41:06.199158+00:00 [error] <0.230.0>

rabbit_peer_discovery_k8s:lock/1, line 76
rabbit_peer_discovery:lock/0, line 190
rabbit_mnesia:init_with_lock/3, line 105
rabbit_mnesia:init/0, line 77
rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
rabbit_boot_steps:run_step/2, line 48

2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> crasher:
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> initial call: application_master:init/4
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> pid: <0.229.0>
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> registered_name: []
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> exception exit: {{badmatch,{error,eacces}},{rabbit,start,[normal,[]]}}
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> in function application_master:init/4 (application_master.erl, line 142)
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> ancestors: [<0.228.0>]
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> message_queue_len: 1
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> messages: [{'EXIT',<0.230.0>,normal}]
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> links: [<0.228.0>,<0.44.0>]
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> dictionary: []
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> trap_exit: true
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> status: running
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> heap_size: 2586
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> stack_size: 28
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> reductions: 176
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0> neighbours:
2023-07-26 01:41:07.200882+00:00 [error] <0.229.0>
2023-07-26 01:41:07.201568+00:00 [notice] <0.44.0> Application rabbit exited with reason: {{badmatch,{error,eacces}},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,{error,eacces}},{rabbit,start,[normal,[]]}}}"}
Kernel pid


ansd commented 1 year ago

This error is thrown at https://github.com/rabbitmq/rabbitmq-server/blob/0810162ac1cc199682efd8acb18a06f0a2902518/deps/rabbitmq_peer_discovery_k8s/src/rabbit_peer_discovery_k8s.erl#L121. It looks like /var/run/secrets/kubernetes.io/serviceaccount/token is not readable (eacces means permission denied). Does this file exist? What are its file permissions? Who is the owner of that file?

Also, please provide more details about the infrastructure on which you deployed the cluster-operator.
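The three questions above (does the file exist, what are its permissions, who owns it) can be answered in one pass. The following is a minimal sketch, not part of the operator or RabbitMQ: `describe_file` is a hypothetical helper, and it assumes a Python interpreter is available wherever it is run, which the RabbitMQ container image may not provide.

```python
import os
import stat

# Path quoted in this thread; the helper below is purely illustrative.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def describe_file(path):
    """Collect existence, mode, owner and readability of `path`
    as seen by the current process."""
    info = {
        "path": path,
        # os.path.exists also returns False when a parent directory
        # cannot be searched, so False can mean "hidden by permissions".
        "exists": os.path.exists(path),
        "process_uid": os.getuid(),
        "process_gid": os.getgid(),
    }
    if not info["exists"]:
        return info
    st = os.stat(path)  # follows symlinks, as a normal read would
    info["mode"] = stat.filemode(st.st_mode)  # e.g. "-rw-r--r--"
    info["owner_uid"] = st.st_uid
    info["owner_gid"] = st.st_gid
    # os.access checks against the real uid/gid of this process.
    info["readable"] = os.access(path, os.R_OK)
    return info


if __name__ == "__main__":
    for key, value in describe_file(TOKEN_PATH).items():
        print(f"{key}: {value}")
```

Comparing `process_uid` and `process_gid` with `owner_uid`, `owner_gid` and `mode` shows whether the user running the process is one the file's permissions allow.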

charlienss commented 1 year ago

/var/run/secrets/kubernetes.io/serviceaccount/token

I have found the problem. I started the container as the root user, when I should have started it as the user corresponding to the serviceAccount. After correcting this, it starts normally.
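For anyone hitting the same message: the `eacces` atom in the log is how Erlang reports the POSIX "permission denied" error (EACCES). The sketch below attempts the same kind of read and reports which user the process runs as when it fails. `read_token` and its status labels are hypothetical names for illustration, and it assumes Python is available in the environment being checked.

```python
import errno
import os

# Path quoted in this thread.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def read_token(path=TOKEN_PATH):
    """Try to read the token; return (status, detail) instead of raising."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return "ok", handle.read().strip()
    except FileNotFoundError:
        return "missing", f"{path} does not exist"
    except PermissionError as exc:
        # PermissionError covers EACCES and EPERM; EACCES is the POSIX
        # code that corresponds to Erlang's `eacces`.
        name = errno.errorcode.get(exc.errno, "unknown")
        return "denied", f"{name} for uid={os.getuid()} gid={os.getgid()}"


if __name__ == "__main__":
    status, detail = read_token()
    # The token is a credential, so only the status is printed on success.
    print(status if status == "ok" else f"{status}: {detail}")
```

A `denied` result together with the reported uid and gid indicates that the container's user, rather than the token file itself, is what needs to change.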