opensearch-project / opensearch-k8s-operator

OpenSearch Kubernetes Operator
Apache License 2.0

ERROR "you may need to run securityadmin" #438

Closed Dougems closed 1 year ago

Dougems commented 1 year ago

Environment

I am running a 2-node MicroK8s cluster on Ubuntu VMs.

$ snap info microk8s | grep installed
installed:               v1.26.1                    (4595) 176MB classic

$ mk get nodes -o wide
NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ub-nodedev     Ready    <none>   17d   v1.26.1   10.140.70.4     <none>        Ubuntu 22.04.1 LTS   5.15.0-60-generic   containerd://1.6.8
ub-nodedev1    Ready    <none>   25h   v1.26.1   10.140.69.217   <none>        Ubuntu 22.04.1 LTS   5.15.0-60-generic   containerd://1.6.8
  1. ub-nodedev is the master node from which I run all "microk8s kubectl" commands (aliased to "mk").
  2. ub-nodedev1 is a worker node.

This is for development purposes. I know this does not represent a production topology. I am able to deploy pods, daemonsets, etc. using kubectl on all nodes, and based on my usage the cluster appears to be "working".

What I Did

Followed example here: https://github.com/Opster/opensearch-k8s-operator/blob/v2.2.1/docs/userguide/main.md

I modified the cluster.yaml slightly: the example contains "NodeSelector:", which k8s didn't like. I think the key is the wrong case (it should be lowercase "nodeSelector:"), so I dropped it for now; see the sketch after the manifest below. This is what I used:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  general:
    serviceName: my-first-cluster
    version: 2.3.0
  dashboards:
    enable: true
    version: 2.3.0
    replicas: 1
    resources:
      requests:
         memory: "512Mi"
         cpu: "200m"
      limits:
         memory: "512Mi"
         cpu: "200m"
  nodePools:
    - component: nodes
      replicas: 3
      diskSize: "5Gi"
      resources:
         requests:
            memory: "2Gi"
            cpu: "500m"
         limits:
            memory: "2Gi"
            cpu: "500m"
      roles:
        - "cluster_manager"
        - "data"

Applied:

$ mk apply -f cluster.yaml
opensearchcluster.opensearch.opster.io/my-first-cluster created
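
To see what the operator was doing with the resource, the custom resource and its events can be inspected (commands only; I have not pasted the output here):

$ mk get opensearchclusters
$ mk describe opensearchcluster my-first-cluster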

What Happened

my-first-cluster-nodes-0 never left Pending state:

$ mk get pods -o wide | grep my-first
my-first-cluster-nodes-0                                  0/1     Pending   0              3m13s   <none>         <none>         <none>           <none>
my-first-cluster-dashboards-764cc4fd6d-bdjqv              0/1     Running   0              3m10s   10.1.248.35    ub-nodedev1   <none>           <none>
my-first-cluster-bootstrap-0                              1/1     Running   0              3m13s   10.1.248.34    ub-nodedev1   <none>           <none>
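
To see why nodes-0 stayed Pending, describing the pod shows the scheduler events at the bottom, which usually name the cause (for example, an unbound PersistentVolumeClaim):

$ mk describe pod my-first-cluster-nodes-0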

The log for my-first-cluster-bootstrap-0 ends with entries like those shown below, repeating forever (full log attached):

my-first-cluster-bootstrap-0.log

$ mk logs my-first-cluster-bootstrap-0
...
[2023-02-17T04:54:47,347][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-bootstrap-0] Not yet initialized (you may need to run securityadmin)
[2023-02-17T04:54:47,349][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-bootstrap-0] Not yet initialized (you may need to run securityadmin)
[2023-02-17T04:54:47,352][ERROR][o.o.s.a.BackendRegistry  ] [my-first-cluster-bootstrap-0] Not yet initialized (you may need to run securityadmin)

Also, it's not clear if the unbound PVC here is contributing to the problem:

$ mk get pvc
NAME                            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-pvc                        Bound     pvc-2b36fb76-3ea2-417f-bd17-76bf69b6c843   5Gi        RWO            mayastor       9d
test-claim                      Bound     pvc-d189843c-94da-4d20-8ba5-b1f3b33c5676   1Mi        RWX            nfs-client     6d6h
data-my-first-cluster-nodes-0   Pending                                                                                       5m36s
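
Describing the pending claim should show why it cannot bind, and listing the storage classes shows whether a default exists (a missing default StorageClass is a common cause of a claim staying Pending):

$ mk describe pvc data-my-first-cluster-nodes-0
$ mk get storageclass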

Expected

The cluster.yaml above should yield a functional OpenSearch cluster.

Actual

Repeating "Not yet initialized" ERRORs in the bootstrap pod log, and my-first-cluster-nodes-0 stuck in Pending.

swoehrl-mw commented 1 year ago

Hi @Dougems. The pending volume is very likely your problem. The bootstrap pod is just a helper during setup and cannot function alone. Please find out why the volume is pending; I'm pretty sure that once you fix that and the nodes-0 pod comes up, your problem will go away.
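
If it turns out that no StorageClass is marked as default, one possible fix is to mark one; this uses your mayastor class purely as an example, pick whichever class is appropriate for your setup:

$ mk patch storageclass mayastor -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'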

Dougems commented 1 year ago

@swoehrl-mw thank you for the reply and assistance. After resolving the issue with the PV, everything worked.
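
For anyone else hitting this: once the PV bound and all pods were Running, I verified the cluster with a port-forward and a health check (admin:admin is the demo default in this setup; change it for anything real):

$ mk port-forward svc/my-first-cluster 9200   # leave running
$ curl -k -u admin:admin https://localhost:9200/_cluster/health   # from a second terminal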