minio / operator

Simple Kubernetes Operator for MinIO clusters :computer:
https://min.io/docs/minio/kubernetes/upstream/index.html
GNU Affero General Public License v3.0

minio tenant deployment stuck in "Provisioning initial users" #2165

Closed. janhuehne closed this issue 2 months ago.

janhuehne commented 3 months ago

After deploying a new tenant, the deployment process is stuck in the "Provisioning initial users" stage and the health status is "red".

Expected Behavior

The tenant should be up and running.

Current Behavior

The health status is "red" and the process is stuck in "Provisioning initial users".

Steps to Reproduce (for bugs)

  1. Install the MinIO Operator (version 5.0.15) on the OKD cluster.
  2. Create a new tenant.

Your Environment

Tenant configuration

apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: minio-tenant-st
  namespace: minio-tenant-st
scheduler:
  name: ''
spec:
  requestAutoCert: false
  exposeServices:
    console: true
    minio: true
  users:
    - name: minio-tenant-st-user-0
  imagePullSecret: {}
  credsSecret:
    name: minio-tenant-st-secret
  configuration:
    name: minio-tenant-st-env-configuration
  pools:
    - resources:
        requests:
          cpu: '4'
          memory: 7Gi
      volumesPerServer: 2
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: v1.min.io/tenant
                    operator: In
                    values:
                      - minio-tenant-st
                  - key: v1.min.io/pool
                    operator: In
                    values:
                      - pool-0
              topologyKey: kubernetes.io/hostname
      name: pool-0
      runtimeClassName: ''
      containerSecurityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        runAsGroup: 1000790000
        runAsNonRoot: true
        runAsUser: 1000790000
        seccompProfile:
          type: RuntimeDefault
      securityContext:
        fsGroup: 1000790000
        fsGroupChangePolicy: Always
        runAsGroup: 1000790000
        runAsNonRoot: true
        runAsUser: 1000790000
      servers: 2
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: '137438953472'
          storageClassName: directpv-min-io
        status: {}
  features: {}
  mountPath: /export
status:
  usage: {}
  availableReplicas: 0
  healthMessage: Service Unavailable
  healthStatus: red
  pools:
    - legacySecurityContext: false
      ssName: minio-tenant-st-pool-0
      state: PoolInitialized
  currentState: Provisioning initial users
  revision: 0
  certificates:
    autoCertEnabled: true
    customCertificates: {}
  syncVersion: v5.0.0

MinIO tenant pod log

minio-tenant-st-pool-0-0:

Waiting for all MinIO sub-systems to be initialize...
Automatically configured API requests per node based on available memory on the system: 304
All MinIO sub-systems initialized successfully in 7.43078ms
MinIO Object Storage Server
Copyright: 2015-2024 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2024-05-01T01-11-10Z (go1.21.9 linux/amd64)
API: http://minio.minio-tenant-suptech.svc.cluster.local
WebUI: http://10.131.0.78:9090 http://127.0.0.1:9090
Docs: https://min.io/docs/minio/linux/index.html
Status:         4 Online, 0 Offline.

minio-tenant-st-pool-0-1:

Waiting for all MinIO sub-systems to be initialize...
Automatically configured API requests per node based on available memory on the system: 405
All MinIO sub-systems initialized successfully in 6.140372ms
MinIO Object Storage Server
Copyright: 2015-2024 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2024-05-01T01-11-10Z (go1.21.9 linux/amd64)
API: http://minio.minio-tenant-suptech.svc.cluster.local
WebUI: http://10.128.3.249:9090 http://127.0.0.1:9090
Docs: https://min.io/docs/minio/linux/index.html
Status:         4 Online, 0 Offline.
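The "4 Online" count in both pod logs lines up with the pool spec above: 2 servers × 2 volumesPerServer = 4 drives, each requesting 137438953472 bytes (128 GiB). A minimal Python sketch of that arithmetic follows; the helper names are hypothetical, and the result is raw capacity before erasure-coding parity is subtracted, so it is not the "Usable capacity" the tenant status would report.

```python
# Hypothetical helpers: plain arithmetic on the pool spec, not operator code.
GIB = 1024 ** 3

def pool_drives(servers: int, volumes_per_server: int) -> int:
    """Each server mounts volumes_per_server PVCs; MinIO counts each as one drive."""
    return servers * volumes_per_server

def raw_capacity_bytes(servers: int, volumes_per_server: int, storage_bytes: int) -> int:
    """Raw capacity before erasure-coding parity: drive count x per-drive request."""
    return pool_drives(servers, volumes_per_server) * storage_bytes

# Values from the tenant configuration in this issue.
print(pool_drives(2, 2))                              # 4
print(raw_capacity_bytes(2, 2, 137438953472) // GIB)  # 512
```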

Operator error

I0613 22:03:22.769414 1 event.go:364] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio-tenant-st", Name:"minio-tenant-st", UID:"2a48b9ca-735a-4476-bf33-0e8870bbc9ba", APIVersion:"minio.min.io/v2", ResourceVersion:"6119438", FieldPath:""}): type: 'Warning' reason: 'UsersCreatedFailed' Users creation failed: context deadline exceeded


MinIO tenant status

kubectl minio tenant status minio-tenant-st

=====================
Pools:              1 
Revision:           0 
Sync version:       v5.0.0 
Write quorum:       0 
Health status:       
Drives online:      0 
Drives offline:     0 
Drives healing:     0 
Current status:     Provisioning initial users 
Usable capacity:    0 B 
Provisioned users:  false 
Available replicas: 0 

99brgs commented 3 months ago

The same happens to me.

ramondeklein commented 3 months ago

I was able to partially reproduce this, although the users were provisioned correctly after a few seconds:

Type     Reason              Age                From            Message
----     ------              ----               ----            -------
Normal   SvcCreated          51s                minio-operator  MinIO Service Created
Normal   SvcCreated          51s                minio-operator  Console Service Created
Normal   SvcCreated          51s                minio-operator  Headless Service created
Normal   SACreated           51s                minio-operator  Service Account Created
Normal   RoleCreated         51s                minio-operator  Role Created
Normal   BindingCreated      51s                minio-operator  Role Binding Created
Normal   PoolCreated         50s                minio-operator  Tenant pool pool-0 created
Warning  UsersCreatedFailed  45s (x2 over 48s)  minio-operator  Users creation failed: Put "http://minio.minio-tenant-st2.svc.cluster.local/minio/admin/v3/add-user?accessKey=user": dial tcp 10.96.102.224:80: connect: connection refused
Normal   Updated             45s                minio-operator  Headless Service Updated
Warning  UsersCreatedFailed  20s                minio-operator  Users creation failed: context deadline exceeded
Normal   UsersCreated        9s                 minio-operator  Users created

I used the following YAML to create the tenant:

apiVersion: v1
kind: Namespace
metadata:
  name: minio-tenant-st
---
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: minio-tenant-st
  namespace: minio-tenant-st
spec:
  requestAutoCert: false
  users:
    - name: minio-tenant-st-user-0
  configuration:
    name: minio-tenant-st-env-configuration
  pools:
    - volumesPerServer: 2
      name: pool-0
      servers: 2
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: '2147483648'
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-tenant-st-env-configuration
  namespace: minio-tenant-st
type: Opaque
stringData:
  config.env: |-
    export MINIO_ROOT_USER="minio"
    export MINIO_ROOT_PASSWORD="minio123"
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-tenant-st-user-0
  namespace: minio-tenant-st
type: Opaque
stringData:
  CONSOLE_ACCESS_KEY: user
  CONSOLE_SECRET_KEY: minio123

The operator log shows:

minio-operator-6b7848fd84-t2q8r I0617 15:43:23.767813       1 event.go:364] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio-tenant-st5", Name:"minio-tenant-st", UID:"a3908bd1-439c-41d3-bde4-62aa37ac8bcd", APIVersion:"minio.min.io/v2", ResourceVersion:"19502", FieldPath:""}): type: 'Warning' reason: 'UsersCreatedFailed' Users creation failed: context deadline exceeded
minio-operator-6b7848fd84-t2q8r I0617 15:43:39.298884       1 event.go:364] Event(v1.ObjectReference{Kind:"Tenant", Namespace:"minio-tenant-st5", Name:"minio-tenant-st", UID:"a3908bd1-439c-41d3-bde4-62aa37ac8bcd", APIVersion:"minio.min.io/v2", ResourceVersion:"19540", FieldPath:""}): type: 'Normal' reason: 'UsersCreated' Users created

It looks like it takes a while before the /minio/admin/v3/add-user endpoint becomes available and can respond.
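The events above show the pattern: "connection refused" while the MinIO service was not yet accepting connections, then "context deadline exceeded", and finally "Users created" (PoolCreated at age 50s versus UsersCreated at 9s, roughly 41 seconds later). The operator already retries on its own; the following is only a minimal Python sketch of retry with exponential backoff, assuming a hypothetical op callable that issues the add-user request and raises ConnectionError or TimeoutError while MinIO is not ready. It is not the operator's Go implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_until_ready(
    op: Callable[[], T],
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call op() until it succeeds, doubling the wait between attempts.

    ConnectionError (which includes ConnectionRefusedError) and TimeoutError
    are treated as transient "not ready yet" failures. If the next wait would
    pass the deadline, give up and raise TimeoutError chained to the last error.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            return op()
        except (ConnectionError, TimeoutError) as exc:
            if clock() + delay > deadline:
                raise TimeoutError("endpoint not ready before deadline") from exc
            sleep(delay)
            delay = min(delay * 2, max_delay_s)

# Usage with a hypothetical stand-in that fails twice, then succeeds (no real sleeping).
calls = []
def fake_add_user() -> str:
    calls.append(None)
    if len(calls) < 3:
        raise ConnectionRefusedError("connection refused")
    return "Users created"

waits = []
print(retry_until_ready(fake_add_user, sleep=waits.append))  # Users created
print(waits)  # [0.5, 1.0]
```

Injecting sleep and clock keeps the function testable without real waiting, and max_delay_s stops a slow start from stretching the interval between attempts indefinitely.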

cesnietor commented 3 months ago

@janhuehne, could you please share the complete operator logs? That would help us more. Some requests can fail until the MinIO service is up. We're going to consider how we can expose this better to the user.

99brgs commented 3 months ago

I am getting this in the pool logs:

Unable to use the drive https://saae-pool-0-1.saae-hl.minio-saae.svc.cluster.local:9000/export0: drive not found
Unable to use the drive https://saae-pool-0-1.saae-hl.minio-saae.svc.cluster.local:9000/export1: drive not found
Waiting for a minimum of 2 drives to come online (elapsed 3m7s)

ramondeklein commented 3 months ago

It has nothing to do with provisioning the users. The users can't be provisioned because the MinIO cluster doesn't come up properly. Please run kubectl logs -n minio-saae -l "v1.min.io/tenant=saae" and post the output.

It seems like pod saae-pool-0-0 cannot connect to pod saae-pool-0-1 for some reason. You should be able to find out the status of this pod by running kubectl -n minio-saae describe pod saae-pool-0-1.
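Both missing drives belong to saae-pool-0-1. Judging only from the URLs in the log above, each drive address seems to be built as <tenant>-<pool>-<ordinal>.<tenant>-hl.<namespace>.svc.cluster.local:9000/export<i>, where <tenant>-hl is presumably the tenant's headless service. A hypothetical Python helper that enumerates these endpoints for a pool can help map a failing URL back to a pod and drive index; the pattern is inferred from this log, not taken from operator documentation.

```python
from __future__ import annotations

def drive_endpoints(tenant: str, namespace: str, pool: str, servers: int,
                    volumes_per_server: int, port: int = 9000,
                    mount_path: str = "/export", scheme: str = "https") -> list[str]:
    """Enumerate per-drive URLs using the pattern observed in the pool log.

    Inferred pattern (hypothetical helper):
    <scheme>://<tenant>-<pool>-<ordinal>.<tenant>-hl.<namespace>.svc.cluster.local:<port><mount_path><i>
    """
    urls = []
    for ordinal in range(servers):            # pod ordinals 0..servers-1
        host = f"{tenant}-{pool}-{ordinal}.{tenant}-hl.{namespace}.svc.cluster.local"
        for i in range(volumes_per_server):   # drive index -> /export0, /export1, ...
            urls.append(f"{scheme}://{host}:{port}{mount_path}{i}")
    return urls

# The last two URLs are the ones reported as "drive not found" above.
for url in drive_endpoints("saae", "minio-saae", "pool-0", servers=2, volumes_per_server=2):
    print(url)
```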

99brgs commented 3 months ago

Unable to use the drive https://saae-pool-0-1.saae-hl.minio-saae.svc.cluster.local:9000/export0: drive not found
Unable to use the drive https://saae-pool-0-1.saae-hl.minio-saae.svc.cluster.local:9000/export1: drive not found
Waiting for a minimum of 2 drives to come online (elapsed 10h58m43s)

API: SYSTEM.storage
Time: 06:39:56 UTC 06/18/2024
Error: unexpected drive ordering on pool: 1st: found drive at (set=1st, drive=1st), expected at (set=1st, drive=2nd): /export0(): inconsistent drive found (*fmt.wrapError)
       6: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
       5: internal/logger/logonce.go:149:logger.LogOnceIf()
       4: cmd/logging.go:164:cmd.storageLogOnceIf()
       3: cmd/xl-storage.go:311:cmd.newXLStorage()
       2: cmd/storage-rest-server.go:1366:cmd.registerStorageRESTHandlers.func2()
       1: cmd/storage-rest-server.go:1398:cmd.registerStorageRESTHandlers.func3()
99brgs commented 3 months ago

The pod saae-pool-0-1 is in the Running state, but it does not mount /export1:
Volumes:
  data0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data0-saae-pool-0-1
    ReadOnly:   false
  data1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data1-saae-pool-0-1
    ReadOnly:   false

Output of df -h in the pod:
Filesystem                         Size  Used Avail Use% Mounted on
overlay                            232G   37G  185G  17% /
tmpfs                               64M     0   64M   0% /dev
/dev/sda1                          3.6T   96K  3.4T   1% /export0
/dev/mapper/ubuntu--vg-ubuntu--lv  232G   37G  185G  17% /data
tmpfs                               32G   12K   32G   1% /tmp/certs
shm                                 64M     0   64M   0% /dev/shm
tmpfs                               32G   12K   32G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                               16G     0   16G   0% /proc/asound
tmpfs                               16G     0   16G   0% /proc/acpi
tmpfs                               16G     0   16G   0% /proc/scsi
tmpfs                               16G     0   16G   0% /sys/firmware
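In this df -h output, /export0 is backed by /dev/sda1, but there is no /export1 entry at all. A minimal Python sketch that compares the "Mounted on" column with the expected /export0 ... /export<N-1> paths, assuming mountPath /export and whitespace-separated df columns (so mount points containing spaces are not handled); the function name is hypothetical.

```python
from __future__ import annotations

def missing_export_mounts(df_output: str, volumes_per_server: int,
                          mount_path: str = "/export") -> list[str]:
    """Return the expected <mount_path><i> mount points absent from `df -h` output."""
    mounted = set()
    for line in df_output.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if fields:
            mounted.add(fields[-1])           # "Mounted on" is the last column
    expected = [f"{mount_path}{i}" for i in range(volumes_per_server)]
    return [m for m in expected if m not in mounted]

df_output = """\
Filesystem                         Size  Used Avail Use% Mounted on
overlay                            232G   37G  185G  17% /
/dev/sda1                          3.6T   96K  3.4T   1% /export0
/dev/mapper/ubuntu--vg-ubuntu--lv  232G   37G  185G  17% /data
"""
print(missing_export_mounts(df_output, volumes_per_server=2))  # ['/export1']
```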
99brgs commented 3 months ago

NAME                  STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
data0-saae-pool-0-0   Bound    minio-w1-2-pv   1536Gi     RWO            local-storage   <unset>                 11h
data0-saae-pool-0-1   Bound    minio-w2-1-pv   1536Gi     RWO            local-storage   <unset>                 11h
data1-saae-pool-0-0   Bound    minio-w1-1-pv   1536Gi     RWO            local-storage   <unset>                 11h
data1-saae-pool-0-1   Bound    minio-w2-2-pv   1536Gi     RWO            local-storage   <unset>                 11h
99brgs commented 3 months ago

Thanks for your help.

ramondeklein commented 3 months ago

I think the message "The pod saae-pool-0-1 is in Running state but it does not mount /export1" is key. Something is wrong in your k8s configuration, so the volume can't be mounted in this particular pod. According to the PVC list, all volumes are bound, so you may want to check which PVC is bound to which pod.

You can check the PVC information by running kubectl -n minio-saae describe pvc. That should tell you which pod uses each PVC (the "Used By:" line). This should match the volume mapping in the pods, which you can see using kubectl -n minio-saae describe pods.
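To see the claim-to-volume mapping per pod at a glance, the kubectl get pvc table posted above can be grouped by pod. This hypothetical Python sketch assumes claim names of the form data<i>-<statefulset>-<ordinal>, as in data1-saae-pool-0-1, and flags missing or unbound claims per pod. It does not replace kubectl describe pvc, which is what shows the "Used By:" pod.

```python
from __future__ import annotations
import re

# Claim names in this issue look like data<i>-<statefulset>-<ordinal>,
# e.g. data1-saae-pool-0-1, where the pod name is <statefulset>-<ordinal>.
PVC_NAME = re.compile(r"^(?P<template>[a-z]+)(?P<index>\d+)-(?P<pod>.+-\d+)$")

def claims_by_pod(pvc_table: str) -> dict[str, dict[int, tuple[str, str]]]:
    """Map pod -> {drive index: (status, volume)} from `kubectl get pvc` output."""
    by_pod: dict[str, dict[int, tuple[str, str]]] = {}
    for line in pvc_table.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        match = PVC_NAME.match(fields[0])
        if not match:
            continue
        status = fields[1]
        # The VOLUME column is empty for unbound claims, so only trust it when Bound.
        volume = fields[2] if status == "Bound" and len(fields) > 2 else ""
        by_pod.setdefault(match["pod"], {})[int(match["index"])] = (status, volume)
    return by_pod

def claim_problems(by_pod: dict[str, dict[int, tuple[str, str]]],
                   volumes_per_server: int) -> list[str]:
    """List pods that lack a claim for some drive index or have an unbound claim."""
    problems = []
    for pod, drives in sorted(by_pod.items()):
        for i in range(volumes_per_server):
            if i not in drives:
                problems.append(f"{pod}: no claim for drive {i}")
            elif drives[i][0] != "Bound":
                problems.append(f"{pod}: claim for drive {i} is {drives[i][0]}")
    return problems

pvc_table = """\
NAME                  STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
data0-saae-pool-0-0   Bound    minio-w1-2-pv   1536Gi     RWO            local-storage   <unset>                 11h
data0-saae-pool-0-1   Bound    minio-w2-1-pv   1536Gi     RWO            local-storage   <unset>                 11h
data1-saae-pool-0-0   Bound    minio-w1-1-pv   1536Gi     RWO            local-storage   <unset>                 11h
data1-saae-pool-0-1   Bound    minio-w2-2-pv   1536Gi     RWO            local-storage   <unset>                 11h
"""
by_pod = claims_by_pod(pvc_table)
print(by_pod["saae-pool-0-1"])    # {0: ('Bound', 'minio-w2-1-pv'), 1: ('Bound', 'minio-w2-2-pv')}
print(claim_problems(by_pod, 2))  # []
```

For the table in this thread every claim is Bound, so binding status alone does not explain the missing /export1 mount; the mapping printed per pod is what to compare against the PV definitions.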

99brgs commented 3 months ago

I got it working. There were some errors in how the PVs were created.

Alii2121 commented 2 months ago

Check the operator logs for any "x509: certificate signed by unknown authority" errors. I had the same issue, and it was caused by the TLS certificate created by the operator.

To test this, try creating a tenant from the console without the TLS option (under the Security parameters). If that works, you might need to use a custom certificate.
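The thread surfaced several different failure messages: "connection refused" and "context deadline exceeded" in the UsersCreatedFailed events, and, per this comment, "x509: certificate signed by unknown authority" in the operator logs. A hypothetical Python helper to tally them from saved operator log text; the substrings come from messages quoted in this issue, the labels are illustrative, and the list is not exhaustive.

```python
from __future__ import annotations
from collections import Counter

# Substrings of messages quoted in this issue (matched case-insensitively).
# Labels are illustrative; the list is not exhaustive.
KNOWN_CAUSES = {
    "x509: certificate signed by unknown authority": "TLS: certificate issuer not trusted",
    "connection refused": "MinIO service not accepting connections yet",
    "context deadline exceeded": "request timed out",
}

def tally_causes(log_text: str) -> Counter:
    """Count log lines that mention each known failure cause."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        lowered = line.lower()
        for needle, label in KNOWN_CAUSES.items():
            if needle in lowered:
                counts[label] += 1
    return counts

sample = "\n".join([
    "... reason: 'UsersCreatedFailed' Users creation failed: context deadline exceeded",
    "... reason: 'UsersCreated' Users created",
])
print(tally_causes(sample))  # Counter({'request timed out': 1})
```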

cesnietor commented 2 months ago

Closing this issue, since this seems to be a configuration issue. Please open a new issue with more details, including the log file, so that we can investigate further.