minio / operator

Simple Kubernetes Operator for MinIO clusters
https://min.io/docs/minio/kubernetes/upstream/index.html
GNU Affero General Public License v3.0

tenant S3 api not reachable / Login in Tenant console not working / :443: i/o timeout #1794

Closed: fritz-net closed this issue 5 months ago

fritz-net commented 11 months ago

The tenant S3 API is not reachable; therefore the login fails, and user creation on tenant deployment fails as well.

What I noticed and what I did:

I installed the k8s Operator via the plugin (and I also tried via Helm). The tenants are properly created, pods are spawned, and certs are created. However, I get the following error in the events (and in k8s): Users creation failed: Put https://minio.namespace.svc.cluster.local/minio/admin/v3/add-user?accessKey=censored: dial tcp :443: connect: connection refused

Logging in to the tenant's console also does not work. The page takes a long time to load (I would guess as long as the login timeout). The error I get is: Post "https://minio.namespace.svc.cluster.local/": dial tcp 10.100.109.225:443: i/o timeout (this is the IP of the service).

I noticed that the S3 API is only bound to localhost (snippet from the pod logs):

Status:         1 Online, 0 Offline. 
S3-API: http://localhost:9000 
Console: http://172.17.0.23:9090 http://127.0.0.1:9090 

(this is the IP of the pod)
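For anyone hitting the same thing, a quick way to double-check the binding and the service is kubectl; a sketch, assuming the tenant pods carry the operator's v1.min.io/tenant label and the service is named minio as in the errors above (tenant name and namespace are placeholders):

# what address does the tenant actually bind to?
kubectl -n namespace logs -l v1.min.io/tenant=minio-shadow-env -c minio | grep -E 'S3-API|Console'
# which ClusterIP/endpoints does the console dial?
kubectl -n namespace get svc,endpoints minio -o wide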

dirty workaround

When I switch to MINIO_SERVER_URL="http://localhost:9000", at least the login works. I tried it with and without TLS. With TLS I get a cert error when setting the server URL to localhost (which makes sense, since it's the wrong domain), so I disabled TLS for my workaround.
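For reference, this workaround can be applied declaratively, since the Tenant CRD exposes spec.env (visible as the env: [ ] field in the tenant YAML at the end of this thread); a sketch, with tenant name and namespace assumed:

kubectl -n namespace patch tenant minio-shadow-env --type merge \
  -p '{"spec":{"env":[{"name":"MINIO_SERVER_URL","value":"http://localhost:9000"}]}}'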

Expected Behavior

The expected behavior is that a connection to the S3 API via the k8s service is possible.

Current Behavior

connection gets dropped

Possible Solution

Change the binding of the tenant's MinIO container to 0.0.0.0:9000. However, I did not find out how to do this, or why it is set up this way in the first place.

EDIT: the binding can be set via the --address command-line argument, as I found out; however, I found no way to pass it to the container. I was also not able to set it by editing the k8s resources, because the operator would "fix" them immediately. The console/web port, however, can be set, as seen here: https://github.com/minio/operator/blob/cf4d30f027b8cc77b3647aa82a36fc6df0f98c2b/pkg/resources/statefulsets/minio-statefulset.go#L291C6-L291C30

EDIT2: I scaled the operator to 0 and appended --address 0.0.0.0:9123 to the end of the container args. The container still logged that it was binding to localhost:9000. I also tried different host and port combinations; none took effect.
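For completeness, the attempt described in EDIT2 would look roughly like this (operator deployment name and pool StatefulSet name are assumptions; as noted above, the extra args did not take effect):

# stop the operator from reconciling the StatefulSet
kubectl -n minio-operator scale deployment minio-operator --replicas=0
# append --address to the MinIO container args
kubectl -n namespace patch statefulset minio-shadow-env-pool-0 --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--address"},{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"0.0.0.0:9123"}]'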

Steps to Reproduce (for bugs)

  1. Install the MinIO k8s Operator on minikube (maybe relevant: replicaCount: 1) via Helm or via krew (tested both); see the sketch after this list
  2. Install a tenant via YAML/Helm
  3. See the error for user creation in the k8s logs and in the events of the MinIO Operator web interface
  4. See the error on login
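A condensed sketch of steps 1 and 2 with Helm; the chart repo URL and chart names follow the MinIO docs, while the release names and namespaces are placeholders:

helm repo add minio-operator https://operator.min.io
helm repo update
helm install operator minio-operator/operator -n minio-operator --create-namespace
helm install tenant minio-operator/tenant -n tenant-ns --create-namespace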

Context

I was trying to get started with MinIO (and the operator) for the first time.

Regression

Your Environment

jiuker commented 11 months ago

I think this happens before MinIO's pods are ready. It shouldn't happen all the time, @fritz-net, and it is temporarily acceptable.

fritz-net commented 11 months ago

@jiuker thanks for the response. I tested it again after the pod had been running for 7 days; same result. The state of the pod (minio and sidecar) is Running, Ready. The Tenant also shows healthy and State: Initialized in the Operator console.

How can I provide you with further information? What additional information do you need? Can I test anything to give you more insight into the issue?

test with a different cluster

I also tested it on a different cluster (created with kubeadm, not minikube). There the S3 API binds to the DNS name:

MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-09-07T02-05-02Z (go1.21.1 linux/amd64)

Status:         1 Online, 0 Offline. 
S3-API: https://minio.preprod.svc.cluster.local 
Console: https://10.244.3.38:9443 https://127.0.0.1:9443   

Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.

 You are running an older version of MinIO released 3 weeks before the latest release 
 Update: Run `mc admin update` 

2023-10-06T03:51:25.397291342+02:00

But when loading the login page (which takes ages) I get "detailedMessage": "Get \"https://minio.prod.svc.cluster.local/minio/admin/v3/accountinfo\": dial tcp 10.105.203.116:443: i/o timeout" on https://localhost:45497/api/v1/session with a 500.

On the actual login at https://localhost:45497/api/v1/login I get the same, but with a 401:

{
    "detailedMessage": "Post \"https://minio.prod.svc.cluster.local/\": dial tcp 10.105.203.116:443: i/o timeout",
    "message": "invalid Login"
}

If I use the tenant console button from the operator console, I get a 403 on http://localhost:48851/api/hop/prod/minio-shadow-env/?cp=y with no response body.

test with MC client

cluster 2: kubeadm

Today I tested the mc client against the second cluster (the one where it binds to the DNS name), which seems to work:

PS C:\Users\user\Downloads> ./mc.exe alias set prod https://localhost ***key ****secret
mc.exe: <ERROR> Unable to initialize new alias from the provided credentials. Get "https://localhost": tls: failed to verify certificate: x509: certificate is valid for minio-shadow-env-pool-0-0.minio-shadow-env-hl.prod.svc.cluster.local, minio.prod.svc.cluster.local, minio.prod, minio.prod.svc, *., *.prod.svc.cluster.local, not localhost.
PS C:\Users\user\Downloads> ./mc.exe alias set prod https://localhost ***key ****secret --insecure
Added `prod` successfully.
PS C:\Users\user\Downloads> ./mc.exe admin info prod
mc.exe: <ERROR> Unable to get service status. Get "https://localhost/minio/admin/v3/info": tls: failed to verify certificate: x509: certificate is valid for minio-shadow-env-pool-0-0.minio-shadow-env-hl.prod.svc.cluster.local, minio.prod.svc.cluster.local, minio.prod, minio.prod.svc, *., *.prod.svc.cluster.local, not localhost.
PS C:\Users\user\Downloads> ./mc.exe admin info prod --insecure
●  localhost
   Uptime: 3 days
   Version: 2023-09-07T02:05:02Z
   Network: 1/1 OK
   Drives: 1/1 OK
   Pool: 1

Pools:
   1st, Erasure sets: 1, Drives per erasure set: 1

0 B Used, 1 Bucket, 0 Objects
1 drive online, 0 drives offline
PS C:\Users\user\Downloads> ./mc.exe ls prod --insecure
[2023-10-03 17:48:11 CEST]     0B images/
PS C:\Users\user\Downloads> ./mc.exe od if=mc.exe of=prod/images/mc.exe --insecure
Transferred: 26 MiB, Parts: 1, Time: 10.812s, Speed: 2.4 MiB/s
PS C:\Users\user\Downloads> ./mc.exe ls prod/images --insecure
[2023-10-09 12:26:27 CEST]  26MiB STANDARD mc.exe

cluster 1: minikube

For the first cluster (minikube; binding to localhost, with TLS disabled) I can't connect:

PS C:\Users\user\Downloads> ./mc.exe alias set minikube http://localhost:59915/ ***key ***secret --insecure
mc.exe: <ERROR> Unable to initialize new alias from the provided credentials. Get "http://localhost:59915/probe-bucket-sign-nf45kw2xrkn0/?location=": dial tcp [::1]:59915: connectex: No connection could be made because the target machine actively refused it.
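The port 59915 above is an ephemeral forwarded port; a more deterministic probe would be an explicit port-forward to the tenant service. A sketch with assumed names, using service port 80 because TLS is disabled on this tenant:

kubectl -n tenant-ns port-forward svc/minio 9000:80
./mc.exe alias set minikube http://localhost:9000 ***key ***secret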
jiuker commented 11 months ago

I think your machine can't connect to minikube's NodePort. You could deploy an nginx to check it. @fritz-net
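A minimal sketch of that check (all names hypothetical): expose a throwaway nginx via a NodePort service and see whether the machine can reach it at all.

kubectl create deployment nginx-probe --image=nginx
kubectl expose deployment nginx-probe --port=80 --type=NodePort
curl -I "$(minikube service nginx-probe --url)"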

fritz-net commented 11 months ago

I have other services running which I can connect to; I can also connect to the MinIO Operator console without problems (login works fine there, too).

The error messages I posted are the response bodies of the HTTP requests, so I guess the MinIO tenant is calling these endpoints internally when doing the login.

I also can see/open the login page of the MinIO tenant console, but the login fails. See the attached screenshots as additional info for the tenant console login. (The screenshots are from another namespace, prod, while the console log from the previous post is from preprod.) But the binding seems fine for this namespace as well:

MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-09-07T02-05-02Z (go1.21.1 linux/amd64)

Status:         1 Online, 0 Offline. 
S3-API: https://minio.prod.svc.cluster.local 
Console: https://10.244.3.39:9443 https://127.0.0.1:9443   

Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.

 You are running an older version of MinIO released 3 weeks before the latest release 
 Update: Run `mc admin update` 

2023-10-06T03:51:26.321280980+02:00

my thoughts

That's why I think the tenant console cannot connect (internally) to the S3 port. I have no idea why this is not working, since on cluster 2 (kubeadm; prod, preprod) the binding is correct and I also tested mc there with success. However, on minikube, which is a much simpler setup on the cluster side, the binding is to localhost.

further plans

  1. I will install a container with mc and try connecting to the S3 API internally via the k8s service name (see the sketch after this list)
  2. I will set up an ingress for the tenant console; this should only need a single port? (9443 on the pod and also on the service)
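Plan 1 could look like this; a sketch, assuming the service DNS name from the logs above and using mc's MC_HOST_<alias> environment variable so no shell is needed inside the container (credentials are placeholders):

kubectl -n prod run mc-probe --rm -it --restart=Never --image=minio/mc \
  --env MC_HOST_internal=https://ACCESSKEY:SECRETKEY@minio.prod.svc.cluster.local \
  --command -- mc admin info internal --insecure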

results

  1. For cluster 2 it worked out of the box; the SSL certs also worked, and no --insecure flag was needed. For cluster 1 (minikube) I could not connect internally, neither via the service nor via the pod IP. I used the same YAML/Helm chart for both.
jiuker commented 11 months ago

You said that the MinIO tenant is calling these endpoints internally when doing the login. The tenant never does that. Can you show the console log?

fritz-net commented 11 months ago

Login directly on Tenant Console

I guessed this from the error message and its source/origin: a REST API responding that it cannot reach xyz tells me that the server returning this message is the one making the failing request.

Here is the request I mean; the error message from the first screenshot is the direct result of this API call (see the attached screenshots).

The console log of the tenant shows nothing about the failure:

Formatting 1st pool, 1 set(s), 1 drives per set.
WARNING: Host local has more than 0 drives of set. A host failure will result in data becoming unavailable.
MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-09-07T02-05-02Z (go1.21.1 linux/amd64)

Status:         1 Online, 0 Offline. 
S3-API: https://minio.gitlab-managed-apps.svc.cluster.local 
Console: https://172.17.0.21:9443 https://127.0.0.1:9443 

Documentation: https://min.io/docs/minio/linux/index.html
Warning: The standard parity is set to 0. This can lead to data loss.

 You are running an older version of MinIO released 1 month before the latest release 
 Update: Run `mc admin update` 

2023-10-09T12:56:46.011007593Z
2023/10/09 12:56:45 sidecar.go:48: Starting Sidecar

However, I saw that after recreating, the S3 API was now binding to the DNS name and not localhost anymore, and I could successfully test the mc connection from the operator namespace, from the same namespace, and via port-forwarding :) However, I still want to find out where these error messages come from, since I have already put a lot of time into this issue and want to avoid someone else getting stuck here.

EDIT: I guess this is the failing request. It seems that this failure is not logged to the console output.

Login via Operator Console

In the meantime I have further insights. The Operator console is making some connections to the tenant which are failing, definitely when using the Operator console to log in to (view) the tenant console (see screenshot). The log then outputs the following:

2023/10/09 13:05:23 proxy.go:183: couldn't login to tenant and get cookie
2023/10/09 13:05:26 proxy.go:183: couldn't login to tenant and get cookie

However, the Operator should be able to connect (at least on the second try); see screenshot.

TartanLeGrand commented 6 months ago

up ?

luis-fnogueira commented 5 months ago

up?

jiuker commented 5 months ago

> up?

What's your setup? Could you post your steps?

aydinseven7 commented 5 months ago

If your network is secure, you could set tenant.certificate.requestAutoCert to false inside the tenant values chart, and then not force HTTPS as the backend protocol (see below). If you keep the HTTPS annotation, it results in a 502 error.

The ingress config inside the tenant values is then:

ingressClassName: "nginx"
labels: { }
annotations:
  kubernetes.io/ingress.class: "nginx"
  ## Remove if using CA signed certificate
  nginx.ingress.kubernetes.io/proxy-ssl-verify: "off"
  # nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  nginx.ingress.kubernetes.io/rewrite-target: /
  nginx.ingress.kubernetes.io/proxy-body-size: "0"
  nginx.ingress.kubernetes.io/server-snippet: |
    client_max_body_size 0;
  nginx.ingress.kubernetes.io/configuration-snippet: |
    chunked_transfer_encoding off;
tls: [ ]
host: console.test-1.minio.infra.corp.domain.com
path: /
pathType: Prefix

Same for the api part.

The rest regarding networking is left at default.
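After applying this, a quick smoke test against the ingress host from the values above (a sketch; the host is the example value, and -k skips verification of any self-signed chain):

curl -kI https://console.test-1.minio.infra.corp.domain.com/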

I was seeing the same couldn't login to tenant and get cookie errors inside my operator pods earlier; logging into the tenant console only resulted in no content after login, and a redirect back to login on reload.

matheus-swenson commented 5 months ago

Any updates here? I am having the same problem, but in my case the proxy in front of MinIO is Kong, and I am migrating from a standalone 12.8.12 to operator/tenant. The parameter requestAutoCert: false is already configured. Same behavior: I can connect to the operator console and can use the MinIO client mc to upload files, but I receive a 499 through Kong. Not sure if this can help, but here is the output of mc admin logs --debug --json --insecure -l 2 myminio:

{
 "status": "success",
 "deploymentid": "98193828-7caf-436b-b2d5-e53df5b5850c",
 "level": "ERROR",
 "errKind": "ALL",
 "time": "2024-04-11T01:24:22.167471427Z",
 "api": {
  "name": "SYSTEM",
  "args": {}
 },
 "error": {
  "message": "invalid semicolon separator in query (*errors.errorString)",
  "source": [
   "internal/logger/logger.go:258:logger.LogIf()",
   "cmd/auth-handler.go:129:cmd.getRequestAuthType()",
   "cmd/auth-handler.go:603:cmd.setAuthMiddleware.func1()",
   "net/http/server.go:2136:http.HandlerFunc.ServeHTTP()"
  ]
 },
 "ConsoleMsg": "",
 "node": ""
}
jiuker commented 5 months ago

@fritz-net Could you access it via NodePort?

jiuker commented 5 months ago

Closing it. Please open a new issue for this and describe the setup steps in more detail; I can't reproduce it.

fritz-net commented 2 months ago

> @fritz-net Could you access it via NodePort?

I use port forwarding and I can reach the service. As pointed out above with the screenshots, I can reach the backend from the browser. But the backend returns that it cannot reach itself(??) via the k8s service. The IP in the error is the correctly resolved IP of the service.
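For context, the forwarding used in these tests looked roughly like this; a sketch, with the console service name and port following the operator's usual <tenant>-console naming, here assumed:

kubectl -n prod port-forward svc/minio-shadow-env-console 9443:9443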

The S3 functionality works perfectly (inside k8s), since my application uses it to upload and download data successfully; just the tenant console login is not working.

Attached you will find additional insights into my setup; please let me know what additional info I can provide.

These are the YAMLs I linked to in the initial post:

# taken from https://github.com/minio/operator/blob/master/examples/kustomization/base/tenant.yaml
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: minio-shadow-env
  labels:
    app: minio
  annotations:
    prometheus.io/path: /minio/v2/metrics/cluster
    prometheus.io/port: "9000"

spec:
  features:
    bucketDNS: false
    domains: { }
  users:
    - name: minio-storage-user
      policyName: all-buckets-policy
  buckets:
    - name: "images"
    - name: "public.images"
    - name: "tmp.images"
  iam:
    policies:
      - name: all-buckets-policy
        policy: |
          {
            "Version": "2012-10-17",
            "Statement": [
              {
                "Action": [
                  "s3:GetObject",
                  "s3:PutObject",
                  "s3:ListBucket",
                  "s3:DeleteObject",
                  "s3:CreateBucket",
                  "s3:DeleteBucket"
                ],
                "Effect": "Allow",
                "Resource": [
                  "arn:aws:s3:::*"
                ]
              }
            ]
          }
  certConfig: { }
  podManagementPolicy: Parallel
  configuration:
    name: minio-storage-configuration
  env: [ ]
  serviceMetadata:
    minioServiceLabels: { }
    minioServiceAnnotations: { }
    consoleServiceLabels: { }
    consoleServiceAnnotations: { }
  priorityClassName: ""
  externalCaCertSecret: [ ]
  externalCertSecret: [ ]
  externalClientCertSecrets: [ ]
  imagePullSecret: { }
  mountPath: /export
  subPath: ""
  serviceAccountName: ""
  pools:
    - servers: 1
      name: pool-0
      topologySpreadConstraints: [ ]
      volumesPerServer: 1
      nodeSelector: { }
      tolerations: [ ]
      affinity:
        nodeAffinity: { }
        podAffinity: { }
        podAntiAffinity: { }
      resources: { }
      volumeClaimTemplate:
        apiVersion: v1
        kind: persistentvolumeclaims
        metadata: { }
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: {{ .Values.minio.storageClassName }}
        status: { }
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        fsGroup: 1000
        fsGroupChangePolicy: "OnRootMismatch"
      containerSecurityContext:
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
  requestAutoCert: true
---
# from https://github.com/minio/operator/blob/master/examples/kustomization/base/tenant-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-storage-configuration
  #namespace: minio-tenant
type: Opaque
stringData:
  config.env: |-
    export MINIO_ROOT_USER="minio"
    export MINIO_ROOT_PASSWORD="minio_passwp"
---
apiVersion: v1
data:
  # taken from values, base64-encoded
  CONSOLE_ACCESS_KEY: {{ .Values.minio.accessKey | b64enc | quote }}
  CONSOLE_SECRET_KEY: {{ .Values.minio.secretKey | b64enc | quote }}
kind: Secret
metadata:
  name: minio-storage-user
  # namespace: default
type: Opaque
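Since these manifests contain Helm templating ({{ .Values.minio.* }}), they are rendered through Helm rather than applied directly; a hypothetical render, with release name, chart path, and values all placeholders:

helm template minio-shadow ./chart --set minio.storageClassName=standard \
  --set minio.accessKey=console-user --set minio.secretKey=console-secret | kubectl apply -f -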

The first screenshot shows the login request to the backend at /api/v1/login (which can be reached, since it returns a 401).

The second screenshot shows the request body.

The third screenshot shows the response body.

The fourth screenshot shows the service configured on the tenant pod.

The last screenshot shows the service it tries to connect to.