integr8ly / application-monitoring-operator

Operator for installing the Application Monitoring Stack on OpenShift (Prometheus, AlertManager, Grafana)

Support for OCP4 #62

Closed · mkralik3 closed this issue 5 years ago

mkralik3 commented 5 years ago

After installation (make cluster/install) I created a secret containing my Red Hat registry token and linked it to the grafana-operator / alertmanager / prometheus-application-monitoring service accounts according to the README:

kubectl create -f ../my-secret.yaml --namespace=application-monitoring
oc secrets link grafana-operator my-secret --for=pull
oc secrets link alertmanager my-secret --for=pull
oc secrets link prometheus-application-monitoring my-secret --for=pull
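
For reference, the secret can also be created directly with oc instead of applying a YAML file (just a sketch; the placeholder values are not real credentials and should be replaced with a registry.redhat.io registry service account):

oc create secret docker-registry my-secret \
  --docker-server=registry.redhat.io \
  --docker-username='<registry-service-account-username>' \
  --docker-password='<registry-service-account-token>' \
  --namespace=application-monitoring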

After that I redeployed all failing pods. However, the alertmanager-application-monitoring and prometheus-application-monitoring pods remain in ImagePullBackOff status:

alertmanager-application-monitoring-0              0/3       ImagePullBackOff   0          6m4s
application-monitoring-operator-595d68dcf6-xpq2l   1/1       Running            0          9m47s
grafana-operator-7c5b565454-d4jrh                  1/1       Running            0          8m56s
prometheus-application-monitoring-0                1/5       ImagePullBackOff   0          5m54s
prometheus-operator-7f9c5c8b88-vmxft               1/1       Running            0          9m18s

oc describe pod alertmanager-application-monitoring-0

Events:
  Type     Reason     Age              From                                     Message
  ----     ------     ----             ----                                     -------
  Normal   Scheduled  2m               default-scheduler                        Successfully assigned application-monitoring/alertmanager-application-monitoring-0 to ip-172-31-149-100.ec2.internal
  Normal   BackOff    2m               kubelet, ip-172-31-149-100.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/oauth-proxy:v3.11.43"
  Warning  Failed     2m               kubelet, ip-172-31-149-100.ec2.internal  Error: ImagePullBackOff
  Normal   BackOff    2m               kubelet, ip-172-31-149-100.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/ose-configmap-reloader:v3.11"
  Normal   BackOff    2m               kubelet, ip-172-31-149-100.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/prometheus-alertmanager:v3.11"
  Warning  Failed     2m               kubelet, ip-172-31-149-100.ec2.internal  Error: ImagePullBackOff
  Warning  Failed     2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Error: ErrImagePull
  Normal   Pulling    2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Pulling image "registry.redhat.io/openshift3/oauth-proxy:v3.11.43"
  Warning  Failed     2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/ose-configmap-reloader:v3.11": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Normal   Pulling    2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Pulling image "registry.redhat.io/openshift3/ose-configmap-reloader:v3.11"
  Warning  Failed     2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Error: ErrImagePull
  Warning  Failed     2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/prometheus-alertmanager:v3.11": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Normal   Pulling    2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Pulling image "registry.redhat.io/openshift3/prometheus-alertmanager:v3.11"
  Warning  Failed     2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Error: ErrImagePull
  Warning  Failed     2m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/oauth-proxy:v3.11.43": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Warning  Failed     1m (x2 over 2m)  kubelet, ip-172-31-149-100.ec2.internal  Error: ImagePullBackOff

oc describe pod prometheus-application-monitoring-0

Events:
  Type     Reason     Age              From                                     Message
  ----     ------     ----             ----                                     -------
  Normal   Scheduled  3m               default-scheduler                        Successfully assigned application-monitoring/prometheus-application-monitoring-0 to ip-172-31-129-245.ec2.internal
  Normal   Pulling    3m               kubelet, ip-172-31-129-245.ec2.internal  Pulling image "registry.redhat.io/openshift3/prometheus:v3.11"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ErrImagePull
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ErrImagePull
  Normal   Pulling    3m               kubelet, ip-172-31-129-245.ec2.internal  Pulling image "registry.redhat.io/openshift3/ose-prometheus-config-reloader:v3.11"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/ose-prometheus-config-reloader:v3.11": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/prometheus:v3.11": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Normal   Pulled     3m               kubelet, ip-172-31-129-245.ec2.internal  Container image "registry.connect.redhat.com/bitnami/prometheus-blackbox-exporter:0.14.0-rhel-7-r33-2" already present on machine
  Normal   Created    3m               kubelet, ip-172-31-129-245.ec2.internal  Created container blackbox-exporter
  Normal   Started    3m               kubelet, ip-172-31-129-245.ec2.internal  Started container blackbox-exporter
  Normal   Pulling    3m               kubelet, ip-172-31-129-245.ec2.internal  Pulling image "registry.redhat.io/openshift3/oauth-proxy:v3.11.43"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/oauth-proxy:v3.11.43": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Normal   Pulling    3m               kubelet, ip-172-31-129-245.ec2.internal  Pulling image "registry.redhat.io/openshift3/ose-configmap-reloader:v3.11"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ErrImagePull
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Failed to pull image "registry.redhat.io/openshift3/ose-configmap-reloader:v3.11": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ErrImagePull
  Normal   BackOff    3m               kubelet, ip-172-31-129-245.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/ose-configmap-reloader:v3.11"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ImagePullBackOff
  Normal   BackOff    3m               kubelet, ip-172-31-129-245.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/ose-prometheus-config-reloader:v3.11"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ImagePullBackOff
  Normal   BackOff    3m               kubelet, ip-172-31-129-245.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/oauth-proxy:v3.11.43"
  Warning  Failed     3m               kubelet, ip-172-31-129-245.ec2.internal  Error: ImagePullBackOff
  Normal   BackOff    3m (x2 over 3m)  kubelet, ip-172-31-129-245.ec2.internal  Back-off pulling image "registry.redhat.io/openshift3/prometheus:v3.11"
  Warning  Failed     3m (x2 over 3m)  kubelet, ip-172-31-129-245.ec2.internal  Error: ImagePullBackOff
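
Since the failures are all "unable to retrieve auth token: invalid username/password", two quick checks (a debugging sketch, assuming the application-monitoring namespace and that podman is available) are whether the pull secret is really attached to the service accounts and whether the credentials themselves are accepted by the registry:

# my-secret should appear among the imagePullSecrets linked above
oc get serviceaccount prometheus-application-monitoring -n application-monitoring -o jsonpath='{.imagePullSecrets[*].name}'
oc get serviceaccount alertmanager -n application-monitoring -o jsonpath='{.imagePullSecrets[*].name}'
# Try the same credentials directly against the registry
podman login registry.redhat.io --username '<registry-service-account-username>'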
david-martin commented 5 years ago

@mkralik3 Going to try to reproduce this.

david-martin commented 5 years ago

I was able to install OK on OpenShift 4.

I did see an error when trying to link a secret for one of the service accounts, but this was only temporary while the application-monitoring-operator was still coming up and creating its resources.

oc secrets link grafana-operator mysecret --for=pull
Error from server (NotFound): serviceaccounts "grafana-operator" not found

The command worked when I retried it a few seconds later.
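
A simple way to avoid that race is to wait for the service account before linking (a minimal sketch in bash, assuming the application-monitoring namespace):

until oc get serviceaccount grafana-operator -n application-monitoring >/dev/null 2>&1; do
  echo "waiting for the grafana-operator service account..."
  sleep 5
done
oc secrets link grafana-operator mysecret --for=pull -n application-monitoring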

I wasn't able to log in to any of the 3 services though (see below). I suspect the oauth-proxy image being used doesn't like something in OpenShift 4. I'll look into this.

[screenshot: login error]

david-martin commented 5 years ago

Errors in proxy log:

2019/07/12 10:38:34 http.go:96: HTTPS: listening on [::]:9091
2019/07/12 10:39:59 server.go:2923: http: TLS handshake error from 10.128.2.24:43668: tls: first record does not look like a TLS handshake
2019/07/12 10:40:29 server.go:2923: http: TLS handshake error from 10.128.2.24:44244: tls: first record does not look like a TLS handshake
2019/07/12 10:40:59 server.go:2923: http: TLS handshake error from 10.128.2.24:44832: tls: first record does not look like a TLS handshake
2019/07/12 10:40:59 provider.go:386: authorizer reason: 
2019/07/12 10:41:01 provider.go:386: authorizer reason: 
2019/07/12 10:41:29 server.go:2923: http: TLS handshake error from 10.128.2.24:45368: tls: first record does not look like a TLS handshake
2019/07/12 10:41:32 provider.go:576: 404 GET https://oauth-openshift.apps.cluster-wat8-51d4.wat8-51d4.openshiftworkshop.com/apis/user.openshift.io/v1/users/~ {
  "paths": [
    "/apis",
    "/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/clientCA-reload",
    "/healthz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/healthz/poststarthook/requestheader-reload",
    "/metrics",
    "/readyz",
    "/readyz/log",
    "/readyz/ping",
    "/readyz/poststarthook/clientCA-reload",
    "/readyz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/readyz/poststarthook/requestheader-reload",
    "/readyz/terminating"
  ]
}
2019/07/12 10:41:32 oauthproxy.go:635: error redeeming code (client:10.131.0.19:44806): unable to retrieve email address for user from token: got 404 {
  "paths": [
    "/apis",
    "/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/clientCA-reload",
    "/healthz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/healthz/poststarthook/requestheader-reload",
    "/metrics",
    "/readyz",
    "/readyz/log",
    "/readyz/ping",
    "/readyz/poststarthook/clientCA-reload",
    "/readyz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/readyz/poststarthook/requestheader-reload",
    "/readyz/terminating"
  ]
}
2019/07/12 10:41:32 oauthproxy.go:434: ErrorPage 500 Internal Error Internal Error
david-martin commented 5 years ago

I changed the oauth-proxy image to image: 'quay.io/openshift/origin-oauth-proxy:4.1' and login now works. I'll try this image on OpenShift 3 as well to see if it's backwards compatible. Otherwise, we'll need to branch the logic for which proxy image to use based on the OpenShift version.
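
To confirm which oauth-proxy image the pods actually end up running after the change, something like this should work (a sketch; namespace assumed to be application-monitoring):

# Print each pod name followed by the images of its containers
oc get pods -n application-monitoring -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'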

cc @pb82

david-martin commented 5 years ago

The v4.1 image seems to work on OpenShift 3:

2019/07/12 15:34:35 provider.go:593: 200 GET https://172.30.0.1/apis/user.openshift.io/v1/users/~  {"kind":"User","apiVersion":"user.openshift.io/v1","metadata":{"name":"opentlc-mgr","selfLink":"/apis/user.openshift.io/v1/users/opentlc-mgr","uid":"2b6fac6f-a3b1-11e9-87f5-0a248efa40fe","resourceVersion":"398456","creationTimestamp":"2019-07-11T07:54:54Z"},"identities":["htpasswd_auth:opentlc-mgr"],"groups":["system:authenticated","system:authenticated:oauth"]}
2019/07/12 15:34:35 provider.go:593: 201 POST https://172.30.0.1/apis/authorization.openshift.io/v1/subjectaccessreviews  {"kind":"SubjectAccessReviewResponse","apiVersion":"authorization.openshift.io/v1","allowed":true,"reason":"RBAC: allowed by ClusterRoleBinding \"cluster-admin-0\" of ClusterRole \"cluster-admin\" to User \"opentlc-mgr\""}
2019/07/12 15:34:35 oauthproxy.go:676: 10.1.2.1:39966 authentication complete Session{opentlc-mgr@cluster.local token:true}
pb82 commented 5 years ago

@david-martin That would be preferable, because I'm not sure there's a good way to figure out the platform from inside the operator pod.
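
One possible discriminator: the config.openshift.io API group exists on OCP 4 but not on OCP 3.11, so the operator could check for it through the Kubernetes discovery API. The CLI equivalent would be roughly (sketch only):

if oc api-versions | grep -q '^config\.openshift\.io/'; then
  echo "looks like OCP 4"
else
  echo "looks like OCP 3.x"
fi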

heiko-braun commented 5 years ago

@david-martin Right, that's the same thing we needed to do for Fuse Online (https://github.com/integr8ly/application-monitoring-operator/issues/62#issuecomment-510928010)

david-martin commented 5 years ago

The change to the later oauth-proxy version will land in #65.

mkralik3 commented 5 years ago

I have now tried the latest application-monitoring-operator (0.0.20) and, after switching to the upstream images, it works perfectly with OCP 4 as well (tested with my local OCP 4.1 instance in CRC). I was able to see the Fuse Online targets in Prometheus and the dashboards in Grafana, so I have closed this issue.

[screenshot: Prometheus targets and Grafana dashboards]