I've just additionally checked: everything works fine with v3.9.0.
@alrf Are you using the `preferService` option in the Grafana CR? There was a small change introduced in 3.10.0 with regards to that: it will now rely on the service name instead of the IP address.
@pb82 yes, I have this setting in the Grafana CR:

```yaml
client:
  preferService: True
```
@alrf As a workaround, can you set `preferService` to false and let the Operator create a Route/Ingress? Does that work? I'll try to reproduce the issue you're having. Can you curl the hostname of the Grafana service from the operator pod? If not, it could be a networking issue on your cluster.
@pb82 I've tried `preferService: False` with Operator v3.10.0 and v3.10.1, with the same result:

```
{"level":"error","ts":1621512787.8417025,"logger":"controller_grafanadashboard","msg":"error updating dashboard","error":"error creating folder, expected status 200 but got 403","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/briangallagher/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.(*ReconcileGrafanaDashboard).manageError\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:366\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.(*ReconcileGrafanaDashboard).reconcileDashboards\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:254\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.(*ReconcileGrafanaDashboard).Reconcile\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:136\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.add.func1\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:86\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.add.func2\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:92"}
{"level":"error","ts":1621512787.855196,"logger":"controller_grafanadashboard","msg":"failed to get or create namespace folder Dev for dashboard ","error":"error creating folder, expected status 200 but got 403","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/briangallagher/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.(*ReconcileGrafanaDashboard).reconcileDashboards\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:253\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.(*ReconcileGrafanaDashboard).Reconcile\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:136\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.add.func1\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:86\ngithub.com/integr8ly/grafana-operator/v3/pkg/controller/grafanadashboard.add.func2\n\tgrafana-operator/pkg/controller/grafanadashboard/dashboard_controller.go:92"}
```
Part of the config:

```yaml
service:
  ports:
    - name: grafana-proxy
      port: 9091
      protocol: TCP
      targetPort: grafana-proxy
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: grafana-k8s-tls
ingress:
  enabled: True
  targetPort: grafana-proxy
  termination: reencrypt
  hostname: grafana.replaced-with-my-domain.com
client:
  preferService: False
```
Can't connect via curl from the operator pod to port 3000:

```
bash-4.4$ curl http://grafana-service.grafana-operator.svc.cluster.local:3000
^C
```

but can connect to port 9091:

```
bash-4.4$ curl https://grafana-service.grafana-operator.svc.cluster.local:9091 -sk
<!DOCTYPE html>
<html lang="en" charset="utf-8">
<head>
<title>Log In</title>
```
Could it be because grafana-proxy is used? (The operator is deployed on OpenShift.)
P.S.: I also tried curl with `preferService: True`, with the same result.
Hi @pb82, I've compared the v3.9.0 and v3.10.0 tags and found this:
```diff
grafana-operator]$ git diff v3.9.0 v3.10.0 -- pkg/controller/grafana/grafana_controller.go
diff --git a/pkg/controller/grafana/grafana_controller.go b/pkg/controller/grafana/grafana_controller.go
index 42f58307..c81c030c 100644
--- a/pkg/controller/grafana/grafana_controller.go
+++ b/pkg/controller/grafana/grafana_controller.go
@@ -249,10 +249,7 @@ func (r *ReconcileGrafana) getGrafanaAdminUrl(cr *grafanav1alpha1.Grafana, state
 	var servicePort = int32(model.GetGrafanaPort(cr))
 	// Otherwise rely on the service
-	if state.GrafanaService != nil && state.GrafanaService.Spec.ClusterIP != "" && state.GrafanaService.Spec.ClusterIP != "None" {
-		return fmt.Sprintf("http://%v:%d", state.GrafanaService.Spec.ClusterIP,
-			servicePort), nil
-	} else if state.GrafanaService != nil {
+	if state.GrafanaService != nil {
 		return fmt.Sprintf("http://%v:%d", state.GrafanaService.Name,
 			servicePort), nil
 	}
```
So, before it worked through the IP address (`ClusterIP`), and now it goes through `GrafanaService.Name`.
But to reach a service in Kubernetes you need to use `http://<service_name>.<namespace>.svc.<zone>:<port>` (e.g. `http://grafana-service.grafana-operator.svc.cluster.local:3000`), not just the bare service name.
I guess this could be the issue here.
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services
https://github.com/kubernetes/dns/blob/master/docs/specification.md
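If the operator has to go through the service, a fully qualified in-cluster DNS name would sidestep search-domain issues. A minimal sketch of the idea in Go (the `buildGrafanaAdminURL` helper and its `namespace`/`clusterDomain` parameters are hypothetical, not the operator's actual code):

```go
package main

import "fmt"

// buildGrafanaAdminURL is a hypothetical helper that builds the admin URL from
// the service name, its namespace, and the cluster domain, instead of relying
// on the bare service name. It only illustrates the point made above.
func buildGrafanaAdminURL(serviceName, namespace, clusterDomain string, port int32) string {
	if clusterDomain == "" {
		clusterDomain = "cluster.local" // common default, but it can differ per cluster
	}
	// <service>.<namespace>.svc.<cluster-domain>:<port>
	return fmt.Sprintf("http://%s.%s.svc.%s:%d", serviceName, namespace, clusterDomain, port)
}

func main() {
	fmt.Println(buildGrafanaAdminURL("grafana-service", "grafana-operator", "", 3000))
	// http://grafana-service.grafana-operator.svc.cluster.local:3000
}
```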
@alrf Port 3000 should always be accessible without authentication, even if you use the OAuth proxy. Usually the OAuth proxy is set up by adding another port to the service and then changing the route to point to the proxy port. We have an example for that setup: https://github.com/integr8ly/grafana-operator/blob/master/deploy/examples/oauth/Grafana.yaml
Does your `service.ports` setup look similar? When using this example, the service should have the following ports:
```yaml
spec:
  ports:
    - name: grafana
      protocol: TCP
      port: 3000
      targetPort: grafana-http
    - name: grafana-proxy
      protocol: TCP
      port: 9091
      targetPort: grafana-proxy
```
> Does your service.ports setup look similar? When using this example, the service should have the following ports

Yes, it looks like the example (https://github.com/integr8ly/grafana-operator/blob/master/deploy/examples/oauth/Grafana.yaml).
@pb82 any updates?
@alrf there is a change incoming where we fix the service DNS name: #438
Once that's landed, I'll reach out and we can give it another try.
@pb82 in v3.9.0 I can reach grafana-service from the operator pod by name:

```
bash-4.4$ curl -I http://grafana-service:3000
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: text/html; charset=UTF-8
Expires: -1
Pragma: no-cache
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block
Date: Wed, 23 Jun 2021 15:37:59 GMT
```

So maybe the DNS name is unrelated.
Another option for v3.10 could be a try/catch-style fallback: first request grafana-service by name, and fall back to the service IP address if that fails.
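A rough sketch of that fallback idea in Go (the function and variable names are made up for illustration; this is not the operator's real logic):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// adminURLWithFallback tries the service DNS name first and falls back to the
// ClusterIP when the name does not resolve or accept connections, roughly the
// pre-3.10 behaviour. Only a sketch of the suggestion above.
func adminURLWithFallback(serviceName, clusterIP string, port int32) string {
	addr := fmt.Sprintf("%s:%d", serviceName, port)
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err == nil {
		conn.Close()
		return fmt.Sprintf("http://%s:%d", serviceName, port)
	}
	// Name-based connection failed; fall back to the ClusterIP.
	return fmt.Sprintf("http://%s:%d", clusterIP, port)
}

func main() {
	fmt.Println(adminURLWithFallback("grafana-service", "172.30.247.230", 3000))
}
```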
@alrf can you please try 3.10.2? The issue should be fixed there.
@pb82 I see only v3.10.1 available (OpenShift Operator)
https://operatorhub.io/operator/grafana-operator has release v3.10.2 now. You should be able to find it in OLM as well.
I've tried on a few OpenShift clusters; v3.10.1 is the latest available version everywhere: http://i.imgur.com/Kuh8DL9.png
@pb82 OK, now v3.10.2 is available. I've upgraded the operator, but the issue stayed the same:

```
Get "http://admin:***@grafana-service:3000/api/folders": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

There is a bunch of these errors in the cluster console, and I can't reach grafana-service from the operator pod by name:
```
$ oc describe pod/grafana-operator-d585f98c6-85t94 -n grafana-operator | grep '3.10'
containerImage: quay.io/integreatly/grafana-operator:v3.10.2
Image: quay.io/integreatly/grafana-operator:v3.10.2
OPERATOR_CONDITION_NAME: grafana-operator.v3.10.2
Normal Pulling 48s kubelet Pulling image "quay.io/integreatly/grafana-operator:v3.10.2"
Normal Pulled 39s kubelet Successfully pulled image "quay.io/integreatly/grafana-operator:v3.10.2" in 9.935066938s

$ oc exec -it pod/grafana-operator-d585f98c6-85t94 -n grafana-operator -- bash
bash-4.4$ curl -I http://grafana-service:3000
^C
bash-4.4$
```
As I said before, the DNS name is unrelated.
I can't reach the service even via its IP address:

```
$ oc exec -it pod/grafana-operator-d585f98c6-85t94 -n grafana-operator -- bash
bash-4.4$ curl -I http://172.30.247.230:3000
^C
bash-4.4$ curl -I http://172.30.247.230:9091
HTTP/1.0 400 Bad Request
bash-4.4$
```

Again, everything works with v3.9.0, with no changes on the cluster side.
@pb82 I found something interesting: the "default" Grafana deployment example (shipped with the Operator) works as expected, and in that case I can reach port 3000 from the operator pod.
Here is my Grafana config (with it, I can't reach port 3000):
```yaml
apiVersion: v1
data:
  session_secret: <MY_SECRET_HERE>
kind: Secret
metadata:
  name: grafana-k8s-proxy
  namespace: grafana-operator
type: Opaque
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-proxy
  namespace: grafana-operator
rules:
  - apiGroups:
      - authentication.k8s.io
    resources:
      - tokenreviews
    verbs:
      - create
  - apiGroups:
      - authorization.k8s.io
    resources:
      - subjectaccessreviews
    verbs:
      - create
---
apiVersion: authorization.openshift.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-proxy
roleRef:
  name: grafana-proxy
subjects:
  - kind: ServiceAccount
    name: grafana-serviceaccount
    namespace: grafana-operator
userNames:
  - system:serviceaccount:grafana-operator:grafana-serviceaccount
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
  name: ocp-injected-certs
  namespace: grafana-operator
---
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: grafana-oauth
  namespace: grafana-operator
spec:
  baseImage: <MY_IMAGE_REGISTRY>/grafana:7.3.7
  config:
    log:
      mode: "console"
      level: "warn"
    auth:
      disable_login_form: False
      disable_signout_menu: True
    auth.basic:
      enabled: True
    auth.anonymous:
      enabled: True
      org_role: Admin
  deployment:
    securityContext:
      fsGroup: 472
    hostNetwork: true
    nodeSelector:
      node-role.kubernetes.io/infra: ""
    tolerations:
      - key: "infra"
        operator: "Exists"
        effect: "NoExecute"
  containers:
    - name: grafana-proxy
      args:
        - '-provider=openshift'
        - '-pass-basic-auth=false'
        - '-https-address=:9091'
        - '-http-address='
        - '-email-domain=*'
        - '-upstream=http://localhost:3000'
        - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
        - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
        - '-tls-cert=/etc/tls/private/tls.crt'
        - '-tls-key=/etc/tls/private/tls.key'
        - '-client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token'
        - '-cookie-secret-file=/etc/proxy/secrets/session_secret'
        - '-openshift-service-account=grafana-serviceaccount'
        - '-openshift-ca=/etc/pki/tls/cert.pem'
        - '-openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        - '-openshift-ca=/etc/grafana-configmaps/ocp-injected-certs/ca-bundle.crt'
        - '-skip-auth-regex=^/metrics'
      image: 'quay.io/openshift/origin-oauth-proxy:4.6'
      ports:
        - containerPort: 9091
          name: grafana-proxy
      resources: {}
      volumeMounts:
        - mountPath: /etc/tls/private
          name: secret-grafana-k8s-tls
          readOnly: false
        - mountPath: /etc/proxy/secrets
          name: secret-grafana-k8s-proxy
          readOnly: false
  dataStorage:
    class: gp3
    accessModes:
      - ReadWriteOnce
    size: 10Gi
  secrets:
    - grafana-k8s-tls
    - grafana-k8s-proxy
  configMaps:
    - ocp-injected-certs
  service:
    ports:
      - name: grafana-proxy
        port: 9091
        protocol: TCP
        targetPort: grafana-proxy
    annotations:
      service.alpha.openshift.io/serving-cert-secret-name: grafana-k8s-tls
  ingress:
    enabled: True
    targetPort: grafana-proxy
    termination: reencrypt
    hostname: <MY_GRAFANA_URL_HERE>
  client:
    preferService: True
  serviceAccount:
    annotations:
      serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"grafana-route"}}'
  dashboardLabelSelector:
    - matchExpressions:
        - { key: "app", operator: In, values: ['grafana'] }
```
P.S.: I've tried both `preferService: True` and `preferService: False`.
Finally, I found the issue: it was the Security Group for the OpenShift worker nodes. Port 3000 has to be open when `hostNetwork: true` is used, and I don't know why it worked before with Operator v3.9.0 (probably because of the `preferService` changes introduced in v3.10.x).
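For anyone hitting the same symptom: since `hostNetwork: true` makes Grafana listen on the node network, a plain TCP dial against a worker node's port 3000 can confirm whether a node-level firewall or security group is the culprit. A minimal sketch in Go; the node IP below is a placeholder, not from this cluster:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// Dials <host>:3000 with a short timeout and reports whether the TCP
// connection succeeds. A timeout here usually points at a firewall or
// security-group rule rather than at DNS or the application itself.
func main() {
	host := "10.0.0.10" // placeholder: an OpenShift worker node IP
	if len(os.Args) > 1 {
		host = os.Args[1]
	}
	addr := net.JoinHostPort(host, "3000")
	conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
	if err != nil {
		fmt.Printf("cannot reach %s: %v\n", addr, err)
		return
	}
	conn.Close()
	fmt.Printf("%s is reachable\n", addr)
}
```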
Thank you for your help.
Describe the bug: I can't create dashboards after updating to Operator v3.10.0. The errors are the folder-creation 403 errors shown above.