vapor-ware / synse-charts

Helm charts for deploying the Synse project ⛵
https://charts.vapor.io
GNU General Public License v3.0

[synse-server] Needs to create a clusterrolebinding for plugin discovery #5

Closed marcoceppi closed 5 years ago

marcoceppi commented 5 years ago

On a fresh deploy of synse-server, scan requests fail and the logs fill with tracebacks caused by Forbidden (403) errors from the Kubernetes API:

[2018-11-01 08:52:32 +0000] - (synse)[DEBUG] scan:31: Re-registering plugins
[2018-11-01 08:52:32 +0000] - (synse)[INFO] plugin:325: No plugin configurations for unix
[2018-11-01 08:52:32 +0000] - (synse)[DEBUG] plugin:327: unix plugin configuration: []
[2018-11-01 08:52:32 +0000] - (synse)[DEBUG] plugin:355: Registering plugins from default socket directory (/tmp/synse/procs)
[2018-11-01 08:52:32 +0000] - (synse)[INFO] plugin:380: Registered unix plugins: []
[2018-11-01 08:52:32 +0000] - (synse)[INFO] plugin:291: No plugin configurations for TCP
[2018-11-01 08:52:32 +0000] - (synse)[DEBUG] kubernetes:34: Using namespace "default" for k8s discovery
[2018-11-01 08:52:32 +0000] - (synse)[DEBUG] kubernetes:86: Using endpoint label selector: component=plugin
[2018-11-01 08:52:32 +0000] - (root)[ERROR] handlers:105: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sanic/app.py", line 603, in handle_request
    response = await response
  File "/usr/local/lib/python3.6/site-packages/synse/routes/core.py", line 55, in scan_route
    response = await commands.scan(rack=rack, board=board, force=force)
  File "/usr/local/lib/python3.6/site-packages/synse/commands/scan.py", line 32, in scan
    plugin.register_plugins()
  File "/usr/local/lib/python3.6/site-packages/synse/plugin.py", line 199, in register_plugins
    addresses = kubernetes.discover()
  File "/usr/local/lib/python3.6/site-packages/synse/discovery/kubernetes.py", line 43, in discover
    addresses.extend(_register_from_endpoints(ns=ns, cfg=endpoints_cfg))
  File "/usr/local/lib/python3.6/site-packages/synse/discovery/kubernetes.py", line 93, in _register_from_endpoints
    endpoints = v1.list_namespaced_endpoints(namespace=ns, label_selector=label_selector)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 11782, in list_namespaced_endpoints
    (data) = self.list_namespaced_endpoints_with_http_info(namespace, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 11885, in list_namespaced_endpoints_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 01 Nov 2018 08:52:32 GMT', 'Content-Length': '266'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints is forbidden: User \"system:serviceaccount:default:default\" cannot list endpoints in the namespace \"default\"","reason":"Forbidden","details":{"kind":"endpoints"},"code":403}

Since synse-server now uses the Kubernetes API for plugin discovery, the chart will need to create both a ServiceAccount and a ClusterRoleBinding that grants read-level permissions. Normally, a Role and RoleBinding would be sufficient, but since synse-server can technically be configured to search for endpoints outside its own namespace, we'll use a ClusterRole and ClusterRoleBinding.

I've not tested these templates, but they would look something like this:

serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
{{ include "labels" . | indent 4 }}
  name: {{ template "fullname" . }}

clusterrole.yaml

Since the URL being accessed is /api/v1/namespaces/{namespace}/endpoints, we can decompose that into the core API group (written as the empty string "" in apiGroups), in one or more namespaces, for the endpoints resource. The method being invoked is list, so we need at least the list and get verbs. I've added watch because it's easy enough to include.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ template "fullname" . }}
rules:
- apiGroups: [""]                 # Empty string means the Core API group
  resources: ["endpoints"]
  verbs: ["get", "watch", "list"]

clusterrolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ template "fullname" . }}
  labels:
{{ include "labels" . | indent 4 }}
  annotations:
{{ include "annotations" . | indent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ template "fullname" . }}
subjects:
- kind: ServiceAccount
  name: {{ template "fullname" . }}
  namespace: {{ .Release.Namespace }}
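
For comparison, the namespaced alternative mentioned earlier (a Role and RoleBinding) might look roughly like the sketch below. It is equally untested, reuses the same "fullname" helper as the templates above, and would only permit discovery inside the release namespace, so it would not cover a synse-server configured to search other namespaces.

```yaml
# role.yaml (sketch, untested): namespaced counterpart of the ClusterRole above
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ template "fullname" . }}
  namespace: {{ .Release.Namespace }}
rules:
- apiGroups: [""]                 # Empty string means the Core API group
  resources: ["endpoints"]
  verbs: ["get", "watch", "list"]
---
# rolebinding.yaml (sketch, untested): binds the Role to the chart's ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ template "fullname" . }}
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{ template "fullname" . }}
subjects:
- kind: ServiceAccount
  name: {{ template "fullname" . }}
  namespace: {{ .Release.Namespace }}
```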

deployment.yaml

Finally, the Deployment template spec will need to be updated to attach the ServiceAccount to synse-server.

Under spec.template.spec in the deployment, add:

serviceAccountName: {{ template "fullname" . }}
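
To make the placement concrete, here is an excerpt-style sketch of where that line would sit. Everything other than serviceAccountName is elided rather than copied from the chart, so treat the surrounding structure as illustrative only.

```yaml
# deployment.yaml (excerpt, sketch): only serviceAccountName is new
spec:
  template:
    spec:
      # Pods run as the ServiceAccount created in serviceaccount.yaml
      serviceAccountName: {{ template "fullname" . }}
      # existing containers, volumes, etc. remain unchanged below
```

Without this field, pods fall back to the namespace's "default" ServiceAccount, which is the identity shown in the 403 response above.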

We should do our best to use fullname for these items to avoid collisions when multiple releases are deployed in a single cluster / namespace.

edaniszewski commented 5 years ago

I didn't realize that you needed read permissions to get API objects within the same namespace. I'm sure that I tried this out before with just Synse Server and a plugin on a GKE cluster, and it was working then. Maybe the ServiceAccount from blackbox was still lingering, or there are some non-default permissions set when the cluster is created. Either way, this is my bad. Thanks for the guide and templates -- I'll get something up for this shortly.

marcoceppi commented 5 years ago

It's highly probable that the GKE cluster did not have RBAC enabled, which would skew your results.

MatthewHink commented 5 years ago

I don't have the full terminal window output, but based on my notes I ran the rbac script:

# helm
kubectl apply -f management/setup/helm/rbac.yaml
helm init --service-account tiller

marcoceppi commented 5 years ago

Yes, that's to set up Helm; this issue is about the services themselves. That YAML creates a service account for Helm's Tiller, but that account is not shared with Helm-deployed services.
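
The contents of management/setup/helm/rbac.yaml are not shown in this thread. As an assumption for illustration only, the Tiller setup documented for Helm v2 typically looks like the sketch below: it grants permissions to Tiller's own account in kube-system and does nothing for the account that synse-server pods run as.

```yaml
# Assumed shape of a Tiller RBAC manifest (based on the Helm v2 docs example,
# not on the actual rbac.yaml referenced above)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
```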

edaniszewski commented 5 years ago

It does seem like it was a GKE issue. I tested Synse Server + Plugin both with and without the changes that add in the ServiceAccount, ClusterRole, and ClusterRoleBinding and it worked fine in both cases. I wasn't able to replicate the permissions errors seen in Synse Server, which is probably why I missed this before.

marcoceppi commented 5 years ago

Yeah, we should set up a cluster in GCP that isn't GKE so we can run infra similar to our production sites. I'll work on that separately.

MatthewHink commented 5 years ago

This is blocking VEM master deploy, which is the only site left now.