kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy to use and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

monitoring etcd #252

Closed surajssd closed 4 years ago

surajssd commented 4 years ago

Description

We can reach the etcd cluster from within the cluster, but to monitor etcd we need certificates to authenticate with it. We can use the certificates (etcd-client-ca.crt, etcd-client.crt, etcd-client.key) generated for the apiserver, which are available in the kube-system namespace in a secret called kube-apiserver. But providing those certs to the prometheus operator would be counterproductive to security.

Ideally we should create certs for a dedicated metrics/monitoring user, then create such a user in etcd and grant it appropriate permissions.
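
For illustration, here is a minimal sketch (the secret name, namespace and key names are placeholders) of how such dedicated certs could be packaged for Prometheus, in the same namespace as the prometheus operator, so that they end up mounted under /etc/prometheus/secrets/<secret-name>/:

## Hypothetical secret holding a dedicated etcd client certificate for the
## metrics/monitoring user. The key names match the file names referenced by
## the serviceMonitor caFile/certFile/keyFile settings below.
apiVersion: v1
kind: Secret
metadata:
  name: etcd-client-cert
  namespace: monitoring
type: Opaque
data:
  etcd-ca: <base64-encoded CA certificate>
  etcd-client: <base64-encoded client certificate>
  etcd-client-key: <base64-encoded client key>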

TODOs

## Component scraping etcd
##
kubeEtcd:
  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  ##
  service:
    port: 2379
    targetPort: 2379
    # selector:
    #   component: etcd

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and
  ## specifying security configuration below. For example, with a secret named etcd-client-cert
  ##
  ## serviceMonitor:
  ##   scheme: https
  ##   insecureSkipVerify: false
  ##   serverName: localhost
  ##   caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
  ##   certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
  ##   keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
  ##
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    ##
    interval: ""
    scheme: http
    insecureSkipVerify: false
    serverName: ""
    caFile: ""
    certFile: ""
    keyFile: ""

    ##  metric relabel configs to apply to samples before ingestion.
    ##
    metricRelabelings: []
    # - action: keep
    #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
    #   sourceLabels: [__name__]

    ## relabel configs to apply to samples before ingestion.
    ##
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace
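
For completeness, a hedged sketch of what the filled-in values could look like once such a dedicated client certificate exists in a secret (the secret name etcd-client-cert and the endpoint IPs are placeholders). It assumes the chart's prometheus.prometheusSpec.secrets setting is used to mount the secret, which makes it available under /etc/prometheus/secrets/<secret-name>/ inside the Prometheus pods:

kubeEtcd:
  enabled: true
  ## Placeholder controller node IPs on which etcd listens.
  endpoints:
    - 10.0.0.10
    - 10.0.0.11
    - 10.0.0.12
  service:
    port: 2379
    targetPort: 2379
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    ## May need to match the name in the etcd serving certificate.
    serverName: ""
    caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
    certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
    keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key

prometheus:
  prometheusSpec:
    ## Mounts the secret into the Prometheus pods under
    ## /etc/prometheus/secrets/etcd-client-cert/.
    secrets:
      - etcd-client-cert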

Possible implementations

invidian commented 4 years ago

And where does the code for cert-creation lie?

This could be done with Terraform.

And where does the code for etcd user creation lie?

This is interesting, as once we enable RBAC on etcd, we also need to do it for kube-apiserver. So either we have a K8s controller running as a static pod which will do that, or we do it some other way. With the controller, it would be nice to have it done with a CRD etc.
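
To make the CRD idea a bit more concrete, a purely hypothetical sketch (the API group, kind and fields are invented for illustration and not implemented anywhere) of how etcd users and their permissions could be declared and then reconciled by such a controller:

## Hypothetical custom resource; nothing below exists today.
apiVersion: etcd.lokomotive.kinvolk.io/v1alpha1
kind: EtcdUser
metadata:
  name: metrics
spec:
  ## Roles to ensure in etcd and bind to this user.
  roles:
    - name: metrics-readonly
      permissions:
        - type: read
          ## Empty key/rangeEnd standing for "all keys" is just a convention
          ## assumed for this sketch.
          key: ""
          rangeEnd: ""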

Given that etcd is not exposed externally, we need to do that from inside the cluster network. We could perhaps have some simple Go program which ensures that the required users etc. exist as part of the bootstrapping process. But this approach won't allow any later updates; those could perhaps be rolled out using a K8s controller.

surajssd commented 4 years ago

Given that etcd is not exposed externally, we need to do that from inside the cluster network. We could perhaps have some simple Go program which ensures that the required users etc. exist as part of the bootstrapping process. But this approach won't allow any later updates; those could perhaps be rolled out using a K8s controller.

How is this exposed to the user? Using a component or a special lokoctl sub-command?

invidian commented 4 years ago

How is this exposed to the user? Using a component or a special lokoctl sub-command?

I think it would be part of the control plane then, and not available to the user.

surajssd commented 4 years ago

Does that mean it is okay to create the monitoring namespace before deploying the prometheus operator? The secret has to be in the same namespace as the prometheus-operator.

That also means we are not offering the user the flexibility to choose the namespace name for the prometheus operator?

Also, if the user inadvertently deletes the namespace, this secret is gone as well. How do we make it available again?

invidian commented 4 years ago

I think the workflow should be the following:

  1. Terraform creates all the certificates
  2. Bootkube starts bootstrap manifests, including "etcd bootstrapper" (to be defined)
  3. The etcd client certificate for Prometheus is stored in the Terraform output.
  4. When one installs prometheus-operator, the component would look up the Terraform output to obtain the certificates and pass them via values.yaml to Helm (we don't have this functionality yet, but it would probably be helpful to have).
  5. If one wants to update RBAC for etcd, a custom resource needs to be created (also to be defined + operator).

Notes:

But first things first: we should do manual tests to see if everything actually works as we expect.

invidian commented 4 years ago

Created #269 to track RBAC separately, so we limit the scope of this issue to just enabling monitoring and configuring the RBAC needed to allow it.

surajssd commented 4 years ago

Let me create an issue for doing manually whatever is proposed in this issue, and then we can build automation on top of that?