kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy to use and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

monitoring etcd #252

Closed surajssd closed 4 years ago

surajssd commented 4 years ago

Description

We can reach the etcd cluster from within the cluster, but to monitor etcd we need certificates to authenticate with it. We can use the certificates (etcd-client-ca.crt, etcd-client.crt, etcd-client.key) generated for the apiserver, which are available in the kube-system namespace in a secret called kube-apiserver. But providing those certs to the prometheus operator would be counterproductive to security.

Ideally we should create certs for a dedicated metrics/monitoring user, then create such a user in etcd and grant it appropriate permissions.
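
For illustration, here is a minimal sketch (the secret name, namespace and key names are placeholders) of how such dedicated certs could be packaged for Prometheus, in the same namespace as the prometheus operator, so that they end up mounted under /etc/prometheus/secrets/<secret-name>/:

## Hypothetical secret holding a dedicated etcd client certificate for the
## metrics/monitoring user. The key names match the file names referenced by
## the serviceMonitor caFile/certFile/keyFile settings below.
apiVersion: v1
kind: Secret
metadata:
  name: etcd-client-cert
  namespace: monitoring
type: Opaque
data:
  etcd-ca: <base64-encoded CA certificate>
  etcd-client: <base64-encoded client certificate>
  etcd-client-key: <base64-encoded client key>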

TODOs

## Component scraping etcd
##
kubeEtcd:
  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  ##
  service:
    port: 2379
    targetPort: 2379
    # selector:
    #   component: etcd

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and
  ## specifying security configuration below. For example, with a secret named etcd-client-cert
  ##
  ## serviceMonitor:
  ##   scheme: https
  ##   insecureSkipVerify: false
  ##   serverName: localhost
  ##   caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
  ##   certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
  ##   keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
  ##
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    ##
    interval: ""
    scheme: http
    insecureSkipVerify: false
    serverName: ""
    caFile: ""
    certFile: ""
    keyFile: ""

    ##  metric relabel configs to apply to samples before ingestion.
    ##
    metricRelabelings: []
    # - action: keep
    #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
    #   sourceLabels: [__name__]

    ## relabel configs to apply to samples before ingestion.
    ##
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace
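
For completeness, a hedged sketch of what the filled-in values could look like once such a dedicated client certificate exists in a secret (the secret name etcd-client-cert and the endpoint IPs are placeholders). It assumes the chart's prometheus.prometheusSpec.secrets setting is used to mount the secret, which makes it available under /etc/prometheus/secrets/<secret-name>/ inside the Prometheus pods:

kubeEtcd:
  enabled: true
  ## Placeholder controller node IPs on which etcd listens.
  endpoints:
    - 10.0.0.10
    - 10.0.0.11
    - 10.0.0.12
  service:
    port: 2379
    targetPort: 2379
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    ## May need to match the name in the etcd serving certificate.
    serverName: ""
    caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
    certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
    keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key

prometheus:
  prometheusSpec:
    ## Mounts the secret into the Prometheus pods under
    ## /etc/prometheus/secrets/etcd-client-cert/.
    secrets:
      - etcd-client-cert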

Possible implementations

invidian commented 4 years ago

And where does the code for cert-creation lie?

This could be done with Terraform.

And where does the code for etcd user creation lie?

This is interesting, as once we enable RBAC on etcd, we also need to do it for kube-apiserver. So either we have a K8s controller running as a static pod which will do that, or we do it some other way. With the controller, it would be nice to have it done with a CRD etc.
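
To make the CRD idea a bit more concrete, a purely hypothetical sketch (the API group, kind and fields are invented for illustration and not implemented anywhere) of how etcd users and their permissions could be declared and then reconciled by such a controller:

## Hypothetical custom resource; nothing below exists today.
apiVersion: etcd.lokomotive.kinvolk.io/v1alpha1
kind: EtcdUser
metadata:
  name: metrics
spec:
  ## Roles to ensure in etcd and bind to this user.
  roles:
    - name: metrics-readonly
      permissions:
        - type: read
          ## Empty key/rangeEnd standing for "all keys" is just a convention
          ## assumed for this sketch.
          key: ""
          rangeEnd: ""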

Given that etcd is not exposed externally, we need to do that from inside the cluster network. We could perhaps have some simple Go program which ensures that the required users etc. exist as part of the bootstrapping process. But this approach won't allow any later updates; those could perhaps be rolled out using a K8s controller.

surajssd commented 4 years ago

Given that etcd is not exposed externally, we need to do that from inside the cluster network. We could perhaps have some simple Go program which ensures that the required users etc. exist as part of the bootstrapping process. But this approach won't allow any later updates; those could perhaps be rolled out using a K8s controller.

How is this exposed to the user? Using a component or a special lokoctl sub-command?

invidian commented 4 years ago

How is this exposed to the user? Using a component or a special lokoctl sub-command?

I think it would be part of the control plane then, and not available to the user.

surajssd commented 4 years ago

Does that mean it is okay to create the monitoring namespace before deploying the prometheus operator? The secret has to be in the same namespace as the prometheus-operator.

That also means we are not offering the user the flexibility to choose the namespace name for the prometheus operator?

Also, if the user inadvertently deletes the namespace, this secret is gone as well. How do we make it available again?

invidian commented 4 years ago

I think the workflow should be the following:

  1. Terraform creates all the certificates
  2. Bootkube starts bootstrap manifests, including "etcd bootstrapper" (to be defined)
  3. The etcd client certificate for Prometheus is stored in the Terraform output.
  4. When one installs prometheus-operator, the component would look up the Terraform output to obtain the certificates and pass them via values.yaml to Helm (we don't have this functionality yet, but it would probably be helpful to have).
  5. If one wants to update RBAC for etcd, a custom resource needs to be created (also to be defined + operator).

Notes:

But first things first: we should do manual tests to see if everything actually works as we expect.

invidian commented 4 years ago

Created #269 to track RBAC separately, so we limit the scope of this issue to just enabling monitoring and configuring the RBAC needed to allow it.

surajssd commented 4 years ago

Let me create an issue for doing manually whatever is proposed in this issue, and then we can build automation on top of that?