kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)
Apache License 2.0

ManagerOptions#CertDir default is confusing #900

Open thephw opened 4 years ago

thephw commented 4 years ago

Problem Description

The default for the CertDir configuration option is presently nonsensical, I expect because it is out of date with other changes in the expected workflow for a developer to configure TLS. A prior PR, #300, removed the cert provisioners that would create local credentials at {TempDir}/k8s-webhook-server/serving-certs/tls.key and {TempDir}/k8s-webhook-server/serving-certs/tls.crt. This default was likely missed in the refactor due to some unexpected tight coupling to the prior implementation. However, it creates some confusion for a new developer when trying to run the examples.

CertDir has the comment

    // CertDir is the directory that contains the server key and certificate.
    // if not set, webhook server would look up the server key and certificate in
    // {TempDir}/k8s-webhook-server/serving-certs. The server key and certificate
    // must be named tls.key and tls.crt, respectively.
    CertDir string

In the logs running the example you will get:

unable to run manager   {"error": "open /var/folders/kc/7wscczc57v15s84cy1nx0d3h0000gn/T/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}

Developer Experience

A new developer is likely to run through the examples in the repository. This is what I was doing. The CertDir default served as a red herring, masking the problem for a while. The default value is so specific that it seemed like something else was broken and the examples should work as written. However, in the current implementation it seems they require additional configuration.

Possible Solution

Would love some feedback on the proper updates, but my inclination is to:

  1. Remove the outdated default for CertDir
  2. Update the comment on the type definition
  3. Add a helpful logging message on receiving a nil value for CertDir
  4. Update the examples to set up and reference local certs using mkcert
  5. (maybe?) Change the signature for manager.New to reflect that option is not optional

Additional Context

$ kc version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
$ go version
go version go1.13.8 darwin/amd64

Related Prior Contributions

@mengqiy authored PR #300 and it was reviewed by @droot and @DirectXMan12, and all of them likely have superior context on the past, present, and future state. Would love to have y'all's input and thank you for your contributions ❤️

DirectXMan12 commented 4 years ago

Can I ask which OS you're using here? On most Linux distros, $TMPDIR is just /tmp (we shouldn't rely on this, though).

If we're gonna change this, we need to do it carefully, because it'd probably break a lot of folks. It's not really a problem on most linux installs, because $TMPDIR is /tmp. That's not to say we shouldn't change it to something a bit more sensible, though.

At the very least, updating the comment seems reasonable, and wrapping the error message with a helpful "unable to read certs -- did you put the certs in Manager.CertDir: %w" would probably be good too.

> Change the signature for manager.New to reflect that option is not optional

It should be -- a good general pattern for stuff that runs in containers is to choose a constant, reasonable default. Since you control the fs, you can just always mount there. The main problem here is that $TMPDIR isn't actually constant on all systems.

DirectXMan12 commented 4 years ago

/kind bug
/priority important-longterm

DirectXMan12 commented 4 years ago

/good-first-issue

on the docs and improved error message

k8s-ci-robot commented 4 years ago

@DirectXMan12: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

thephw commented 4 years ago

I was running the code locally on MacOS Catalina 10.15.4 (19E287). Good point on the reasonable default. What would be the new reasonable default in the current implementation?

thephw commented 4 years ago

This would probably also be helpful:

$ mktemp
/var/folders/4g/3dkqhpkd4csgw5nqc6wzcb900000gn/T/tmp.V2TLDwmC
mengqiy commented 4 years ago

https://github.com/kubernetes-sigs/controller-runtime/blob/1c83ff6f06bc764c95dd69b0f743740c064c4bf6/pkg/webhook/server.go#L92 This is where the defaulting happens. It relies on os.TempDir to get the temp dir on different OS.

IIRC CertDir is still an optional field, since if you don't need to run webhook at all, you don't need to set it.

I'm more leaning toward updating the comment to point out default directory is determined by os.TempDir

DirectXMan12 commented 4 years ago

@mengqiy I think the main problem is that this might be autogenerated/hard to get at -- e.g. on OSX. That said, I don't think we can fix this easily w/o serious breakage, except maybe if we OS-detect OSX and special-case that for the moment.

thephw commented 4 years ago

Now that we're getting around to integration testing with kind, I am getting the same error in Linux land with the default. @DirectXMan12 can you point me to the code that sets up the default certs at /tmp/k8s-webhook-server/serving-certs?


{"level":"error","ts":1589463131.8074567,"logger":"build-webhooks.entrypoint","msg":"Unable to run manager","error":"open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/layers/heroku_go/shim/go-path/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngithub.com/onspaceship/booster/pkg/webhooks.Start\n\t/workspace/pkg/webhooks/webhooks.go:81"}
cshivashankar commented 4 years ago

How do we overcome this? I think it would be good to have something that works for a first-time implementation, or maybe a bit more information in the README would help. As a workaround, should I generate certs validated by the API server and create a separate service for this, or can the cert be generated through the code itself?

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

DirectXMan12 commented 3 years ago

/remove-lifecycle rotten
/lifecycle frozen