servicebinding / spec

Specification for binding services to k8s workloads
https://servicebinding.io
Apache License 2.0

Support for binding expiration/rotation and notification of workloads ? #228

Open gberche-orange opened 4 months ago

gberche-orange commented 4 months ago

I'd like to learn how the Service Binding project suggests handling binding credential rotations.

In particular:

As an inspiration, here is how the open service broker api supports binding rotations:

OSB specs extract:

> https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#binding-metadata-object
>
> | Response Field | Type | Description |
> | -------------- | ---- | ----------- |
> | expires_at | string | The date and time when the Service Binding becomes invalid and SHOULD NOT or CANNOT be used anymore. If present, the string MUST follow ISO 8601 and this pattern: `yyyy-mm-ddThh:mm:ss.sZ`. |
> | renew_before | string | The date and time before the Service Binding SHOULD be renewed. Applications or Platforms MAY use this field to initiate a Service Binding rotation or create a new Service Binding on time. It is RECOMMENDED to trigger the creation of a new Service Binding shortly before this timestamp. If the expires_at field is also present, the renew_before timestamp MUST be before or equal to the expires_at timestamp. Service Brokers SHOULD leave enough time between both timestamps to create a new Service Binding including a buffer to enable continuity. If present, the string MUST follow ISO 8601 and this pattern: `yyyy-mm-ddThh:mm:ss.sZ`. |
>
> https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#binding-rotation
>
> Some Service Bindings are not valid forever. Especially credentials expire and have to be replaced at some point in time. The simplest form of exchanging a binding is to create a new Service Binding, make it available to the Application and remove and unbind the old one. In many cases, this requires a restart of the Application.
>
> But this approach has a few downsides. First of all, from the Service Broker point of view, there is no continuity. The Service Broker doesn't know that the new binding is the successor of the old one. If state is attached to the old binding, the Service Broker is not able to transfer this state to the new binding. The second challenge is, that Platforms have to provide the binding parameters again to the successor binding. But Platforms do not necessarily store these parameter values. Without the values, a user has to provide them again and that prevents an automated rotation of Service Bindings.
>
> Therefore, this specification defines means to rotate Service Bindings. A Service Broker can declare in the catalog per plan if it supports the creation of a successor binding by setting the binding_rotatable field to true. If the field is set to false or not present, the Platform MUST NOT attempt to rotate a Service Binding of this plan.
>
> To create a successor binding, the Platform MUST provide a predecessor_binding_id field in the binding provisioning request. The value of this field MUST be the Service Binding ID of a non-expired Service Binding of the same Service Instance. The request creates a new Service Binding with a new binding ID. Both Service Bindings, the new and the old one, MUST both be valid in parallel until they expired or are deleted.
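For concreteness, a broker's create-binding response carrying these metadata fields might look roughly like this (rendered as YAML for readability; actual OSB payloads are JSON, and the credential keys and timestamps are illustrative):

```yaml
# Hypothetical OSB create-binding response body (illustrative values)
metadata:
  expires_at: "2024-12-31T23:59:59.0Z"    # binding invalid after this instant
  renew_before: "2024-12-24T00:00:00.0Z"  # platform should rotate shortly before this
credentials:
  username: example-user                  # illustrative credential keys
  password: example-pass
```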

Is there prior art or discussion on this topic I can refer to?

Thanks in advance for your help.

scothis commented 4 months ago

The Service Binding Spec doesn't have any direct mechanism for provisioning/rotating/revoking credentials. This is left to individual provisioned services to define. The goal for bindings is to connect provisioned services to workloads while being agnostic to the content of both.

We can explore creating a spec extension that provides recommendations for services to be able to communicate to workloads when credentials expire. Formalizing the keys you suggest should be a quick process.
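To sketch the kind of extension mentioned here: nothing below is in the spec today, but a binding Secret could carry hypothetical `expires_at` / `renew_before` entries alongside the credentials, which the spec's projection would then surface to the workload as files next to the other binding entries. All key names and values here are illustrative, not part of any spec:

```yaml
# Hypothetical extension: expiry metadata alongside the binding credentials.
# The expires_at / renew_before keys are illustrative proposals only.
apiVersion: v1
kind: Secret
metadata:
  name: account-db-binding
type: servicebinding.io/postgresql
stringData:
  type: postgresql
  username: app-user
  password: example-pass
  expires_at: "2024-12-31T23:59:59.0Z"    # proposed: when the credentials become invalid
  renew_before: "2024-12-24T00:00:00.0Z"  # proposed: rotate before this instant
```

A workload could then watch the projected `renew_before` file to schedule its own credential reload ahead of expiry.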

With the reference implementation of the spec there are two ways to rotate credentials today:

  1. Update the content of the credentials within the referenced binding secret. The workload will need to notice the change on the filesystem and pick up the new creds. If the workload does not detect the new creds, once the existing creds are invalidated, it should crash and restart to pick up the new creds.
  2. Create a new secret and update the provisioned service's reference to point at the new secret. The ServiceBinding controller will update workloads to use the new secret triggering a Deployment rollout (or equivalent for other resources). Resources like Knative Revisions that freeze the spec will continue to use the old credentials until that revision is retired.
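Option 2 can be sketched like this, assuming a provisioned service implementing the spec's ProvisionedService duck type (the `AccountService` CRD and all names are illustrative): rotation is a new, uniquely named Secret plus an update of `status.binding.name`:

```yaml
# New, uniquely named Secret holding the rotated credentials
apiVersion: v1
kind: Secret
metadata:
  name: account-db-creds-v2       # v2: the rotated credential generation
type: servicebinding.io/postgresql
stringData:
  type: postgresql
  username: app-user
  password: rotated-example-pass
---
# The provisioned service (any resource exposing status.binding.name works).
# Repointing the reference from ...-v1 to ...-v2 is what makes the
# ServiceBinding controller roll the workload onto the new Secret.
apiVersion: example.dev/v1        # hypothetical provisioned-service CRD
kind: AccountService
metadata:
  name: account-db
status:
  binding:
    name: account-db-creds-v2     # was: account-db-creds-v1
```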
nebhale commented 4 months ago

@gberche-orange Hey there, long time no see! I'll put in my two cents: option 2 is where we've seen the most success. The idea that an individual secret is immutable and you simply create new, uniquely named secrets that roll out using standard Pod behaviors has a lot of advantages.
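The immutability idea maps directly onto Kubernetes: each credential generation can be a distinct Secret marked `immutable: true`, so rotation is always create-new-then-repoint rather than edit-in-place (names and values are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: account-db-creds-v2   # unique name per credential generation
immutable: true               # Kubernetes rejects changes to data once set
type: servicebinding.io/postgresql
stringData:
  type: postgresql
  username: app-user
  password: rotated-example-pass
```

Immutable Secrets also let the kubelet skip watching them for changes, which reduces apiserver load in large clusters.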

gberche-orange commented 4 months ago

Thanks @scothis and @nebhale for your prompt answers (and the good times together with Cloud Foundry!)

> The Service Binding Spec doesn't have any direct mechanism for provisioning/rotating/revoking credentials. This is left to individual provisioned services to define.

Do you know whether other specs have made progress in this area toward standardizing a common contract for provisioned services? I recall discussions about open-sourcing the VMware Services Toolkit. BTW, I just briefly looked at the documentation for version 0.9 and could not yet spot a clear UX for secret rotation of provisioned services.

> We can explore creating a spec extension that provides recommendations for services to be able to communicate to workloads when credentials expire. Formalizing the keys you suggest should be a quick process.

That would be great, thanks !

> With the reference implementation of the spec there are two ways to rotate credentials today:
>
> 1. [...] If the workload does not detect the new creds, once the existing creds are invalidated, it should crash and restart to pick up the new creds

This seems very dependent on how the app behaves in the face of errors from the service rejecting credentials, and it creates unavoidable downtime.

> 2. Create a new secret and update the provisioned service's reference to point at the new secret. The ServiceBinding controller will update workloads to use the new secret triggering a Deployment rollout (or equivalent for other resources).

This is great, and it removes the need for a distinct controller such as https://github.com/stakater/Reloader. I could not spot this behavior mentioned in the spec. Did I miss it? Would it make sense to document it?

A deployment rollout might take a variable amount of time, or even fail for unrelated reasons.

How would the provisioned service know how long to keep the old secret active?

Thanks again for your help!

scothis commented 4 months ago

At its core, the Service Binding Core Specification is three things:

  1. a way for services to advertise binding credentials
  2. a way for a workload process to discover services bound to the container
  3. a way to wire the binding credentials advertised by a service into a workload container
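These three pieces come together in a single ServiceBinding resource; a minimal sketch (the `AccountService` CRD and all names are illustrative):

```yaml
apiVersion: servicebinding.io/v1beta1
kind: ServiceBinding
metadata:
  name: account-db-binding
spec:
  service:                      # (1) the provisioned service advertising credentials
    apiVersion: example.dev/v1  # hypothetical provisioned-service CRD
    kind: AccountService
    name: account-db
  workload:                     # (3) the workload the credentials are wired into
    apiVersion: apps/v1
    kind: Deployment
    name: account-app
# (2) the workload process then discovers the binding as files under
# $SERVICE_BINDING_ROOT/account-db-binding/
```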

More advanced behavior can be built on top of this core, but is not covered by the spec. It's not that we don't think those higher-level experiences are important; we started with a non-contentious core that can serve as a foundation to build upon as interest grows. That building can happen either in open source or in proprietary layers.

A service knowing when a credential is no longer used is a hard problem to solve, since it implies knowledge of how the workload operates. For example, scale-to-zero workloads add a dynamic where any previously running pod could be restarted.

The core spec doesn't assume the implementation will be a controller actively participating in the cluster, although the reference implementation is such a controller. The logic to perform a binding is in a package independent of the controller lifecycle. A conformant implementation could just as easily be a CLI or other client that applies bound workloads to the cluster.

Some of the rollout behavior is specific to how the reference implementation works; we can document it there. We should also add docs about the reference implementation to the website.