orange-cloudfoundry / osb-cmdb

A configuration management db for Open Service Broker API broker implementations
Apache License 2.0
14 stars 1 forks source link

Kubernetes Service Catalog interop : idempotent on duplicate create service #17

Closed poblin-orange closed 4 years ago

poblin-orange commented 4 years ago

Expected behavior

Expecting Openshift service catalog to be able to consume services fronted by OSBCMDB.

Observed behavior

Openshift issues multiple PUT to create a service instance. Seems compliant with OSB spec.

.https://github.com/kubernetes-sigs/service-catalog/issues/1639

The broker rejects the duplicate PUT, giving provisionning error on OpenShift v3.9.51 ( Kubernetes v1.9.1+a0ce1bc657 )

cc @pdechamboux

gberche-orange commented 4 years ago

Symptom when concurrent provisionning request is sent by K8S

   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT org.cloudfoundry.client.v2.ClientV2Exception: CF-ServiceInstanceNameTaken(60002): The service instance name is taken: 95a4a8e5-52df-4810-82f0-d04366efbd5c
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at org.cloudfoundry.reactor.util.ErrorPayloadMapper.lambda$null$0(ErrorPayloadMapper.java:46) ~[cloudfoundry-client-reactor-3.16.0.RELEASE.jar!/:na]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at org.cloudfoundry.reactor.util.ErrorPayloadMapper.lambda$null$9(ErrorPayloadMapper.java:98) ~[cloudfoundry-client-reactor-3.16.0.RELEASE.jar!/:na]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:118) ~[reactor-core-3.2.12.RELEASE.jar!/:3.2.12.RELEASE]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onNext(FluxOnAssembly.java:345) ~[reactor-core-3.2.12.RELEASE.jar!/:3.2.12.RELEASE]
   2020-02-20T17:28:00.44+0100 [APP/PROC/WEB/0] OUT     at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67) ~[reactor-core-3.2.12.RELEASE.jar!/:3.2.12.RELEA
gberche-orange commented 4 years ago

Desired approach:

This is costly to do because the StateRepository is not currently accessible to CloudFoundryDeployer but only the WorkflowServiceInstanceService

Alternative more affordable degraded approach

=> we sacrifice the cases where

gberche-orange commented 4 years ago

Pb with initial implementation in https://github.com/orange-cloudfoundry/osb-cmdb-spike/pull/27/commits/ec163e71aa20cc0eaaad570c87cbb1392d93b909: race condition between servlet threads (reading the state repository) and reactor threads (writing to state repository async)

Alternatives:

Option 1) was successfully tested on openshift

gberche-orange commented 4 years ago

Same problem arises with service bindings:

org.cloudfoundry.client.v2.ClientV2Exception: CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53

Log pollution of level ERROR:

19:02:33.230: [APP/PROC/WEB.0] 2020-02-26 18:02:33.229 ERROR 27 --- [ry-client-nio-3] o.s.c.appbroker.deployer.DeployerClient  : Error creating backing service key a6c72b10-9749-4395-b257-bfab24c49d53 for service 6a9aac92-a6e4-4141-a1c9-15dc0cf577bb with error 'CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53'
[...]
19:02:33.238: [APP/PROC/WEB.0] 2020-02-26 18:02:33.235 ERROR 27 --- [ry-client-nio-3] d.DefaultBackingServicesProvisionService : Error creating backing services keys [BackingServiceKey{serviceInstanceName='6a9aac92-a6e4-4141-a1c9-15dc0cf577bb', name='a6c72b10-9749-4395-b257-bfab24c49d53', plan='null', parameters={}, properties={keepTargetSpaceOnDelete=true, target=p-mysql}, parametersTransformers=[], rebindOnUpdate=false}] with error 'CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53'

19:02:33.238: [APP/PROC/WEB.0] 2020-02-26 18:02:33.237 ERROR 27 --- [ry-client-nio-3] ppDeploymentCreateServiceBindingWorkflow : Error creating backing services keysfor p-mysql/10mb with error 'CF-ServiceKeyNameTaken(360001): The service key name is taken: a6c72b10-9749-4395-b257-bfab24c49d53'

19:02:33.245: [APP/PROC/WEB.0] 2020-02-26 18:02:33.244 ERROR 27 --- [ry-client-nio-3] s.c.ServiceBrokerWebFluxExceptionHandler : Unknown exception handled: 

The service binding specs mentions at https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#binding in the response-6 section:

Status Code Description
200 OK SHOULD be returned if the Service Binding already exists and the requested parameters are identical to the existing Service Binding. The expected response body is below.
201 Created MUST be returned if the Service Binding was created as a result of this request. The expected response body is below.
202 Accepted MUST be returned if the binding is in progress. The operation string MUST match that returned for the original request. This triggers the Platform to poll the Polling Last Operation for Service Bindings endpoint for operation status. Information regarding the Service Binding (i.e. credentials) MUST NOT be returned in this response. Note that a re-sent PUT request MUST return a 202 Accepted, not a 200 OK, if the Service Binding is not yet fully created.

However, the osb-cmdb service binding are not yet asynchronous (nor is there a scab support for async service bindings), hence returning a 202 for dup binding requests isn't an option.

Possible alternatives:

  1. Keep returning a 500 error and try to reduce log pollution
    1. Avoid creating duplicate service keys using the state repository to detect inflight requests
      1. throw a ServiceBrokerException with a helpful diagnostic message
    2. Just add error handling for org.cloudfoundry.client.v2.ClientV2Exception: CF-ServiceKeyNameTaken(360001)
  2. Keep returning a 500 error and ignore log pollution

Opted to 2) Keep returning a 500 error and ignore log pollution

gberche-orange commented 4 years ago

See related openshift 3.9 documentation at https://docs.openshift.com/container-platform/3.9/architecture/service_catalog/index.html

Pointers and procedure to refresh service catalog in Openshift:

Procedure to register a service broker with basic auth (see inspiration)

# enable oc completion
$ source <(oc completion bash)

$ kubectl create secret generic osb-cmdb-0-auth \
--from-literal username=redacted-user \
--from-literal password=redacted-password

# Alternatively in a spec file as documented into https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret-manually

$ cat secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: osb-cmdb-0-auth
type: Opaque
data:
  username: *****=
  password: ********
$ kubectl apply -f ./secret.yaml
secret "osb-cmdb-0-auth" created

# Seems to fail to make the broker appear in openshift UI
$ svcat register osb-cmdb-0-broker \
--url https://ocb-cmdb-0.redacted-domain \
--scope cluster \
--basic-secret osb-cmdb-0-auth
--skip-tls

# Alternatively
$ cat broker.yml
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ClusterServiceBroker
metadata:
  name: osbcmdb-broker
spec:
  url: https://osb-cmdb-broker-0.redacted-domain
  insecureSkipTLSVerify: true
  authInfo:
    basic:
      secretRef:
        namespace: interco-kermit-fp
        name: osb-cmdb-0-auth
$ oc create -f broker.yml
clusterservicebroker "osbcmdb-broker" created

Other diagnostic commands

$ svcat get brokers 
$ svcat describe broker osb-cmdb-0-broker
[...]
Status:   Ready - Successfully fetched catalog entries from broker @ 2020-04-09 15:35:47 +0000 UTC   

$ svcat get classes

# Errors
$ svcat get class p-mysql-osb --scope all
Error: class 'p-mysql-osb' not found in cluster scope

# works, filter by date
$ oc get ClusterServiceClass
[...]
e1c17bcf-011d-4a34-9ee3-357a28cd7c77   15h
f8fdfaa9-fdc1-4b68-8791-e55a1419a360   15h

# corresponds to p-mysql-osb"
$ oc get ClusterServiceClass --output json f8fdfaa9-fdc1-4b68-8791-e55a1419a360
[...]
"externalName": "p-mysql-osb",

svcat get plans
svcat get plans -o json

# troubleshoot invalid auth in secrets
$ oc get Secret osbcmdb-broker-5-auth-secret --namespace=interco-kermit-fp-demo --output=yaml | grep password
  password: redacted-base64-password
$ echo -n "redacted-base64-password" | base64 --decode | hexdump 

Cleanup/recovery/purge commands: See related https://github.com/kubernetes-sigs/service-catalog/issues/2268

$ svcat get instances 
      NAME                  NAMESPACE               CLASS    PLAN                   STATUS                   
+---------------+--------------------------------+---------+------+-----------------------------------------+
  p-mysql-ffkv6   cloudfoundry-service-instances   p-mysql   10mb   DeprovisionBlockedByExistingCredentials  
  p-mysql-wbv8r   cloudfoundry-service-instances   p-mysql   10mb   DeprovisionBlockedByExistingCredentials  

$ svcat deprovision p-mysql-ffkv6 --abandon 
This action is not reversible and may cause you to be charged for the broker resources that are abandoned. If you have any bindings for this instance, please delete them manually with svcat unbind --abandon --name bindingName
Are you sure? [y|n]: 
y
Error: deprovision request failed (serviceinstances.servicecatalog.k8s.io "p-mysql-ffkv6" not found)

$ svcat get instances 
      NAME                  NAMESPACE               CLASS    PLAN                   STATUS                   
+---------------+--------------------------------+---------+------+-----------------------------------------+
  p-mysql-wbv8r   cloudfoundry-service-instances   p-mysql   10mb   DeprovisionBlockedByExistingCredentials
gberche-orange commented 4 years ago

Started an upstream PR in https://github.com/spring-cloud/spring-cloud-app-broker/pull/343 but there is remaining effort needed to diagnose test failure. Pausing for now as I'm out of budget for this upstream PR.

image