Closed by gberche-orange 4 years ago.
Associated symptom
ERROR 20 --- [or-http-epoll-4] s.c.ServiceBrokerWebFluxExceptionHandler : Unknown exception handled:
java.lang.IllegalArgumentException: Unknown service instance ID a005a22e-3684-423a-ad25-c5ad63ce0ca1
at org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47) ~[spring-cloud-app-broker-core-1.0.4.BUILD-SNAPSHOT.jar!/:1.0.4
org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47)
Error has been observed by the following operator(s):
|_ Mono.error ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47)
|_ Mono.defer ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$getState$4(InMemoryServiceInstanceStateRepository.java:42)
|_ Mono.flatMap ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.getState(InMemoryServiceInstanceStateRepository.java:42)
|_ Mono.doOnError ⇢ org.springframework.cloud.appbroker.service.WorkflowServiceInstanceService.getLastOperation(WorkflowServiceInstanceService.java:210)
|_ Mono.map ⇢ org.springframework.cloud.appbroker.service.WorkflowServiceInstanceService.getLastOperation(WorkflowServiceInstanceService.java:211)
|_ Flux.then ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:69)
|_ Flux.then ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.lambda$getLastOperation$2(ServiceInstanceEventService.java:71)
|_ Mono.onErrorResume ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:70)
|_ Mono.flatMap ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:72)
|_ Mono.doOnRequest ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.lambda$getServiceInstanceLastOperation$15(ServiceInstanceController.java:169)
|_ Mono.doOnSuccess ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.lambda$getServiceInstanceLastOperation$15(ServiceInstanceController.java:170)
|_ Mono.flatMap ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.getServiceInstanceLastOperation(ServiceInstanceController.java:168)
|_ Mono.map ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.getServiceInstanceLastOperation(ServiceInstanceController.java:173)
|_ Mono.flatMap ⇢ org.springframework.web.reactive.result.method.annotation.ResponseEntityResultHandler.handleResult(ResponseEntityResultHandler.java:130)
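The IllegalArgumentException above is what you would expect from any state store that lives only in the JVM heap. The following Java sketch is a simplified, synchronous model of the failure. The class, enum and method names are hypothetical and are not the App Broker API; the real InMemoryServiceInstanceStateRepository is reactive, as the Mono operators in the trace show. State saved before a restart is simply missing from the fresh instance created after it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for an in-memory service instance state repository
final class InMemoryStateRepositorySketch {

    enum OperationState { IN_PROGRESS, SUCCEEDED, FAILED }

    // Lives only in the JVM heap: a restart (or a Diego cell evacuation) starts from an empty map
    private final Map<String, OperationState> states = new ConcurrentHashMap<>();

    void saveState(String serviceInstanceId, OperationState state) {
        states.put(serviceInstanceId, state);
    }

    OperationState getState(String serviceInstanceId) {
        OperationState state = states.get(serviceInstanceId);
        if (state == null) {
            // Same message shape as the exception in the log above
            throw new IllegalArgumentException("Unknown service instance ID " + serviceInstanceId);
        }
        return state;
    }

    public static void main(String[] args) {
        String id = "a005a22e-3684-423a-ad25-c5ad63ce0ca1";

        InMemoryStateRepositorySketch beforeRestart = new InMemoryStateRepositorySketch();
        beforeRestart.saveState(id, OperationState.IN_PROGRESS); // "create service instance started"
        System.out.println("before restart: " + beforeRestart.getState(id));

        // Simulate the broker restart: a new JVM means a new, empty repository
        InMemoryStateRepositorySketch afterRestart = new InMemoryStateRepositorySketch();
        try {
            afterRestart.getState(id); // what get last operation does after the restart
        } catch (IllegalArgumentException e) {
            System.out.println("after restart: " + e.getMessage());
        }
    }
}
```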
To retrieve traces for the affected service instance GUID:
# Look up the smoke-test service instance, if it still exists
cf curl /v2/service_instances/a005a22e-3684-423a-ad25-c5ad63ce0ca1
"last_operation": {
"type": "create",
"state": "in progress",
"description": "create service instance started",
"updated_at": "2020-02-13T11:24:39Z",
"created_at": "2020-02-13T11:23:35Z"
},
# If the smoke-test service instance no longer exists, look at its audit events
cf curl '/v2/events?q=actee:a005a22e-3684-423a-ad25-c5ad63ce0ca1'
# For an osb-cmdb service instance, find the related service instance(s) through the backing_service_instance_guid label
cf curl "/v3/service_instances?label_selector=backing_service_instance_guid==a005a22e-3684-423a-ad25-c5ad63ce0ca1"
Check the broker's restart history:
cf events osb-cmdb-broker-1
Getting events for app osb-cmdb-broker-1 in org system_domain / space osb-cmdb-broker-1 as me...
time event actor description
2020-02-13T12:41:18.00+0100 audit.app.droplet.create coa-cf
2020-02-13T12:40:58.00+0100 audit.app.update coa-cf state: STARTED
2020-02-13T12:40:58.00+0100 audit.app.build.create coa-cf
2020-02-13T12:40:57.00+0100 audit.app.update coa-cf state: STOPPED
2020-02-13T12:40:47.00+0100 audit.app.upload-bits coa-cf
2020-02-13T12:40:45.00+0100 audit.app.update coa-cf instances: 1, memory: 2048, environment_json: [PRIVATE DATA HIDDEN]
2020-02-13T12:25:09.00+0100 audit.app.droplet.create coa-cf
2020-02-13T12:24:48.00+0100 audit.app.update coa-cf state: STARTED
2020-02-13T12:24:48.00+0100 audit.app.build.create coa-cf
2020-02-13T12:24:48.00+0100 audit.app.update coa-cf state: STOPPED
2020-02-13T12:24:38.00+0100 audit.app.upload-bits coa-cf
2020-02-13T12:24:36.00+0100 audit.app.update coa-cf instances: 1, memory: 2048, environment_json: [PRIVATE DATA HIDDEN]
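Comparing these events with the service instance's last_operation requires normalizing time zones: `cf events` prints local times with a +0100 offset, while the Cloud Controller returns UTC. The following minimal Java sketch does that arithmetic with java.time. The class and method names are illustrative only, and the date pattern is an assumption matched to the output above.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Correlates the service instance's last_operation timestamps with the broker's cf events
final class RestartTimeline {

    // Timestamp format printed by `cf events`, e.g. 2020-02-13T12:24:48.00+0100
    private static final DateTimeFormatter CF_EVENTS_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSZ");

    // Parses a `cf events` timestamp (local time plus offset) into an absolute instant
    static Instant parseCfEventTime(String text) {
        return OffsetDateTime.parse(text, CF_EVENTS_FORMAT).toInstant();
    }

    static long secondsBetween(Instant from, Instant to) {
        return Duration.between(from, to).getSeconds();
    }

    public static void main(String[] args) {
        // From the last_operation of the service instance (already in UTC)
        Instant createdAt = Instant.parse("2020-02-13T11:23:35Z");
        Instant updatedAt = Instant.parse("2020-02-13T11:24:39Z");
        // "state: STOPPED" event of the first broker redeployment (UTC+1)
        Instant brokerStopped = parseCfEventTime("2020-02-13T12:24:48.00+0100");

        System.out.println("Broker stopped at " + brokerStopped);
        System.out.println(secondsBetween(createdAt, brokerStopped) + "s after provisioning started");
        System.out.println(secondsBetween(updatedAt, brokerStopped) + "s after the last operation update");
    }
}
```

It reports that the broker stopped at 2020-02-13T11:24:48Z. That is 73 s after provisioning started and 9 s after the last recorded update, so the instance was still "in progress" at the time. This fits the root cause described below, although the timestamps alone do not prove it.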
Fixed in v1.0.0: osb-cmdb now relies on the OSB API operation state to maintain its state, and no longer keeps state in the broker's RAM.
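One way to avoid broker-side state, in the spirit of this fix and of the second possible fix listed below, is to recompute the last-operation state from the backing service instances each time the platform polls. The Java sketch below assumes the broker has already fetched each backing instance's last_operation.state string from the Cloud Controller. The fetch itself is not shown. The class, enum and method names are hypothetical, and this is not osb-cmdb's actual v1.0.0 implementation. Backing applications report their state differently and are not covered.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: derive the last-operation state from the backing resources on each poll
// instead of reading it from a map in broker RAM, so a broker restart loses nothing.
final class BackingStateLastOperation {

    // Mirrors the three last-operation states defined by the OSB API
    enum OperationState { IN_PROGRESS, SUCCEEDED, FAILED }

    // Maps a Cloud Foundry last_operation.state value ("in progress", "succeeded", "failed")
    static OperationState fromCfState(String cfState) {
        switch (cfState.toLowerCase(Locale.ROOT)) {
            case "succeeded":
                return OperationState.SUCCEEDED;
            case "failed":
                return OperationState.FAILED;
            default:
                // "in progress" or any unrecognized value: report in progress so the platform keeps polling
                return OperationState.IN_PROGRESS;
        }
    }

    // Combines the last_operation states of all backing service instances: one failure fails
    // the operation, and it succeeds only once every backing instance has succeeded.
    static OperationState aggregate(List<String> backingCfStates) {
        if (backingCfStates.isEmpty()) {
            // Nothing found: whether that means "deleted" or "never created" is for the caller to decide
            throw new IllegalArgumentException("no backing resources found");
        }
        boolean anyInProgress = false;
        for (String cfState : backingCfStates) {
            OperationState state = fromCfState(cfState);
            if (state == OperationState.FAILED) {
                return OperationState.FAILED;
            }
            if (state == OperationState.IN_PROGRESS) {
                anyInProgress = true;
            }
        }
        return anyInProgress ? OperationState.IN_PROGRESS : OperationState.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of("succeeded", "in progress"))); // IN_PROGRESS
        System.out.println(aggregate(List.of("succeeded", "succeeded")));   // SUCCEEDED
        System.out.println(aggregate(List.of("in progress", "failed")));    // FAILED
    }
}
```

Treating unrecognized values as in progress is a conservative choice. The platform keeps polling until its own timeout instead of marking the instance failed too early.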
Expected behavior
As an osb-cmdb operator, in order to operate osb-cmdb on Cloud Foundry without causing downtime, I need osb-cmdb to keep serving requests during Diego cell evacuations.
Observed behavior
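osb-cmdb does not comply with the twelve-factor app methodology (factor VI: execute the app as one or more stateless processes):
Following a restart of osb-cmdb, asynchronous service provisioning hangs or fails.
The get last operation endpoint systematically returns an error (presumably a 500 status code).
Root cause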
The get last operation endpoint reads its state from the InMemoryServiceInstanceStateRepository, which was emptied by the last broker restart.
Possible fixes
Use a persistent ServiceInstanceStateRepository implementation (e.g. backed by a MySQL database) instead of InMemoryServiceInstanceStateRepository, or
change get last operation to look up the status of the backing service instance(s) (along with any backing application(s)).
Workaround
cf curl -X DELETE /v2/service_instances/a005a22e-3684-423a-ad25-c5ad63ce0ca1
Affected release
Reproduced on version x.y