orange-cloudfoundry / osb-cmdb

A configuration management db for Open Service Broker API broker implementations
Apache License 2.0

get-last-operation indefinitely returns an error if the broker is restarted during service provisioning #7

Closed gberche-orange closed 4 years ago

gberche-orange commented 4 years ago

Expected behavior

As an osb-cmdb operator, in order to operate osb-cmdb on Cloud Foundry without causing downtime, I need osb-cmdb to preserve service during Diego cell evacuations.

Observed behavior

osb-cmdb does not comply with the twelve-factor app principles:

Following a restart of osb-cmdb, async service provisioning hangs or fails.

Root cause

Possible fixes

Workaround

Affected release

Reproduced on version x.y

gberche-orange commented 4 years ago

Associated symptom

ERROR 20 --- [or-http-epoll-4] s.c.ServiceBrokerWebFluxExceptionHandler : Unknown exception handled:
 java.lang.IllegalArgumentException: Unknown service instance ID a005a22e-3684-423a-ad25-c5ad63ce0ca1
    at org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47) ~[spring-cloud-app-broker-core-1.0.4.BUILD-SNAPSHOT.jar!/:1.0.4

    org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47)
 Error has been observed by the following operator(s):
    |_  Mono.error ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47)
    |_  Mono.defer ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$getState$4(InMemoryServiceInstanceStateRepository.java:42)
    |_  Mono.flatMap ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.getState(InMemoryServiceInstanceStateRepository.java:42)
    |_  Mono.doOnError ⇢ org.springframework.cloud.appbroker.service.WorkflowServiceInstanceService.getLastOperation(WorkflowServiceInstanceService.java:210)
    |_  Mono.map ⇢ org.springframework.cloud.appbroker.service.WorkflowServiceInstanceService.getLastOperation(WorkflowServiceInstanceService.java:211)
    |_  Flux.then ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:69)
    |_  Flux.then ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.lambda$getLastOperation$2(ServiceInstanceEventService.java:71)
    |_  Mono.onErrorResume ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:70)
    |_  Mono.flatMap ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:72)
    |_  Mono.doOnRequest ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.lambda$getServiceInstanceLastOperation$15(ServiceInstanceController.java:169)
    |_  Mono.doOnSuccess ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.lambda$getServiceInstanceLastOperation$15(ServiceInstanceController.java:170)
    |_  Mono.flatMap ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.getServiceInstanceLastOperation(ServiceInstanceController.java:168)
    |_  Mono.map ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.getServiceInstanceLastOperation(ServiceInstanceController.java:173)
    |_  Mono.flatMap ⇢ org.springframework.web.reactive.result.method.annotation.ResponseEntityResultHandler.handleResult(ResponseEntityResultHandler.java:130)
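
The trace above is consistent with operation state being held only in process memory. A minimal sketch of that failure mode follows. The class `InMemoryStateSketch` and its methods are hypothetical stand-ins written for illustration; this is not the actual spring-cloud-app-broker code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an in-memory state repository (illustration only,
// not the real InMemoryServiceInstanceStateRepository).
class InMemoryStateSketch {

    // State lives only in this JVM's heap and is lost when the process restarts.
    private final Map<String, String> states = new ConcurrentHashMap<>();

    // Record the last known operation state for a service instance.
    void saveState(String serviceInstanceId, String state) {
        states.put(serviceInstanceId, state);
    }

    // Return the recorded state, or fail the same way the log above reports.
    String getState(String serviceInstanceId) {
        String state = states.get(serviceInstanceId);
        if (state == null) {
            throw new IllegalArgumentException(
                    "Unknown service instance ID " + serviceInstanceId);
        }
        return state;
    }
}
```

Creating a fresh `InMemoryStateSketch` models a broker restart: a state saved on the first object is not visible to the second one, so every later `getState` call for that ID throws, which matches the platform seeing an error on each get-last-operation poll.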

To get traces of the associated service instance guid:

# look up the smoke test service instance if it still exists
cf curl /v2/service_instances/a005a22e-3684-423a-ad25-c5ad63ce0ca1

      "last_operation": {
         "type": "create",
         "state": "in progress",
         "description": "create service instance started",
         "updated_at": "2020-02-13T11:24:39Z",
         "created_at": "2020-02-13T11:23:35Z"
      },

# when the smoke test service instance no longer exists, look at audit events
cf curl '/v2/events?q=actee:a005a22e-3684-423a-ad25-c5ad63ce0ca1'

# if it is an osb-cmdb backend service instance
cf curl "/v3/service_instances?label_selector=backing_service_instance_guid==a005a22e-3684-423a-ad25-c5ad63ce0ca1"

Check the restart history of the broker

cf events osb-cmdb-broker-1
Getting events for app osb-cmdb-broker-1 in org system_domain / space osb-cmdb-broker-1 as me...

time                          event                      actor    description
2020-02-13T12:41:18.00+0100   audit.app.droplet.create   coa-cf
2020-02-13T12:40:58.00+0100   audit.app.update           coa-cf   state: STARTED
2020-02-13T12:40:58.00+0100   audit.app.build.create     coa-cf
2020-02-13T12:40:57.00+0100   audit.app.update           coa-cf   state: STOPPED
2020-02-13T12:40:47.00+0100   audit.app.upload-bits      coa-cf
2020-02-13T12:40:45.00+0100   audit.app.update           coa-cf   instances: 1, memory: 2048, environment_json: [PRIVATE DATA HIDDEN]
2020-02-13T12:25:09.00+0100   audit.app.droplet.create   coa-cf
2020-02-13T12:24:48.00+0100   audit.app.update           coa-cf   state: STARTED
2020-02-13T12:24:48.00+0100   audit.app.build.create     coa-cf
2020-02-13T12:24:48.00+0100   audit.app.update           coa-cf   state: STOPPED
2020-02-13T12:24:38.00+0100   audit.app.upload-bits      coa-cf
2020-02-13T12:24:36.00+0100   audit.app.update           coa-cf   instances: 1, memory: 2048, environment_json: [PRIVATE DATA HIDDEN]
gberche-orange commented 4 years ago

Fixed in v1.0.0: osb-cmdb now relies on the OSB API operation state to maintain its state, and no longer keeps state in the broker's RAM.
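
For readers unfamiliar with the mechanism: the OSB API lets a broker return an `operation` string in its asynchronous provisioning response, and the platform passes that string back on each last-operation poll. A broker can therefore carry what it needs in that string instead of in memory. The sketch below illustrates the idea only. The `OperationToken` class and its `type:guid` encoding are assumptions made for this example, not the format osb-cmdb actually uses.

```java
// Hypothetical sketch of carrying operation state in the OSB "operation" string.
// The "type:guid" format is an assumption for illustration, not osb-cmdb's format.
final class OperationToken {

    final String type;         // operation type, e.g. "create"
    final String backingGuid;  // guid of the backing service instance

    OperationToken(String type, String backingGuid) {
        this.type = type;
        this.backingGuid = backingGuid;
    }

    // Value a broker could return as "operation" in its async response.
    String encode() {
        return type + ":" + backingGuid;
    }

    // Rebuild the token from the "operation" value echoed back by the platform.
    static OperationToken decode(String operation) {
        if (operation == null) {
            throw new IllegalArgumentException("Missing operation");
        }
        int sep = operation.indexOf(':');
        if (sep <= 0 || sep == operation.length() - 1) {
            throw new IllegalArgumentException("Malformed operation: " + operation);
        }
        return new OperationToken(
                operation.substring(0, sep), operation.substring(sep + 1));
    }
}
```

With this approach, any broker instance (including one started after a restart) can answer a last-operation poll by decoding the token and querying the backing service instance, without consulting local memory.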