openBackhaul / ApplicationPattern

Pattern for REST servers
Apache License 2.0

operationKey-reset mechanism for putting the architecture back into operation after an unplanned downtime #303

Status: Open. PrathibaJee opened this issue 2 years ago.

PrathibaJee commented 2 years ago

An analysis should be carried out on the following question: would some kind of operationKey reset be required to put the architecture back into operation after an unplanned downtime, or would it just be a risk?

Please,

PrathibaJee commented 2 years ago

To recreate the scenario, I took RegistryOffice (RO) and TypeApprovalRegister (TAR).

Service: /v1/regard-updated-approval-status
Scenario: /v1/document-approval-status triggers /v1/regard-updated-approval-status in RO.

  1. For the operation-server "ro-0-0-1-op-s-3003" in RO, the operation-key was modified to "12345678".
  2. For the operation-client "tar-0-0-1-op-c-3022" in TAR, the operation-key was modified to "910111213".

As a result, the operation-keys at the two ends of this link no longer match.
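
The failure comes from the server comparing the operation-key presented by the client with its own configured key. Below is a minimal sketch of that check using a simplified in-memory model. The `OperationServer`/`OperationClient` classes, the `handle`/`call` methods and the 401 rejection status are assumptions for illustration only, not the ApplicationPattern implementation.

```python
from dataclasses import dataclass


@dataclass
class OperationServer:
    """Hypothetical stand-in for an operation-server such as "ro-0-0-1-op-s-3003"."""

    uuid: str
    operation_key: str

    def handle(self, presented_key: str) -> int:
        # Accept the call only if the presented key equals the configured one.
        # 401 is an assumed rejection status for this sketch.
        return 200 if presented_key == self.operation_key else 401


@dataclass
class OperationClient:
    """Hypothetical stand-in for an operation-client such as "tar-0-0-1-op-c-3022"."""

    uuid: str
    operation_key: str

    def call(self, server: OperationServer) -> int:
        # The client always presents its own locally stored key.
        return server.handle(self.operation_key)


# Reproduce the scenario from the issue: the two ends hold different keys.
server = OperationServer("ro-0-0-1-op-s-3003", "12345678")
client = OperationClient("tar-0-0-1-op-c-3022", "910111213")
print(client.call(server))  # 401
```

Every solution listed below amounts to making the two stored keys equal again; they differ only in who does it and when.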

Solution 1: Using the following OAM services, an admin can configure the operation-keys:

Solution 2: Using /v1/update-operation-key, an admin can update the operation-keys at both ends to the same value.
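
Conceptually, Solution 2 only has to write one value to both ends of the link. Here is a sketch under stated assumptions: the two dictionaries are hypothetical stand-ins for wherever each application persists its keys, `align_operation_keys` is a hypothetical helper (not the real /v1/update-operation-key request), and a random 16-character hex key from Python's `secrets` module is just one possible choice of key.

```python
import secrets
from typing import Dict, Optional


def align_operation_keys(
    server_keys: Dict[str, str],
    client_keys: Dict[str, str],
    server_uuid: str,
    client_uuid: str,
    new_key: Optional[str] = None,
) -> str:
    """Write one shared operation-key to both ends of a link (hypothetical helper)."""
    # Use the admin-supplied value, or generate a random 16-character hex key.
    key = new_key if new_key is not None else secrets.token_hex(8)
    server_keys[server_uuid] = key
    client_keys[client_uuid] = key
    return key


# Usage with the keys from the reproduced scenario.
server_keys = {"ro-0-0-1-op-s-3003": "12345678"}
client_keys = {"tar-0-0-1-op-c-3022": "910111213"}
key = align_operation_keys(server_keys, client_keys,
                           "ro-0-0-1-op-s-3003", "tar-0-0-1-op-c-3022")
```

Generating the value instead of typing it avoids reusing a guessable key, but the admin still has to notice the mismatch first.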

Solution 3: Operate OKM (OperationKeyManagement) in protection mode, which ensures that every 5 minutes the proper operation-keys are set for the approved clients.
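
The protection-mode behavior can be pictured as a periodic reconciliation loop. In this minimal sketch, the data layout (`links` mapping a server UUID to its approved client UUIDs) and the choice to issue a fresh random key per link on every pass are assumptions, not a description of how OKM actually works.

```python
import secrets
import time
from typing import Callable, Dict, List


def random_key() -> str:
    # Assumed key format: 16 random hex characters.
    return secrets.token_hex(8)


def reconcile(links: Dict[str, List[str]], keys: Dict[str, str],
              new_key: Callable[[], str] = random_key) -> None:
    """One protection-mode pass over all approved links.

    `links` maps a server UUID to its approved client UUIDs; `keys` maps every
    UUID to its current operation-key. Clients not listed in `links` are untouched.
    """
    for server_uuid, client_uuids in links.items():
        key = new_key()  # one fresh key per link, shared by server and clients
        keys[server_uuid] = key
        for client_uuid in client_uuids:
            keys[client_uuid] = key


def run_protection_mode(links: Dict[str, List[str]], keys: Dict[str, str],
                        interval_s: float = 300.0, rounds: int = 1) -> None:
    # 300 s mirrors the 5-minute cycle; `rounds` bounds the loop so the
    # sketch terminates (a real service would loop until shut down).
    for i in range(rounds):
        reconcile(links, keys)
        if i < rounds - 1:
            time.sleep(interval_s)
```

The trade-off is latency: a mismatch introduced right after a pass persists for up to one interval.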

Solution 4: Operate OKM in reactive mode. Re-approving this link from ALT via /v1/add-operation-client-to-link triggers /v1/regard-updated-link, which in turn triggers /v1/update-operation-key for both the client and the server. This resolves the problem.
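
The reactive chain can be sketched as event handlers calling each other instead of a timer. All class and method names here are hypothetical stand-ins that only mirror the service names in the issue (`AltSketch.add_operation_client_to_link` for /v1/add-operation-client-to-link, and so on). The assumption is that OKM pushes one freshly generated key to both ends.

```python
import secrets
from typing import Dict, List


class OkmSketch:
    """Hypothetical reactive-mode OKM: reacts to link updates instead of polling."""

    def __init__(self, keys: Dict[str, str]) -> None:
        self.keys = keys  # UUID -> operation-key currently stored at each application

    def update_operation_key(self, uuid: str, key: str) -> None:
        # Stand-in for calling /v1/update-operation-key on the owning application.
        self.keys[uuid] = key

    def regard_updated_link(self, server_uuid: str, client_uuids: List[str]) -> None:
        # Stand-in for /v1/regard-updated-link: push one fresh key to both ends.
        key = secrets.token_hex(8)
        self.update_operation_key(server_uuid, key)
        for client_uuid in client_uuids:
            self.update_operation_key(client_uuid, key)


class AltSketch:
    """Hypothetical ALT that notifies OKM whenever a link changes."""

    def __init__(self, okm: OkmSketch) -> None:
        self.okm = okm
        self.links: Dict[str, List[str]] = {}

    def add_operation_client_to_link(self, server_uuid: str, client_uuid: str) -> None:
        # Stand-in for /v1/add-operation-client-to-link. Re-approving an already
        # linked client still counts as an update and triggers OKM.
        clients = self.links.setdefault(server_uuid, [])
        if client_uuid not in clients:
            clients.append(client_uuid)
        self.okm.regard_updated_link(server_uuid, clients)
```

Unlike the periodic approach, nothing happens until someone re-approves the link, so this still depends on the failure being detected.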

Solution 5: The solution provided in issue https://github.com/openBackhaul/OperationKeyManagement/issues/39 fixes this by resetting the operation-key to its default value every 5 minutes.
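
Each 5-minute cycle of Solution 5 can be pictured as one reset pass. In this sketch, `DEFAULT_OPERATION_KEY` is a hypothetical placeholder (the real default is whatever issue #39 and the OKM configuration define) and `reset_to_default` is a hypothetical helper.

```python
from typing import Dict

# Hypothetical placeholder; the real default value is defined by the OKM setup.
DEFAULT_OPERATION_KEY = "default-operation-key"


def reset_to_default(keys: Dict[str, str], default: str = DEFAULT_OPERATION_KEY) -> int:
    """One reset cycle: overwrite every stored operation-key with the shared default.

    Returns how many entries actually changed, which could be logged per cycle.
    """
    changed = 0
    for uuid, key in keys.items():
        if key != default:
            keys[uuid] = default  # reassigning existing entries is safe while iterating
            changed += 1
    return changed
```

After the reset, any two ends agree again. However, a well-known default key is easier to guess than a generated one, which is exactly the kind of risk raised in the original question.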

Was any problem noticed after applying the above-mentioned solution? No.

Which solution is better?

There is one piece of overall feedback, however. Once the problem has been identified, we can apply a solution. But how will the problem be identified in the first place?

  1. After a type approver has approved an application.
  2. The failure can be noticed in the EATL application, after which the workaround can be applied. In practice, however, this is a tedious process.
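
Spotting the failure in EATL by hand could be replaced by a simple filter over its records. This is a sketch only. The record fields ("application-name", "operation-name", "response-code") are assumptions, not the actual EATL schema, and treating 401 as the signature of a key mismatch is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_failures(records: Iterable[Dict[str, object]],
                  failure_codes: Tuple[int, ...] = (401,)
                  ) -> List[Tuple[Tuple[str, str], int]]:
    """Count failed calls per (application, operation) in hypothetical EATL records."""
    failures = Counter(
        (str(r["application-name"]), str(r["operation-name"]))
        for r in records
        if r.get("response-code") in failure_codes
    )
    # Most frequent (application, operation) pairs first.
    return failures.most_common()
```

Run periodically, such a query would turn "noticing the failure" into a check that can feed either proposal below.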

I propose the following:

Proposal #1: In GCP, we have an SMTP relay server. Whenever failure records are received in the EATL application, an email can be triggered to the admin.

Proposal #2: We can have an application for FaultManagement (FM). Whenever failure records are received in the EATL application, EATL shall forward them to the FM application, which can list the failures as alarms in an easily readable format.
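
For Proposal #1, composing and relaying the alert needs only Python's standard `email` and `smtplib` modules. This is a sketch with assumptions: the record fields are hypothetical, the addresses are placeholders, and `relay_host`/`relay_port` stand in for the GCP SMTP relay, whose real address and authentication requirements are not given here.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict, List


def build_alert(failures: List[Dict[str, object]], sender: str, admin: str) -> EmailMessage:
    """Compose one alert e-mail summarizing failure records (hypothetical fields)."""
    msg = EmailMessage()
    msg["Subject"] = f"EATL: {len(failures)} failed request(s) detected"
    msg["From"] = sender
    msg["To"] = admin
    # One readable line per failure: application, operation, response code.
    lines = [
        f"{f['application-name']} {f['operation-name']} -> {f['response-code']}"
        for f in failures
    ]
    msg.set_content("\n".join(lines))
    return msg


def send_alert(msg: EmailMessage, relay_host: str, relay_port: int = 25) -> None:
    # Hand the message to the (assumed) relay; no TLS or login is attempted here.
    with smtplib.SMTP(relay_host, relay_port) as smtp:
        smtp.send_message(msg)
```

Batching all failures of one cycle into a single message keeps the admin's inbox manageable. Proposal #2 would replace `send_alert` with a call to the FM application.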

Kindly let me know your views in this regard.