operationKey-reset mechanism for putting the architecture back into operation after an unplanned downtime

To recreate the scenario, RegistryOffice (RO) and TypeApprovalRegister (TAR) were used.

Service : /v1/regard-updated-approval-status
Scenario : /v1/document-approval-status triggers /v1/regard-updated-approval-status in RO.

For the operation-server "ro-0-0-1-op-s-3003" in RO, the operation-key was modified to "12345678".
For the operation-client "tar-0-0-1-op-c-3022" in TAR, the operation-key was modified to "910111213".
As a result, the keys on the two ends of the link no longer match.

Solution 1 : Using the following OAM services, an admin can configure the operation-keys:

/core-model-1-4:control-construct/logical-termination-point={uuid}/layer-protocol=0/operation-client-interface-1-0:operation-client-interface-pac/operation-client-interface-configuration/operation-key
/core-model-1-4:control-construct/logical-termination-point={uuid}/layer-protocol=0/operation-server-interface-1-0:operation-server-interface-pac/operation-server-interface-configuration/operation-key
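As a rough sketch of Solution 1, the snippet below builds (but does not send) a PUT request against one of the two OAM paths above. Several things here are assumptions and not taken from this issue: the base URL, the JSON body shape (`<module>:operation-key`), and the rule that a UUID containing `-op-c-` denotes a client, which is inferred only from the two UUIDs in the scenario. Authentication is left out entirely.

```python
import json
from urllib.request import Request

# The two OAM paths quoted above, with {uuid} as the only placeholder.
CLIENT_PATH = (
    "/core-model-1-4:control-construct/logical-termination-point={uuid}"
    "/layer-protocol=0/operation-client-interface-1-0:operation-client-interface-pac"
    "/operation-client-interface-configuration/operation-key"
)
SERVER_PATH = (
    "/core-model-1-4:control-construct/logical-termination-point={uuid}"
    "/layer-protocol=0/operation-server-interface-1-0:operation-server-interface-pac"
    "/operation-server-interface-configuration/operation-key"
)


def build_operation_key_request(base_url, uuid, operation_key):
    """Build a PUT request that would set the operation-key of one interface.

    Assumption: '-op-c-' in the UUID marks an operation-client,
    anything else is treated as an operation-server.
    """
    is_client = "-op-c-" in uuid
    template = CLIENT_PATH if is_client else SERVER_PATH
    module = (
        "operation-client-interface-1-0" if is_client
        else "operation-server-interface-1-0"
    )
    # Assumption: the body is a single namespaced attribute.
    body = {f"{module}:operation-key": operation_key}
    return Request(
        url=base_url.rstrip("/") + template.format(uuid=uuid),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To repair a link, the admin would issue one such request for the operation-server in RO and one for the operation-client in TAR, using the same key value in both. Passing the returned object to `urllib.request.urlopen` would actually send it.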

Solution 2 : Using /v1/update-operation-key, an admin can update the operation-keys on both sides to the same value.

Solution 3 : Operating OKM in protection mode, which makes sure that the proper operation-keys are set on the approved clients every 5 minutes.
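To illustrate what a periodic pass of this kind has to achieve, here is a minimal sketch. It is not OKM's implementation: the link representation (a pair of server UUID and client UUID), the `configured_keys` dictionary and the key generator are all hypothetical, and the 5-minute scheduling is left out.

```python
import secrets


def find_mismatched_links(links, configured_keys):
    """Return the links whose server and client hold different operation-keys.

    links: list of (server_uuid, client_uuid) pairs (hypothetical shape).
    configured_keys: dict mapping interface UUID to its current operation-key.
    """
    return [
        (server_uuid, client_uuid)
        for server_uuid, client_uuid in links
        if configured_keys.get(server_uuid) != configured_keys.get(client_uuid)
    ]


def reconcile(links, configured_keys, new_key=lambda: secrets.token_hex(8)):
    """Return a copy of configured_keys in which every mismatched link
    has been given one fresh key on both ends."""
    updated = dict(configured_keys)
    for server_uuid, client_uuid in find_mismatched_links(links, configured_keys):
        key = new_key()
        updated[server_uuid] = key
        updated[client_uuid] = key
    return updated
```

With the keys from the scenario ("12345678" on "ro-0-0-1-op-s-3003" and "910111213" on "tar-0-0-1-op-c-3022"), `find_mismatched_links` reports the link and `reconcile` returns a mapping in which both interfaces share one key.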

Solution 4 : Operating OKM in reactive mode. Re-approving the link from ALT via /v1/add-operation-client-to-link triggers /v1/regard-updated-link, which in turn triggers /v1/update-operation-key for the client and the server. This resolves the problem.

Solution 5 : The solution provided in issue https://github.com/openBackhaul/OperationKeyManagement/issues/39 fixes this by setting the operation-key to the default value every 5 minutes.

Any problem noticed after applying the above-mentioned solutions? No.

Which solution is better?

In case the problem is limited to a single service call, solutions 4, 3, 2, 1 shall be preferred.
Solution 4 also has a self-healing capability, provided the OperationKeyManagement application is up and running.
Solution 5 is preferred in case solution 4 does not self-heal and more links are affected by this problem.

But there is one overall piece of feedback. The solution can only be applied after the problem has been identified. But how will this problem be identified?

After a type approver approves an application, we can notice this failure in the EATL application and then apply this workaround. In practice, however, this is a tedious process.

Proposing the following:
Proposal#1: In GCP, we have an SMTP relay server. When EATL receives any failure records, we can trigger an email to the admin.
Proposal#2: We can have an application for FaultManagement (FM). When EATL receives any failure records, EATL shall forward them to the "FM" application, which can list the failures as alarms in an easily readable format.
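Proposal#1 could look roughly like the sketch below, which uses only Python's standard `smtplib` and `email.message`. The record field names (`application-name`, `operation-name`, `response-code`), the rule that a response code of 400 or above counts as a failure, and the relay host, port and mail addresses are all assumptions for illustration, not taken from EATL.

```python
import smtplib
from email.message import EmailMessage


def is_failure(record):
    """Assumption: a record with response-code >= 400 is a failure."""
    return int(record.get("response-code", 0)) >= 400


def build_failure_email(record, sender, admin):
    """Turn one failure record (a dict with hypothetical field names) into an email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = admin
    msg["Subject"] = (
        f"EATL failure: {record['application-name']} "
        f"{record['operation-name']} returned {record['response-code']}"
    )
    # One "field: value" line per record attribute, sorted for stable output.
    msg.set_content("\n".join(f"{k}: {v}" for k, v in sorted(record.items())))
    return msg


def notify_admin(record, sender, admin, relay_host, relay_port=25):
    """Send a mail through the SMTP relay only if the record is a failure.

    Returns True if a mail was sent, False if the record was not a failure.
    """
    if not is_failure(record):
        return False
    with smtplib.SMTP(relay_host, relay_port) as smtp:
        smtp.send_message(build_failure_email(record, sender, admin))
    return True
```

Keeping the message construction separate from the sending step means the same failure filter could also feed the FM application of Proposal#2 instead of the mail relay.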

Kindly let me know your views in this regard.

openBackhaul / ApplicationPattern

operationKey-reset mechanism for putting the architecture back into operation after an unplanned downtime #303