Document experimental status_changed_uri

tbroyer commented 9 years ago

There's a flaw in the workflow: we might have to change how we delete organizations (i.e. first ask to stop all instances manually, one by one, then only when they're all stopped can the organization be deleted – this would allow taking into account the response from providers, which is only possible when done discretely rather than in batch).

That leads me to question whether we should document the new status_changed_uri field at all, or wait until we've fixed the organization-deletion workflow and can provide guarantees as to whether providers will be called back.

On the pro side of documenting the current status, providers can start to adapt and fill in the field.

silently commented 9 years ago

Regarding the issue: why is this "only possible when done discretely rather than in batch"?

tbroyer commented 9 years ago

Scenario: you delete an organization, so we have to change all its app instances to status STOPPED. Let's say you have 10 such instances. We call the status_changed_uri of the first 8 without problem, but there's an error on the 9th. We'd thus (probably) have to roll back the first 8 app instances to status RUNNING, but we could face an error notifying their status_changed_uri; now what should we do?
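To make the scenario above concrete, here is a minimal sketch of a batch stop with rollback. It is not Kernel code: `AppInstance`, `stop_all`, and the `notify` callback (which stands in for calling an instance's status_changed_uri and returns whether the call succeeded) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AppInstance:
    # Hypothetical model of an app instance; only the fields the sketch needs.
    id: str
    status: str = "RUNNING"


def stop_all(instances: List[AppInstance],
             notify: Callable[[AppInstance, str], bool]) -> Dict[str, object]:
    """Stop instances in order; on the first failed notification, roll back.

    `notify(instance, new_status)` is an assumed stand-in for the call to
    status_changed_uri and returns True on success.
    """
    stopped: List[AppInstance] = []
    for inst in instances:
        if not notify(inst, "STOPPED"):
            # Roll back what was already stopped. Each rollback is itself a
            # notification, so it can fail too: those instances stay STOPPED
            # and are reported as unresolved.
            unresolved = []
            for done in stopped:
                if notify(done, "RUNNING"):
                    done.status = "RUNNING"
                else:
                    unresolved.append(done.id)
            return {"failed_at": inst.id, "unresolved": unresolved}
        inst.status = "STOPPED"
        stopped.append(inst)
    return {"failed_at": None, "unresolved": []}
```

Any id that ends up in `unresolved` is exactly the "now what should we do?" case: the instance is STOPPED on our side while the rest of the batch is back to RUNNING.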

Or we could leave the organization "available" until all its app instances have been properly stopped, retrying regularly for those that fail. But then what if we retry for one week without success? We'd start purging app instances that had been correctly stopped one week ago.

I'll let you imagine how that could work (or not) if we introduced intermediary states like "in the process of being deleted", or things like "two-phase commit" (an app instance cannot be purged until all app instances in the "batch" have been correctly stopped).

Well, it's way too complicated for a case (deleting an organization) that shouldn't happen that often; so requiring that app instances are stopped manually is probably the best compromise.
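The compromise amounts to a precondition check before deletion. A rough sketch, assuming instance statuses are available as a mapping from instance id to status (the function name and data shape are hypothetical, not the Kernel's API):

```python
from typing import Dict, List


def blocking_instances(statuses: Dict[str, str]) -> List[str]:
    """Return the ids of app instances that still prevent organization deletion.

    `statuses` maps an instance id to its status ("RUNNING" or "STOPPED").
    The organization can only be deleted once this list is empty, i.e. once
    every instance has been stopped individually.
    """
    return [instance_id for instance_id, status in statuses.items()
            if status != "STOPPED"]
```

With this shape, the error shown to the user can list exactly which instances still have to be stopped by hand.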

silently commented 9 years ago

What error could happen while notifying status_changed_uri? You mean a network error?

tbroyer commented 9 years ago

Network error, or the endpoint returns a non-success status code. Worst case is the status_changed_uri is plain wrong, and we're locked up (same could happen for the destruction_uri)

silently commented 9 years ago

From my point of view (and Bruno's), none of these cases is blocking. This is a notification to inform the provider, not to ask him if he's okay.

In particular non-success is not acceptable as you describe in the doc: "The response from the provider is ignored in this case, as this request is only a notification of some change that has already occurred on Ozwillo's side."

I have added to the provider portal features the ability to see failing requests from Ozwillo to the provider, or the Kernel could even send an email to the provider admin, but this can't block the user's ability to delete app instances.

tbroyer commented 9 years ago

The problem with the current situation, as described in this PR, is that the notification is nearly useless to the provider if he has no guarantee of being notified. One question is: what use case does this "notification" fulfill? If we don't have a clear answer, then maybe we shouldn't document it (and possibly roll back the code eventually).

Among reasons for notifying the provider:

stop scheduled tasks (to save resources)
make a public (anonymous) part of the app read-only or unavailable
stop notifying users (by mail or through the Ozwillo notification API) of some event or whatever
similarly, stop sending events on the Ozwillo event bus
stop listening to events on the Ozwillo event bus

For (almost) all of them, we need to give guarantees to the provider that he'll be notified of both running→stopped and stopped→running changes.

silently commented 9 years ago

One more use case: an app_user has bookmarked a service_uri, and it would be nice if the provider displayed a message such as "This service has been stopped by your administrator, please contact him".

So for network errors, you could possibly store them and resend them later. And for any error (network error failing several times or just once to simplify, bad uri, bad secret), as suggested there could be other alert channels (provider portal, email).
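A minimal sketch of that "store and resend later, then alert" idea, under stated assumptions: `send` stands in for the HTTP call to status_changed_uri, `alert` for whichever channel is chosen (provider portal, email), and `MAX_ATTEMPTS` is an arbitrary value. None of these names come from the Kernel.

```python
from collections import deque
from typing import Callable, Deque, Tuple

MAX_ATTEMPTS = 3  # assumed limit; the thread does not specify one

# (status_changed_uri, new_status, attempts_so_far)
Pending = Tuple[str, str, int]


def deliver_pending(pending: Deque[Pending],
                    send: Callable[[str, str], bool],
                    alert: Callable[[str, str], None]) -> Deque[Pending]:
    """Try each stored notification once.

    Failed notifications are returned for a later pass; once a notification
    has failed MAX_ATTEMPTS times it is dropped from the queue and reported
    through `alert` instead.
    """
    retry_later: Deque[Pending] = deque()
    while pending:
        uri, status, attempts = pending.popleft()
        if send(uri, status):
            continue  # delivered, nothing more to do
        attempts += 1
        if attempts >= MAX_ATTEMPTS:
            alert(uri, status)
        else:
            retry_later.append((uri, status, attempts))
    return retry_later
```

A scheduler would call `deliver_pending` periodically, feeding the returned queue into the next pass. Note that nothing here blocks the status change itself; it only tracks delivery of the notification.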

Regarding notifications and events, if they are associated with a STOPPED app instance, we may think of rejecting them.

silently commented 9 years ago

One day of thought later, plus the feedback from a provider (Thierry), we've changed our mind regarding what the best option is in the short term.

Yes to all you said before Thomas, meaning:

app instance deletion depends on provider acknowledgement
organization workflow has to be fixed

Regarding the doc, yes it will help providers start their work (which is necessary due to the acknowledgement).

Still, there is something I don't understand: app instances can have their status changed between RUNNING and STOPPED (several times if you've changed your mind). But when they are actually deleted (STOPPED for more than one week), the provider won't be notified of it? A DESTROYED state would be nice.

2 more things in the documentation (since it acts like a spec here) are a bit problematic:

"The response from the provider is ignored in this case" --> I think we have now agreed this is not true
"Note that it means the provider might not be notified!" --> this is a problem

ozwillo / ozwillo-doc

Document experimental status_changed_uri #7