w3c / push-api

Push API
https://w3c.github.io/push-api/
Other
145 stars 40 forks source link

Retry push event #300

Open collimarco opened 5 years ago

collimarco commented 5 years ago

In the working draft I cannot find anything about retries for the push event.

When our service worker receives a push event it tries to fetch the notifications from the application server. What if a network error occurs? (e.g. the user is on an unreliable network or the server is temporary down)

I think that it would be useful to always retry the push event with exponential backoff when an exception occurs: what do you think? Currently I don't see any way for the developer to create a reliable push handler.

martinthomson commented 5 years ago

There are two network errors you are talking about here:

  1. The push service is inaccessible. In that case, you don't get a push event. The browser keeps trying to get to the push service. This is entirely normal; we expect the browser to be occasionally offline, it's a core use case even.

  2. The application server is inaccessible. In that case, the push event is delivered, but the service worker has trouble executing. That's up to the service worker to manage. It might use the application server to assist it in that (the application server might send another push message if it doesn't get contacted in a reasonable amount of time), but that's on your application - the push API has done its part of the work.

collimarco commented 5 years ago

I am talking about point 2. The application server can be temporary inaccessible due to maintenance or due to internet connection problems that are not caused by the application server itself - e.g. the user device looses the internet connection after receiving the push signal but before or during the fetch.

the application server might send another push message if it doesn't get contacted in a reasonable amount of time

In that case the application server has received a 200 status code from the browser push service and must be contacted by the UA. The application server doesn't know if the push signal is lost or the device is simply turned off for example. So what is a reasonable amount of time? I think that your solution would greatly increase the workload / complexity for the application server; the application server would also flood the browser push service with useless / redundant push signals; the application servers that don't make retries (*) will sometimes loose notifications

(*) Actually it makes sense to not perform retries: you have received a 200 response. I am pretty sure that most applications will implement web push without the (complex) kind of retries that you are describing, thus getting an unreliable service.

IMHO The Push API / service workers should provide a way to process the event in a reliable way.

collimarco commented 5 years ago

Two other considerations:

  1. Section 11.2.2 point 7 of the spec seems to mention a sort of retry mechanism:

    If all the promises resolve successfully, acknowledge the receipt of the push message according to [RFC8030] and abort these steps. If the same push message has been delivered to a service worker registration multiple times unsuccessfully [...] [...] allowing at least three attempts is recommended

Does it mean that if the fetch Promise fails, then the push signal is retried again? That would be exactly what I am asking.

  1. @martinthomson You say that the application server should make retries when a message is not delivered. Suppose that every 30 minutes, for each endpoint, the application server checks if it has pending deliveries and in that case it sends a new push signal to that endpoint. Apart from the huge amount of useless requests that the push service would get, I think that each signal would create a separate push event: the first successful event processed will display all the pending notifications fetched from server, while all the other additional events will not display notifications... that would create problems with userVisibleOnly.
beverloo commented 5 years ago

To start with the obvious: critical data associated with a push message, like the notification that should be shown, should be included as its payload. The ~4KB limit prescribed by RFC8030 should hopefully be sufficient for that. To my knowledge, all current implementations will gracefully handle subsequent fetch failures and timeouts caused by resources included in showNotification.

The main two use-cases for additional fetches that I'm aware of are:

  1. Prefetching content that should be available offline when the notifications is activated. I think Background Sync would be a better venue for that, because it already provides the necessary reliability.
  2. Analytics pings. While important for developers, a retry mechanism could end up being very expensive for users—particularly those on low-end devices.

Are you thinking about one of those cases, or another case?

Putting my Chrome hat on and looking at our metrics, just over 1% of push events result in waitUntil() being rejected. While that may seem low, it raises a compatibility concern if any notifications were shown already, as they would be shown again.

collimarco commented 5 years ago

Are you thinking about one of those cases, or another case?

I am talking about another case: we receive a push event without payload and then we download the pending notifications from the application server. This was the only method available in 2015 when we started developing Pushpad.xyz and I think that many other services do the same: now in our database we have millions of subscriptions that have an endpoint but not the keys to encrypt a payload (because we don't use that). This method is the original method suggested by the Push API, it is compatible with all the current browsers and works well: so I hope you won't break this in the future. Please also see / accept this related request: https://github.com/w3c/push-api/issues/289 Finally I would like to mention that this method doesn't increase the bandwidth usage since we would have to send a request to the server in any case to track the delivery.

all current implementations will gracefully handle subsequent fetch failures and timeouts caused by resources included in showNotification

For the above reasons I think that it would be useful to apply that retry mechanism not only to the showNotification resources (like icon), but to the entire push event, so that a fetch that raises an exception will cause the entire push event to be retried (e.g. with exponential backoff).

If you don't want to retry the push event by default (which imho is the best option) you could at least provide an event.retryUntil that allows you to run some code reliably with retries.

they would be shown again

I think that it is much better to show the notification twice than loosing a notification. Also I think that it is a common practice to retry failed jobs (e.g. server side you always do that). Finally you can use the Notification tag to avoid duplicates: so I don't see any drawback.

martinthomson commented 5 years ago

I'm still having a problem with the premise. You want multiple activations from the same push message. Of course that conflicts with userVisibleOnly. The reason that flag exists is to limit the number of activations without visible feedback (which can be used for bad things).

I think that @beverloo was on the right course with the mention of background sync. That exists specifically to address this class of issue. You don't want multiple push message deliveries for the same message, you want a way to reactivate the service worker. Then, if the application server is down, you register with background sync to be awoken again after some time.

collimarco commented 5 years ago

You want multiple activations from the same push message.

Yes, I mean you should retry if an exception is triggered inside the push event. If I understand the spec correctly, you can extend the existing retry mechanism (Section 11.2.2 point 7) to this use case: when an exception occurs inside the push event, you can simply avoid to acknowledge the reception of the push message to the push service, so that the delivery will be retried after some time.

@beverloo was on the right course with the mention of background sync

That may conflict with userVisibleOnly... I don't know if showing a notification in the sync event (and not inside the push event) will be considered a notification displayed by the browser. Also, I don't know if the browser can display notifications to the user from a sync event...

And in any case the background sync would only prevent network problems, while a retry on exceptions would also make push resistant to application server errors (e.g. you can raise an exception if you receive HTTP 503 in order to force a retry). Without a retry mechanism, you will always loose the processing of a percentage of push events due to temporary exceptions, thus making this technology unreliable.

if the application server is down, you register with background sync to be awoken again after some time.

If I understand the spec correctly background sync will only allow to delay the execution until you get an internet connection. It is not meant as a way to execute retries for arbitrary exceptions... so you cannot use it to make your code resistant to application server errors (e.g. 503).

beverloo commented 5 years ago

[W]e receive a push event without payload and then we download the pending notifications from the application server. This was the only method available in 2015 when we started developing Pushpad.xyz and I think that many other services do the same: now in our database we have millions of subscriptions that have an endpoint but not the keys to encrypt a payload (because we don't use that).

Please consider thinking about an upgrade path. Including the payload with the push message itself is a much better and more reliable experience for users. One idea to quickly source keying material could be to upload the keys with your analytics ping when receiving a message, or when the user next visits the website.

This method is the original method suggested by the Push API, it is compatible with all the current browsers and works well: so I hope you won't break this in the future. Finally I would like to mention that this method doesn't increase the bandwidth usage since we would have to send a request to the server in any case to track the delivery.

There certainly are use-cases for push messages that don't carry a payload. They come with two downsides: lack of sender authentication (by not being able to validate the keying material) and reduced reliability if more data has to be fetched from the network.

Introducing additional fetches definitely increases bandwidth as a full connection has to be established; RFC8030 includes a message receipt mechanism for acknowledgements, and many of the other push services provide similar functionality. Also something to consider is that push services are really good at delivering messages, even in networking conditions where other fetches might not go through.

For the above reasons I think that it would be useful to apply that retry mechanism not only to the showNotification resources (like icon)

Note that at least two implementations handle this by omitting the resources that can't be loaded rather than trying to fetch them again.

Finally you can use the Notification tag to avoid duplicates: so I don't see any drawback.

Unfortunately the availability of such functionality doesn't mean that it's Web compatible - if existing users of the Push API don't include a tag already, users are still going to be exposed to the negative effects.

when an exception occurs inside the push event, you can simply avoid to acknowledge the reception of the push message to the push service, so that the delivery will be retried after some time.

The push messaging functionality might not be provided by the browser itself; several implementations will be limited by operating system constraints here.

That may conflict with userVisibleOnly... I don't know if showing a notification in the sync event (and not inside the push event) will be considered a notification displayed by the browser.

Yeah, that is unlikely to work. The purpose of userVisibleOnly is to ensure that users are aware of why their device is doing work in the background.

collimarco commented 5 years ago

Please consider thinking about an upgrade path

In theory I can agree, but in practice we cannot change a core feature like this. Any small problem may have disrupting consequences on our business. It is very likely to encounter problems: for example on many browsers versions it is difficult to update the service worker code imported with importScripts. In any case I don't think that my specific case is relevant: I think that this discussion must be valid in general.

In general it is valid to receive a push signal without payload and in general many developers may want to process that event reliably - i.e. have a way to retry the event if a temporary exception occurs.

Currently Javascript Events, Service Workers and the Push API don't offer a way to process an event / some code reliably. The only similar solution is the Background Sync, however it is not an actual solution for the following reasons:

I think that the event should offer a method like event.retryUntil(<Promise>) which ensures that multiple retries will be performed if the promise fails. A method like this would be useful for many events of the Push API (e.g. push, pushsubscriptionchange). I think that this method can be actually implemented in browsers:

  1. The browser saves the event type (e.g. push) and the event object locally in a file; no additional data or code must be saved
  2. The browser triggers the event again when it can do that - it can be after a few milliseconds or, if the browser process gets killed by the OS in the meantime, it can be many hours later when the user opens the browser again
  3. On success, or after N failed attempts the browser aborts this steps

The above solution would be also "Web compatible" / backward compatible. Also I don't see any system constraints for a solution like this.

collimarco commented 5 years ago

After further investigation, I have noticed that Background Sync actually supports a retry mechanism: this blog post says that Background Sync can perform retries even when custom exception are raised. Background Sync also waits for the connection to become available before attempting a retry.

I also believe that you can call ServiceWorkerRegistration.showNotification() inside a Background Sync. Am I wrong?

Then the easiest approach to make the processing of the push event reliable is this:

  1. Call swRegistration.sync.register('downloadAndDisplayNotifications');
  2. Listen for the sync event and when you can successfully download data from the application server you display the notifications using ServiceWorkerRegistration.showNotification()

You already recommend Background Sync for pushsubscriptionchange: it would be perfect if we could use Background Sync also for the push event (you can also suggest that in the spec).

The only problem I can see in this approach is that userVisibleOnly will not detect that the push event has displayed the notifications. This problem is specific to the Push API: can we find a solution for this?

collimarco commented 5 years ago

Some more thinking...

I also believe that you can call ServiceWorkerRegistration.showNotification() inside a Background Sync. Am I wrong?

Yes, you can use that. I have found that some blog posts (example) actually use it inside sync. Also it would make sense because showNotification() is just a function provided by service workers and it must always work, inside any event.

There certainly are use-cases for push messages that don't carry a payload

Thinking about additional cases where you may want to use a fetch before displaying the notifications are the following:

Also talking about bandwidth usage of an additional fetch when push is received, I think it has a very negligible effect compared to the technology as a whole. Note that a notification may cause multiple connections: persistent connection to push service, icon, image, 2x action button icons, and maybe others (e.g. prefetch application content).

The only problem I can see in this approach is that userVisibleOnly will not detect that the push event has displayed the notifications. This problem is specific to the Push API: can we find a solution for this?

So Background Sync seems the perfect solution... the only worry is about userVisibleOnly. The actual problem is that you don't define an algorithm for userVisibleOnly. The only general definition is this:

The userVisibleOnly option, when set to true, indicates that the push subscription will only be used for push messages whose effect is made visible to the user, for example by displaying a Web Notification.

I think that you should define that more clearly (step by step) and allow notifications to be displayed indirectly by the push even (e.g. inside a sync). A simple algorithm that would solve this is to keep the number of push received in a day and the number of notifications displayed in a day. Then userVisibleOnly can be considered satisfied if: number of notifications > 0.90 * number of push received.

beverloo commented 5 years ago

In theory I can agree, but in practice we cannot change a core feature like this. Any small problem may have disrupting consequences on our business.

There's a plethora of techniques that can be used for safely rolling out changes - slow roll-out, (targeted) A/B testing, and so on. We can't really help you with that problem.

I think that the event should offer a method like event.retryUntil() which ensures that multiple retries will be performed if the promise fails.

You're right that an explicit opt-in approach wouldn't break existing users, but there still are other concerns.

By sending a push message, developers have the ability to run JavaScript code on the user's device at a time of their choosing. This causes significant resource usage and is privacy sensitive, which is why various implementations require a notification to be shown.

Now suppose that my Service Worker, unbeknownst to the user, runs the following code:

self.addEventListener('push', event => {
    // (1) Mine some cryptocurrency until the UA-defined time limit is reached.
    event.retryUntil(new Promise((resolve, reject) => {
        setTimeout(reject, GuessPushEventTimeLimit());

        // Iteratively do stuff here.
    }));

    // (2) Would timeouts be considered as failures?
    event.retryUntil(new Promise);
});

Suddenly the developer gets N execution opportunities for each notification that's been shown, which is a multiplication factor for these concerns.

Variations on (1) could be to either retry forever until some UA-defined limit is reached, or execute this a few times before showing a notification. At that point we're effectively redesigning Background Sync, which is out of scope.

I also believe that you can call ServiceWorkerRegistration.showNotification() inside a Background Sync. Am I wrong?

This is correct. Once permission has been granted, notifications can be shown from anywhere.

stronger authentication with the application server (using normal sessions for example); the notifications can be displayed only if the user is actually logged in

That's a fair concern - take a look at the cookiechange event that's being proposed as part of the Cookie Store API, which enables you to observe deleted (and thus expired) cookies.

https://wicg.github.io/cookie-store/

I think that you should define that more clearly (step by step) and allow notifications to be displayed indirectly by the push even (e.g. inside a sync) ... Then userVisibleOnly can be considered satisfied if: number of notifications > 0.90 * number of push received.

Because of the asynchronous and ephemeral nature of Service Workers, this actually is a really hard problem. It depends on the conditions of the device, requirements of the operating system and user agent, and the user experience decisions made by the browser vendor.

To that end the specification doesn't detail how userVisibleOnly works or what the conditions are - those are implementation specific.

collimarco commented 5 years ago

To that end the specification doesn't detail how userVisibleOnly works or what the conditions are - those are implementation specific.

I think that you should define that to avoid future problems...

Example: We decide to use Background Sync to fetch and display the notifications (the sync is registered from the push event). It works. It also respects privacy and complies with the spec, by displaying some notifications. Then arrives a new browser vendor (e.g. Apple) or simply a new browser version which has a different interpretation of your spec and breaks our system. Is that a browser bug? Or is that a bug of our system? Who should fix that?

Because of the asynchronous and ephemeral nature of Service Workers, this actually is a really hard problem

I think that current browser implementation of userVisibleOnly are already using similar techniques... If you try to trigger some push without displaying notifications browsers will not complain. Even Chrome, as you may know, has greatly relaxed the restrictions about userVisibleOnly.

In general I don't think that it is a big problem to attach two counters to a service worker registration in order to detect the number of push events and number of notification displayed. Even one counter is enough if you prefer... you can use +1 and -1 on the same counter and you make sure that it stays in a range around zero (e.g. [0, 10]). If a websites goes out of that range then you display a default notification to the user that informs him that the website is running in background...

What are the pros? You can process the push event the way you want... (e.g. using reliable technologies like Background sync or other future technologies)

What are the cons? I don't see any problem. Privacy is preserved because a website cannot run in background without displaying notifications.

IMHO if you don't allow to do that in the spec, the only effect will be to have worst code with the exact same effects (also for privacy!).

If you follow my suggestion, then developers can use this confidently:

self.addEventListener('push', function(event) { 
  sync.register('downloadAndDisplayNotifications');
});

Otherwise they will write this (same effect as above... bad code):

self.addEventListener('push', function(event) { 
  try {
    downloadAndDisplayNotifications()
  } catch(error) { // catch network error
    sync.register('downloadAndDisplayNotifications');
  }
});
collimarco commented 5 years ago

Another reason for downloading the payload from the application server is that you get an additional layer of security (as I described here some years ago). If you send payloads through the push service, an attacker that manages to steal secret data from the application database (e.g. endpoints and keys), can send fraudulent notifications to end users. Instead, if you download notification content from the application server, the hackers won't be able to show any notification... they can only send useless push signals.

collimarco commented 5 years ago

I am testing the use of Web Background Sync... and I get this error on Chrome:

Uncaught (in promise) DOMException: Attempted to register a sync event without a window or registration tag too long.

When the push event is triggered no window is opened and the browser blocks sync.register. I assume that the same issue affects the pushsubscriptionchange event. So basically you suggest to use Web Background Sync with the Push API but we cannot use it.

Is there anyone that can be contacted in order to change this behavior and allow sync.register even when no window is opened?