w3c / websub

WebSub Spec in Social Web Working Group
https://w3c.github.io/websub/

Consistency Issues with WebSub #174

Open kevincox opened 2 years ago

kevincox commented 2 years ago

Right now there are a number of gaps in WebSub that make it possible to miss updates without hacky workarounds. It would be nice if reliable subscriptions could be provided without extra feed fetches.

Consider the following scenarios:

Fetch - Subscribe Race

  1. Subscriber fetches document.
  2. Document updates.
  3. Subscriber subscribes to hub.

In this case the subscriber would never notice that an item was posted.

Workaround - Refetch

One workaround is to simply refetch the feed a short while after the subscription is confirmed. This still relies on some consistency between HTTP caches and the hub, but a long enough delay should make it reliable in practice.

Solution - If-Match

If the subscriber could include the ETag or Last-Modified value from when they fetched the document, the hub could notify the subscriber if there are missed entries.

Solution - Proactive Push

Another option is that once a subscription is confirmed the hub could push the full "current" state of the document from its point of view. Future delta updates would then be sent relative to that initial full push.

Resubscribe Race

  1. Subscription expires.
  2. Document updates.
  3. Subscriber "re-subscribes".

IIUC there is currently no indication if a subscription request updated an existing subscription or created a new subscription.

Workaround - Resubscribe Early

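If you resubscribe sufficiently early, well before the lease expires, it is probably safe to assume that the subscriber's and hub's clocks are close enough in sync that the subscription never lapsed.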
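As a rough illustration of resubscribing early, a subscriber could schedule the renewal at some fraction of the lease. This is only a sketch: the `renewal_time` helper and the `safety_fraction` parameter are hypothetical names, and the 0.5 default is an arbitrary choice, not something the spec prescribes.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(confirmed_at: datetime, lease_seconds: int,
                 safety_fraction: float = 0.5) -> datetime:
    """Return when to resend the subscription request.

    safety_fraction is the share of the lease that may elapse before
    renewing; smaller values tolerate more clock skew between the
    subscriber and the hub.
    """
    if not 0 < safety_fraction < 1:
        raise ValueError("safety_fraction must be between 0 and 1")
    return confirmed_at + timedelta(seconds=lease_seconds * safety_fraction)


# Example: a two-week lease is renewed one week after confirmation.
confirmed = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(renewal_time(confirmed, 14 * 24 * 3600))
```

Renewing this early still only makes the race unlikely; it does not tell the subscriber whether the hub actually kept the subscription alive.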

Workaround - Refetch

Much like the initial subscription race you can simply refetch the feed a short while after the subscription is confirmed.

Solution - Proactive Push

If the proactive push described above is used, it could be done for new subscriptions only. For subscriptions that are merely extended, the push would not be repeated.

Solution - Resubscription Confirmation

The hub can respond to the subscription request with some sort of indicator confirming that the subscription has been uninterrupted.
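A minimal hub-side sketch of how such an indicator might be derived, assuming the hub stores one expiry time per (callback, topic) pair. The `SubscriptionStore` class and the `"renewed"`/`"created"` return values are hypothetical; how the result would be conveyed to the subscriber is left open here.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SubscriptionStore:
    # Maps (callback, topic) to the lease expiry as a Unix timestamp.
    expiries: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def subscribe(self, callback: str, topic: str, lease_seconds: int,
                  now: Optional[float] = None) -> str:
        """Create or extend a subscription and report which one happened."""
        now = time.time() if now is None else now
        key = (callback, topic)
        previous_expiry = self.expiries.get(key)
        # The subscription is uninterrupted only if an unexpired lease exists.
        uninterrupted = previous_expiry is not None and previous_expiry > now
        self.expiries[key] = now + lease_seconds
        return "renewed" if uninterrupted else "created"


store = SubscriptionStore()
print(store.subscribe("https://sub.example/cb", "https://blog.example/feed", 100, now=0))
print(store.subscribe("https://sub.example/cb", "https://blog.example/feed", 100, now=50))
```

A subscriber that sees "created" when it expected "renewed" would know that updates may have been missed and that a refetch is needed.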

dissolve commented 2 years ago

I'm not sure the resubscribe case even makes sense. The time between unsubscribe and resubscribe could be significant. So it's sort of assumed the client should be pulling a full update. It basically reduces to the same as the subscribe.

That said, I'm not sure consistency is ever promised with WebSub. It's sort of not the intent of the spec. One could easily implement with just a full pull of the document on every notification and it would still serve as a significant improvement over continual polling for updates.

kevincox commented 2 years ago

If the resubscribe is expected to fetch the full feed first that is fine. I thought it was desirable that a subscriber could just keep resubscribing without needing to refetch the feed. But I agree it is not a major concern if it should be started with a refetch.

> One could easily implement with just a full pull of the document on every notification and it would still serve as a significant improvement over continual polling for updates.

This isn't quite true. If a feed updates infrequently (maybe a blog that updates ~monthly) and I set up a long subscription (a couple of weeks), I may miss a post until I poll again at the end of the subscription. The chance of this happening is low, but it would be nice if there were no possibility of a very long delay for updates, bounded only by the subscription length. This is my biggest concern: there is a gap between the fetch and the subscription becoming active. Especially with CDN caching and similar, this is a non-trivial (but usually not huge) gap that will cause very long delays if an update happens to fall into it.

Or is the intent that the subscriber continues to poll as normal and WebSub is just for faster updates between polls, not for reducing required work?

sandhawke commented 2 years ago

As I read the spec, the expectation is that one will re-subscribe well before expiration, to prevent this gap you mention:

> This is required so subscribers can renew their subscriptions before the lease seconds period is over without any interruption.

at the end of https://www.w3.org/TR/websub/#subscriber-sends-subscription-request

I don't see (or remember) anything meant to address the "Subscribe Race". Doing a conditional request (using If-None-Match) after subscribing seems fairly reasonable, but I agree including an ETag in the subscription request could be nice. I don't recall that being discussed, but it wouldn't surprise me if it was seen as putting complexity in the hub which can be handled by the subscriber. Keeping hubs as simple as possible was seen as a priority here. If there were a thriving market for hubs, that might be different.
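For illustration, the subscriber-side conditional refetch could look roughly like the following, using only Python's standard library. The function names are hypothetical, and the sketch assumes the publisher's server honors `If-None-Match` / `If-Modified-Since` and that any caches in between are reasonably fresh.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_request(url: str, etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request carrying the validators from the earlier fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def refetch_if_changed(url: str, etag: Optional[str] = None,
                       last_modified: Optional[str] = None) -> Optional[bytes]:
    """Return the new body, or None if the server answers 304 Not Modified."""
    request = conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "nothing missed".
        if err.code == 304:
            return None
        raise
```

Calling `refetch_if_changed` shortly after the subscription is confirmed costs one cheap request when nothing changed, but it still cannot fully close the race if a cache serves a stale copy.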

kevincox commented 2 years ago

> As I read the spec, the expectation is that one will re-subscribe well before expiration, to prevent this gap you mention:

This is a good idea but not a strong guarantee. Also if you want to rely on the hub for long periods of time it is nice to have confirmation that the subscription is continued, not forgotten somewhere. I suspect it would be fairly easy for most hubs to include something like hub.resubscribe=true which completely removes this race condition and reassures against forgotten subscriptions for other reasons.

I agree that the current state is ok, but it would definitely be a nice-to-have for a very small implementation cost.

> it wouldn't surprise me if it was seen as putting complexity in the hub which can be handled by the subscriber

I don't think this really can be handled by the subscriber. Other than doing one extra poll and hoping for the best there is no good solution here. The core of the problem is that the hub may be sending delta updates, but the subscriber doesn't know what the base version is! There is an assumption that the subscriber's fetch was a superset of the hub's current state but that isn't a great assumption if you ask me.

I agree that it would be good to keep complexity low on the hub but it would be nice to have an optional solution here that can be used to close this gap.

My current thinking would be something like:

  1. An optional subscribe parameter hub.etag which can be used to pass the etag that the subscriber is aware of.
  2. (maybe) An optional subscribe parameter hub.last_modified which can be used to pass the last modified date that the subscriber is aware of.
  3. After confirming the subscription the hub will send an update with the full contents that it is aware of to the subscriber.
    • If the etag or last-modified value passed is known to the hub, this can be a delta push based on that version. If the delta is empty, it can be skipped entirely.
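The hub-side decision in step 3 might be sketched as follows. This is an illustration under stated assumptions: the hub is assumed to remember the list of entry IDs for each ETag it has seen, and `initial_push` and `history` are hypothetical names, not part of any existing hub.

```python
from typing import Dict, List, Optional


def initial_push(history: Dict[str, List[str]], current_etag: str,
                 subscriber_etag: Optional[str]) -> Optional[List[str]]:
    """Decide what to send right after a subscription is confirmed.

    Returns the entries to push, or None if the push can be skipped.
    """
    current = history[current_etag]
    known = history.get(subscriber_etag) if subscriber_etag else None
    if known is None:
        # No etag given, or a version the hub does not know: full push.
        return current
    seen = set(known)
    delta = [entry for entry in current if entry not in seen]
    # An empty delta means the subscriber is already up to date.
    return delta or None


history = {"v1": ["post-1"], "v2": ["post-1", "post-2"]}
print(initial_push(history, "v2", "v1"))  # only the entry missed since v1
print(initial_push(history, "v2", "v2"))  # nothing to send
```

Falling back to a full push for unknown versions keeps the optional parameter safe to ignore: a hub that stores no history behaves exactly like the plain proactive push.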

Pros:

Cons: