w3c / websub

WebSub Spec in Social Web Working Group
https://w3c.github.io/websub/

Topics #110

Closed julien51 closed 7 years ago

julien51 commented 7 years ago

I've been scratching my head for a long time without finding a great solution, but I think we are missing a good mechanism to let someone subscribe to a "virtual" topic: a topic that does not exist but represents a set of documents.

Basically, right now, the most common use case is someone subscribing to a given document. Traditionally (PuSH days), this has been an XML feed, but now it can be anything available at a given URL. When you think about it, feeds are a bit of an exception because they are both documents and sets of documents (or at least links to them). This is convenient because, when one subscribes to a feed, they will get updates to that document but also (and more importantly?) new documents that are added to the feed.
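As a point of reference, subscribing to a single document in WebSub is a form-encoded POST to the hub carrying hub.mode, hub.topic and hub.callback. A minimal sketch, assuming placeholder hub, topic and callback URLs (none of them come from this thread); the request is built but not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subscription_request(hub_url, topic_url, callback_url):
    """Build (but do not send) a WebSub subscription request for one topic."""
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the document being subscribed to
        "hub.callback": callback_url,  # where the hub will deliver updates
    }).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The point for this discussion is that hub.topic names exactly one URL, which is why a set of documents needs either a feed or some other convention.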

I think we can all agree that discovering new documents is the core of what WebSub was designed for... but right now, outside of Atom/RSS or JSON Activity streams (which are all 'feeds') this is much harder.

For example, if one were to try to use WebSub to discover newly published AMP documents, they could implement something like this. I am not a big fan of this solution because it does not solve the discoverability aspect very well. It also means that the publisher has to create an explicit /latest endpoint that may serve different kinds of content depending on when it is fetched...

Now, another solution would be to have another type of discovery link. We currently have two in the spec:
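The two discovery relations the WebSub spec defines are rel="hub" and rel="self", advertised in Link headers or in link elements. A minimal sketch of reading them from a Link header value, assuming a simplified header syntax (no commas or semicolons inside quoted parameters); the function name and example URLs are hypothetical:

```python
import re


def parse_link_header(value):
    """Return a dict mapping each rel value to a list of target URLs.

    Simplified: does not handle commas or semicolons inside quoted parameters.
    """
    links = {}
    # Each entry looks like: <url>; rel="hub"
    for match in re.finditer(r'<([^>]*)>\s*((?:;\s*[^,;]+)*)', value):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel:
            # A single rel attribute may carry several space-separated values.
            for name in rel.group(1).split():
                links.setdefault(name, []).append(url)
    return links
```

A subscriber would then send the "self" URL as the topic to one of the "hub" URLs.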

What if we added another one, maybe called websub, which would be the topic a subscriber can subscribe to in order to get updates and new documents that belong to it?

For example, a website could just add a <link rel="websub" href="/" /> element to each of its pages to indicate that if one were to subscribe to /, they would be notified of any new page (or changes to any page) which points to it? (Probably within the same domain in this case.)

For a blogging platform which does not have an RSS or JSON feed for its authors, they could easily have a <link rel="websub" href="/stories/julien" /> which would yield any new story published by user julien?
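A minimal sketch of how a subscriber might discover such a link, assuming the proposed (not standardized) rel="websub" relation appears in link elements; the class and function names and the example URLs are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WebSubLinkParser(HTMLParser):
    """Collect targets of <link rel="websub"> elements (a proposed relation)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.topics = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "websub" in rels and attrs.get("href"):
            # Resolve relative hrefs such as "/" against the page URL.
            self.topics.append(urljoin(self.base_url, attrs["href"]))


def find_virtual_topics(html, base_url):
    """Return the absolute URLs of the virtual topics a page points to."""
    parser = WebSubLinkParser(base_url)
    parser.feed(html)
    return parser.topics
```

The subscriber would still need the usual rel="hub" link to know where to send the subscription for the discovered topic.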

aaronpk commented 7 years ago

What you're describing sounds essentially like a shortcut for an existing mechanism. If you're looking at a blog post permalink, /entry/142857 which indicates it's written by author Julien with a URL of /stories/julien, then you could fetch the page /stories/julien and discover the feed from there or if that page has a hub itself then subscribe to that URL. I would be hesitant to add another mechanism like this unless it's demonstrated that this use case is so common that it's extremely awkward to find the feed using existing mechanisms.

julien51 commented 7 years ago

I don't think it is a shortcut but it is rather a simplification/generalization.

The goal is to allow subscription to 'collections' of resources when such a resource (the collection of resources) does not exist. Another way to look at it is: how do you subscribe to a directory? HTML is not the most helpful here because in practice, when somebody requests a "directory", it's the index.html file in there that is being served rather than the actual list of docs in that directory. Yet I believe this behavior is missing from WebSub, and adding a rel="websub" would be a way to fix it? Maybe HTTP folks could weigh in?

In the example above, the feed at /stories/julien does not exist, yet, of course, there exist many documents published by julien.

Another example would be to consider how to subscribe such that one would get notifications when new documents are added/updated to a whole website without requiring said website to implement a 'custom' /feed resource (whatever the format was).

emfleury commented 7 years ago

Julien, IIUC your use case is discovering new documents, rather than learning about changes in existing documents.

As you said, that's essentially a feed, but one that is defined implicitly. If I were to subscribe to http://www.cnn.com/ as a "virtual topic", what would that mean? What is the set of documents that I would be notified of?

I'm assuming those would be "all documents that are reachable from www.cnn.com/index.html". If that's the case, then why can't the subscribers simply subscribe to changes in cnn.com/index.html itself and then extract all the links?
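A minimal sketch of that subscriber-side approach, assuming the subscriber keeps the previous copy of the page and compares it with each notification body; the helper names are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = AnchorCollector(base_url)
    collector.feed(html)
    return collector.links


def new_links(old_html, new_html, base_url):
    """Links present in the new snapshot but absent from the old one."""
    return extract_links(new_html, base_url) - extract_links(old_html, base_url)
```

This only finds documents the index page happens to link to, which is the limitation discussed in the replies that follow.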

The way I see it, we have two orthogonal problems here:

Maybe parsing http://www.cnn.com/world is hard and we could have some structured data embedded in the page itself to convey more information about outgoing links. Or maybe CMSs should just publish their Atom or RSS feeds. But that all sounds like application layer stuff, while WebSub is more like the transport layer.

WDYT?

julien51 commented 7 years ago

You are right: the use case I am trying to describe is discovering new documents, rather than learning about changes in existing documents. But I really want to avoid asking the publisher to create some kind of resource which would aggregate links to other resources (because this format would need to be clearly defined).

"What is the set of documents that I would be notified of?" This is a key question, and I think the right answer would be "all documents which point to the same <link rel="websub" href="/" />". In most cases I think that set is very different from the documents linked from the root.

I really want to avoid creating yet another new "format" for aggregates of links. We already have RSS, Atom, Sitemaps.xml... etc. I sincerely believe we could rely on a mechanism where each resource points to its "factory", using link headers.
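A minimal sketch of that membership rule, assuming the publisher (or hub) already knows which rel="websub" targets each document declares; the function name and the input shape are hypothetical:

```python
from collections import defaultdict
from urllib.parse import urljoin


def group_by_virtual_topic(declarations):
    """Map each virtual topic URL to the set of documents that point to it.

    declarations: dict of document URL -> list of rel="websub" hrefs
    declared by that document (relative hrefs are allowed).
    """
    members = defaultdict(set)
    for document_url, hrefs in declarations.items():
        for href in hrefs:
            # Resolve the href against the document that declares it.
            members[urljoin(document_url, href)].add(document_url)
    return dict(members)
```

With this rule no aggregate document is needed: when a document is created or updated, subscribers of every topic it declares would be notified.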

aaronpk commented 7 years ago

Marking this issue to be revisited in a future version of the spec https://www.w3.org/wiki/Socialwg/2017-08-15-minutes#resolution03

sandhawke commented 7 years ago

Closing, with the understanding we'll look for POSTPONED issues in any future effort.