passiomatic / coldsweat

Web RSS aggregator and reader compatible with the Fever API
MIT License

Implement RSS/Atom feed autodiscovery #6

Closed passiomatic closed 9 years ago

passiomatic commented 11 years ago

When adding a feed via the web UI, let the user specify a site's homepage and figure out the RSS feed via autodiscovery. This is a usability boost, since it is sometimes difficult to figure out if (and where) a site exposes RSS feeds to syndicate its contents.

The autodiscovery UI is not straightforward to implement, since it has to handle corner cases. The three scenarios are:

A. Web page with one feed link

  1. User copies a web page address into the location field.
  2. Coldsweat issues a GET and sniffs the page contents. The sniff routine determines that it is an actual web page and scans it looking for a relevant RSS link. If the content is a feed, go to case C step 2.
  3. Coldsweat finds one feed link, adds the link to the feed collection and fetches it.

B. Web page with more than one feed link

  1. User copies a web page address into the location field.
  2. Coldsweat issues a GET and sniffs the page contents. The sniff routine determines that it is an actual web page and scans it looking for a relevant RSS link. If the content is a feed, go to case C step 2.
  3. Coldsweat finds more than one feed link, shows the feeds found to the user and lets them select one link (or more?), then adds the selection to the feed collection and fetches it.

C. Valid feed link (current implementation)

  1. User copies a feed link address into the location field.
  2. Coldsweat adds the link to the feed collection and fetches it.

In all three scenarios above Coldsweat needs to halt the procedure if a broken (or gone) resource is encountered.
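The three scenarios could be folded into a single decision routine. What follows is only a rough sketch, not Coldsweat code: `Outcome`, `looks_like_feed` and `resolve` are hypothetical names, the `fetch` and `find_links` callables are stand-ins for the real fetcher and link scanner, and halting when a page exposes no feed link at all is an assumption the scenarios above do not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    action: str        # "subscribe", "choose" or "halt"
    feeds: List[str]   # feed URLs relevant to the action


def looks_like_feed(body: str) -> bool:
    """Crude sniff: look for a feed root element near the start of the body."""
    head = body.lstrip()[:512].lower()
    return any(tag in head for tag in ("<rss", "<feed", "<rdf:rdf"))


def resolve(url: str,
            fetch: Callable[[str], str],
            find_links: Callable[[str, str], List[str]]) -> Outcome:
    """Decide what to do with the address typed by the user."""
    try:
        body = fetch(url)                  # hypothetical: raises OSError on failure
    except OSError:
        return Outcome("halt", [])         # broken (or gone) resource
    if looks_like_feed(body):
        return Outcome("subscribe", [url])     # scenario C
    links = find_links(body, url)
    if not links:
        return Outcome("halt", [])         # assumption: no feed link means stop
    if len(links) == 1:
        return Outcome("subscribe", links)     # scenario A
    return Outcome("choose", links)            # scenario B: let the user pick
```

Passing the fetcher and the scanner in as arguments keeps the decision logic testable without touching the network.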

Pablo2m commented 10 years ago

There are some very good libraries for scanning pages for feeds:
https://github.com/ftzeng/feedfinder
https://github.com/gosusnp/url2feed
https://github.com/kaflesudip/grabfeed
https://github.com/papaeye/feedsearcher-py
https://github.com/dfm/feedfinder2

passiomatic commented 10 years ago

Thank you for the links. Coldsweat already has a modified version of the original feedfinder. It's very basic, since it only looks for <link> tags, but it works.
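To give an idea of what a <link>-only scan involves, here is a minimal sketch using just the Python standard library. It is not the modified feedfinder shipped with Coldsweat: `FeedLinkParser`, `find_feed_links` and the `FEED_TYPES` set are hypothetical names, and accepting only `rel="alternate"` links with an RSS or Atom MIME type is an assumption about what counts as a relevant link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only these two MIME types are treated as feeds
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values may be None when the attribute has no value
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative references against the page address
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    """Return the feed URLs found in the page, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Feeds that are only reachable through plain <a> anchors in the page body are missed by this approach, which is the trade-off of looking at <link> tags alone.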

The real reason the discovery implementation has been delayed is that fetcher.fetch_feed is monolithic and quite messy. A refactor is in progress, but there's no ETA yet.

ahknight commented 10 years ago

A big +1 on this one, FWIW.

passiomatic commented 9 years ago

For everyone interested: I've committed the feed autodiscovery code to 0.9.5-wip. Now from the subscribe modal you can enter a feed URL, a web page URL or a domain.com shorthand.
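As an illustration of how the three kinds of input can be reduced to one, the sketch below turns whatever is typed into an absolute URL before fetching. This is not the committed code: `normalize_location` is a hypothetical helper, and defaulting the shorthand to plain `http` and rejecting other schemes are assumptions.

```python
from urllib.parse import urlsplit


def normalize_location(text, default_scheme="http"):
    """Turn a feed URL, page URL or 'domain.com' shorthand into an absolute URL."""
    text = text.strip()
    if not text:
        raise ValueError("empty location")
    if "://" not in text:
        # Shorthand such as 'domain.com' or 'domain.com/blog'
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported location: {text!r}")
    return text


print(normalize_location("domain.com"))
print(normalize_location("https://example.com/feed.xml"))
```

The normalized address can then be fetched and sniffed to tell a feed from a web page.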

The discovery routine is pretty basic, since it doesn't scan the entire page for potential feeds but only looks for <link> references, but it should cover most use cases.

One thing I have left out for now, but intend to implement in the near future, is a "just subscribe" behavior when only one feed is found on the page. Currently Coldsweat asks for confirmation even if there's only one available feed.

Anyway, you are encouraged to try autodiscovery out.