kba / rssscrpr

Scrape web content to RSS feeds
https://rssscrpr.herokuapp.com/
MIT License

CacheDiffScraper / CacheDiffFetcher - notify of changes in a page #18

Open kba opened 8 years ago

kba commented 8 years ago

@vossviola in our gitter:

Thinking about identifiable and unidentifiable items leads me to the question of whether it would also be possible to build a scraper that is "less intelligent" and only looks for changes of any kind that have been made to a monitored website, and that delivers only a message like "there has been a change to this website, so go and have a look" or, a bit more intelligent, like "there has been a change to this website, see the comparison of the 'before' and the 'after' here". That's what page2rss did, roughly.

The CachingHttpFetcher currently caches HTTP responses for two hours. In principle, this cache could be used to detect any changes and create a feed of diffs.

This requires some more work because the system has no notion of page state at the moment.
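
To make the missing "page state" concrete, here is a minimal sketch of what such a component might look like. It is written in Python for illustration only and is not rssscrpr code: `PageStateStore` and its `check` method are hypothetical names, the store is in-memory, and the diff is a plain line-based unified diff of the raw response body.

```python
import difflib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PageStateStore:
    """Hypothetical in-memory page state: maps a URL to its last seen body."""

    snapshots: Dict[str, str] = field(default_factory=dict)

    def check(self, url: str, body: str) -> Optional[str]:
        """Record `body` for `url` and return a unified diff if it changed.

        Returns None on the first sighting of a URL (nothing to compare
        against) and when the body is identical to the stored snapshot.
        """
        previous = self.snapshots.get(url)
        self.snapshots[url] = body  # the new body becomes the stored state
        if previous is None or previous == body:
            return None
        diff_lines = difflib.unified_diff(
            previous.splitlines(),
            body.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
        return "\n".join(diff_lines)


if __name__ == "__main__":
    store = PageStateStore()
    url = "https://example.org/page"  # placeholder URL
    store.check(url, "<h1>News</h1>\n<p>old item</p>")
    change = store.check(url, "<h1>News</h1>\n<p>new item</p>")
    if change is not None:
        print(change)  # this text could become the body of a feed item
```

Each non-`None` result would correspond to one feed entry ("there has been a change to this website", plus the before/after comparison). A real implementation would need persistent storage rather than a dictionary, and probably some normalisation of the HTML so that irrelevant changes such as timestamps or session tokens do not trigger an entry.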