Thinking about identifiable and unidentifiable items leads me to ask whether it would also be possible to build a "less intelligent" scraper that only looks for changes of any kind made to a monitored website. It would deliver just a message like "there has been a change to this website, go and have a look" or, slightly more intelligently, "there has been a change to this website; see the comparison of 'before' and 'after' here". That's roughly what page2rss did.
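As a rough illustration of the "less intelligent" variant, here is a minimal Python sketch that compares a previous and a current snapshot of a page. It returns nothing if the page is unchanged, otherwise either the plain "go and have a look" notice or that notice followed by a unified before/after diff built with the standard `difflib` module. The function name `describe_change` and the message wording are hypothetical; this is one possible approach, not a description of how page2rss worked.

```python
import difflib
from typing import Optional


def describe_change(url: str, before: str, after: str, with_diff: bool = True) -> Optional[str]:
    """Return None if the page is unchanged, otherwise a human-readable notice.

    with_diff=False yields only the "go and have a look" message;
    with_diff=True appends a unified diff of the "before" and "after" versions.
    (Hypothetical helper for illustration.)
    """
    if before == after:
        return None  # any change at all counts; no change means nothing to report

    message = f"There has been a change to {url}, so go and have a look."
    if not with_diff:
        return message

    # Line-based unified diff; lineterm="" avoids doubled newlines when joining.
    diff_lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    )
    return message + "\n" + "\n".join(diff_lines)
```

Diffing raw HTML this way will also flag volatile parts such as embedded timestamps or session tokens, so some normalisation (for example, comparing only the visible text) would likely be needed to avoid noisy notifications.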
@vossviola in our gitter:
The CachingHttpFetcher currently caches HTTP responses for two hours. In principle, this could be used to detect changes and create a feed of diffs.
This would require more work, though, because the system currently has no notion of page state.
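One possible shape for the missing notion of page state, sketched in Python under the assumption that each fetched response body can be handed to a separate store: keep the last body and its SHA-256 digest per URL, and emit a feed item containing a unified diff whenever the digest changes. `PageState`, `PageStateStore`, `DiffEntry` and `to_rss` are hypothetical names, not part of the existing code base, and the store here is in memory only. Since cache entries expire after two hours, the previous version would presumably have to be kept outside the cache to compare across longer intervals.

```python
import difflib
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List, Optional


@dataclass
class PageState:
    """Last known version of a monitored page (hypothetical model)."""
    url: str
    digest: str
    body: str
    seen_at: datetime


@dataclass
class DiffEntry:
    """One feed item describing a detected change (hypothetical model)."""
    url: str
    diff: str
    detected_at: datetime


@dataclass
class PageStateStore:
    """In-memory page state; a real system would persist this beyond the cache TTL."""
    states: Dict[str, PageState] = field(default_factory=dict)

    def record(self, url: str, body: str) -> Optional[DiffEntry]:
        """Store the new body; return a DiffEntry if it differs from the previous one."""
        now = datetime.now(timezone.utc)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        previous = self.states.get(url)
        self.states[url] = PageState(url, digest, body, now)
        if previous is None or previous.digest == digest:
            return None  # first sighting or unchanged: nothing to report
        diff = "\n".join(
            difflib.unified_diff(
                previous.body.splitlines(),
                body.splitlines(),
                fromfile="before",
                tofile="after",
                lineterm="",
            )
        )
        return DiffEntry(url, diff, now)


def to_rss(title: str, link: str, entries: List[DiffEntry]) -> str:
    """Render diff entries as a minimal RSS 2.0 document (text is XML-escaped)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Changes detected on monitored pages"
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Change detected on {entry.url}"
        ET.SubElement(item, "link").text = entry.url
        ET.SubElement(item, "pubDate").text = format_datetime(entry.detected_at)
        ET.SubElement(item, "description").text = entry.diff
    return ET.tostring(rss, encoding="unicode")
```

Calling `store.record(url, body)` after every fetch returns `None` on the first visit and for unchanged pages, and a `DiffEntry` otherwise; the collected entries can then be passed to `to_rss` to produce the feed of diffs.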