Closed knutwannheden closed 2 years ago
It uses If-Modified-Since (timestamp) and If-None-Match (ETag) for conditional requests. The conditional requests are just an "optimization": if the server reports that nothing has changed, it doesn't need to send the document, and urlwatch assumes it has not changed. If an ETag or timestamp is not available, urlwatch will request the document and compare it to the previous version.
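To make the flow above concrete, here is a minimal sketch of the decision logic. It is not urlwatch's actual code; the function names `conditional_headers` and `has_changed` are made up for illustration, and the stored ETag, timestamp and previous body are assumed to come from whatever cache you keep.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Build conditional request headers from the validators stored last time."""
    headers = {}
    if etag:
        # Server answers 304 if its current ETag still matches this one.
        headers["If-None-Match"] = etag
    if last_modified:
        # Server answers 304 if the document was not modified since this time.
        headers["If-Modified-Since"] = last_modified
    return headers


def has_changed(status: int, body: Optional[str], previous_body: Optional[str]) -> bool:
    """Decide whether the document changed (illustrative, not urlwatch's code)."""
    if status == 304:
        # Not Modified: no body was sent, so assume nothing changed.
        return False
    # No usable validators, or the server sent the document anyway:
    # fall back to comparing with the previously stored version.
    return body != previous_body
```

With no stored validators, `conditional_headers(None, None)` returns an empty dict, so the request is unconditional and the comparison in `has_changed` does all the work.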
In cases where the web page serves slightly different HTML on every request, urlwatch will by default detect this as a change every time. However, since this is a common issue, urlwatch has "filters" which can be used to filter out the always-changing parts or, alternatively, to keep only the part that's interesting (e.g. a div with a certain ID, depending on the page).
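As a rough illustration of the idea behind filtering (again, not urlwatch's own filter implementation), the sketch below strips parts that differ on every request before comparing. The function name `strip_volatile` and the two volatile patterns (HTML comments and a `csrf_token` form field) are assumptions chosen for the example; a real page will need its own patterns.

```python
import re


def strip_volatile(html: str) -> str:
    """Remove parts assumed to change on every request (hypothetical patterns)."""
    # Drop HTML comments, which often carry generation timestamps.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Blank out the value of a per-request CSRF token field.
    html = re.sub(r'(name="csrf_token" value=")[^"]*(")', r"\1\2", html)
    return html


first = '<p>Price: 10</p><!-- generated 12:00:01 --><input name="csrf_token" value="aaa">'
second = '<p>Price: 10</p><!-- generated 12:00:02 --><input name="csrf_token" value="bbb">'

# The raw documents differ, but the filtered ones compare equal.
print(first == second)
print(strip_volatile(first) == strip_volatile(second))
```

The opposite approach, keeping only one element of interest, follows the same pattern: reduce the document first, then compare the reduced versions.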
I read the documentation rather carefully, but I might still have overlooked this: I am looking for a description of how urlwatch determines that a web page has changed. Looking into the SQLite database, I notice that there is support for ETags as well as a timestamp column.
The reason I am asking is that there are web pages with and without ETags, and there are also web pages which serve slightly different HTML for every single request. I would like to understand how urlwatch deals with these different scenarios. Thanks!