thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

Question - Report http errors only once #808

Closed by electrofloat 2 months ago

electrofloat commented 3 months ago

Maybe this is already possible and I just don't know how.

Is there a way to set it up so that when a watched URL suddenly disappears, it reports the 404 the first time, and after that only reports when this state (the 404) changes (either the page is back, or the error changes to some other code, etc.)?

For now it reports the 404 for a disappeared URL on every run until the page comes back.

Jamstah commented 3 months ago

You have the option of ignoring them entirely: https://urlwatch.readthedocs.io/en/latest/advanced.html#ignoring-connection-errors

You can use max_tries to ignore an error for a number of tries: https://urlwatch.readthedocs.io/en/latest/jobs.html#optional-keys-for-all-job-types

The other option is to use a command instead of a url. If you use something like curl to pull the page, then the curl output will switch from the page output to the 404 message, and that will only be reported once. Some hints here: https://urlwatch.readthedocs.io/en/latest/jobs.html#configuring-stderr-behavior-for-shell-jobs
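As a rough illustration of the command approach, here is a minimal Python sketch that behaves like curl for this purpose: it prints the HTTP status and body as normal output instead of failing on a 404, so the output only changes when the status or the page does. The script name fetch_status.py and the helpers render and fetch are hypothetical and not part of urlwatch.

```python
import sys
import urllib.error
import urllib.request


def render(status, body):
    """Format a response as text that only changes when the status or body changes."""
    return f"HTTP {status}\n{body}"


def fetch(url, timeout=30):
    """Return (status, body) for url, treating HTTP error statuses as ordinary results."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError carries both the status code and the body of the error page.
        return err.code, err.read().decode("utf-8", errors="replace")


# Only fetch when a URL is passed on the command line (hypothetical usage:
# python3 fetch_status.py https://example.com/page).
if __name__ == "__main__" and len(sys.argv) > 1:
    print(render(*fetch(sys.argv[1])))
```

Run from a job that uses command instead of url, a page that starts returning 404 would show up once as a normal diff (the first line changes to "HTTP 404") and then stay quiet until the output changes again. Connection failures, as opposed to HTTP error statuses, are not caught here, so the script would still exit with an error in that case.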

There is an argument for a setting that would allow a user to specify that 404 shouldn't count as an error and should just be reported as normal.

electrofloat commented 3 months ago

The 404 is specific to this case, but I think there should be a setting for exactly this: report the error only once, and report again only when the state changes, whatever the change might be (another, different error, a real diff, the site becoming accessible again, etc.).

I don't want to ignore the error, I want to know about it. But I don't want to know about it every 5 minutes (assuming urlwatch runs every 5 minutes).

max_tries is something, but it really just postpones the problem a bit.

Using a shell job could also work (users can then use whatever tool they want), but it greatly complicates watching a simple URL.

So IMHO there should be a built-in option on url jobs for the issue described above.
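To make the request concrete, here is a minimal sketch of the "report only when the error state changes" logic in Python. It is not urlwatch code: the class ErrorStateTracker, its should_report method, and the error strings are made up for illustration, and the state is kept in memory only, whereas a real option would need to persist it between runs.

```python
class ErrorStateTracker:
    """Remembers the last error seen per job and reports only when it changes."""

    def __init__(self):
        # Maps job name -> last observed error (None means the last run succeeded).
        self._last_error = {}

    def should_report(self, job, error):
        """Return True if `error` differs from the previous run's error for `job`.

        `error` is None on success, otherwise a short description such as "HTTP 404".
        A job that has never been seen is treated as previously successful.
        """
        previous = self._last_error.get(job)
        self._last_error[job] = error
        return error != previous


tracker = ErrorStateTracker()
# Seven consecutive runs of one job: fine, gone three times, a different error, back, fine.
runs = [None, "HTTP 404", "HTTP 404", "HTTP 404", "HTTP 500", None, None]
reported = [tracker.should_report("example", error) for error in runs]
print(reported)  # [False, True, False, False, True, True, False]
```

The 404 is reported on its first appearance, the repeats are suppressed, and both the switch to a different error and the recovery produce a new report. Content diffs on successful runs would still be handled separately.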

Jamstah commented 3 months ago

Yep, I agree it's a good suggestion :)

Jamstah commented 3 months ago

A quick search of the issues board finds this extremely similar request too: https://github.com/thp/urlwatch/issues/286

thp commented 2 months ago

Marking as duplicate of #286, let's continue the discussion there?