thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

HTTP Error 522 not caught with max_tries? #668

Open kongomongo opened 3 years ago

kongomongo commented 3 years ago

Hi there,

I thought max_tries was more or less a catch-all for any error: if urlwatch runs every 5 minutes and max_tries is 12, then any error that disappears within 60 minutes should never be reported, no matter what kind of error it is.

Or so I thought.

Can you explain this?

urlwatch -v --test-filter 1
2021-08-30 21:51:14,647 cli INFO: turning on verbose logging mode
2021-08-30 21:51:14,705 minidb DEBUG: PRAGMA table_info(CacheEntry)
2021-08-30 21:51:17,751 main INFO: Using /root/.config/urlwatch/urls.yaml as URLs file
2021-08-30 21:51:17,751 main INFO: Using /root/.config/urlwatch/hooks.py for hooks
2021-08-30 21:51:17,752 main INFO: Using /root/.cache/urlwatch/cache.db as cache database
2021-08-30 21:51:17,752 util INFO: Registering <class 'hooks.AllKeyShopTop'> as akstop
2021-08-30 21:51:17,752 util INFO: Registering <class 'hooks.RegexSubUpper'> as re.sub.upper
2021-08-30 21:51:17,850 main INFO: Found 25 jobs
2021-08-30 21:51:17,850 handler INFO: Processing: <url url='https://xxx.yy/forum/register.php?' ignore_cached=True headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 OPR/71.0.3770.287'} ignore_http_error_codes=522 name='xxx.yy' filter=[{'element-by-tag': 'body'}, {'html2text': {'method': 'lynx'}}, {'re.sub': {'pattern': '(?i)(ist jetzt )(..:..)( Uhr)', 'repl': '\\1XX:XX\\3'}}, {'re.sub': {'pattern': '(?i)(Es ist: )(..-..-...., ..:..)', 'repl': '\\1XX-XX-XXXX, XX:XX'}}, {'re.sub': {'pattern': '(?m)(\\.php\\?+s=)[a-f0-9]{8,}([^a-f0-9])', 'repl': '\\1PX_IGNORED\\2'}}, 'strip'] max_tries=12 treat_new_as_changed=True>
2021-08-30 21:51:17,850 minidb DEBUG: SELECT data, timestamp, tries, etag FROM CacheEntry WHERE guid = ? ORDER BY timestamp DESC, tries DESC LIMIT ? ['1879761a4956f0fd90d855d1c05d8b35abff8cee', 1]
2021-08-30 21:51:17,975 connectionpool DEBUG: Starting new HTTPS connection (1): xxx.yy:443
2021-08-30 21:51:48,876 connectionpool DEBUG: https://xxx.yy:443 "GET /forum/register.php HTTP/1.1" 522 None
Traceback (most recent call last):
  File "/usr/local/bin/urlwatch", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/urlwatch/cli.py", line 112, in main
    urlwatch_command.run()
  File "/usr/local/lib/python3.7/dist-packages/urlwatch/command.py", line 408, in run
    self.handle_actions()
  File "/usr/local/lib/python3.7/dist-packages/urlwatch/command.py", line 210, in handle_actions
    sys.exit(self.test_filter(self.urlwatch_config.test_filter))
  File "/usr/local/lib/python3.7/dist-packages/urlwatch/command.py", line 138, in test_filter
    raise job_state.exception
  File "/usr/local/lib/python3.7/dist-packages/urlwatch/handler.py", line 113, in process
    data = self.job.retrieve(self)
  File "/usr/local/lib/python3.7/dist-packages/urlwatch/jobs.py", line 292, in retrieve
    response.raise_for_status()
  File "/usr/local/lib/python3.7/dist-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 522 Server Error:  for url: https://xxx.yy/forum/register.php

Am I doing it wrong?

kongomongo commented 3 years ago

Even adding ignore_http_error_codes: 522 to the job does not help.

thp commented 2 years ago

--test-filter does not take max_tries into account; as the traceback shows, it simply re-raises the job's exception. To see max_tries at work, run urlwatch with --verbose in your cron job and check the output. It should be quite verbose regarding max_tries ("Using max_tries of ...", "Error while executing...", "This was try ... of ...", "We are not at ... tries", ...).
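
As a rough illustration of the idea behind max_tries, here is a minimal sketch of a consecutive-failure counter. It is not urlwatch's actual code: `JobState`, `handle_run` and `simulate` are hypothetical names, and the reset-on-success and keep-reporting-after-the-threshold behaviour are assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical stand-in for the per-job state kept between runs."""
    tries: int = 0  # number of consecutive failed runs so far


def handle_run(state: JobState, failed: bool, max_tries: int) -> bool:
    """Process one scheduled run; return True if the error should be reported."""
    if not failed:
        state.tries = 0  # assumption: a successful run resets the counter
        return False
    state.tries += 1
    # Stay silent until the failure has persisted for max_tries runs in a row.
    return state.tries >= max_tries


def simulate(outcomes, max_tries):
    """Return the 1-based run numbers at which an error would be reported."""
    state = JobState()
    return [
        run_number
        for run_number, failed in enumerate(outcomes, start=1)
        if handle_run(state, failed, max_tries)
    ]


if __name__ == "__main__":
    # Eleven failed runs followed by a success: nothing is reported.
    print(simulate([True] * 11 + [False], max_tries=12))  # prints []
    # Twelve failed runs in a row: the twelfth one is reported.
    print(simulate([True] * 12, max_tries=12))  # prints [12]
```

Under this model, with a 5-minute cron interval and max_tries set to 12, the first report would come on the twelfth consecutive failed run, 55 minutes after the first failure. A one-off --test-filter invocation never goes through this counting, which would explain why the 522 error surfaces immediately there.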