thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

urlwatch never finishes #629

Closed. snowman closed this issue 3 years ago

snowman commented 3 years ago
$ urlwatch --verbose 2>&1 | tee /tmp/output
snowman commented 3 years ago

Output of the last cron run:

===========================================================================
01. ERROR: LOL version update
===========================================================================

---------------------------------------------------------------------------
ERROR: LOL version update ( http://lol.qq.com/gicp/news/423/2/1334/1.html )
---------------------------------------------------------------------------
HTTPSConnectionPool(host='lol.qq.com', port=443): Read timed out. (read timeout=8)
---------------------------------------------------------------------------

--
urlwatch 2.22, Copyright 2008-2020 Thomas Perl
Website: https://thp.io/2008/urlwatch/
watched 9 URLs in 1123 seconds

Edit: I found the problem. Here is a minimal script that uses requests directly:

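import requests

# Fetch the watched page directly, outside urlwatch.
url = "http://lol.qq.com/gicp/news/423/2/1334/1.html"
# timeout=3 limits the connect and each read, not the whole download.
response = requests.get(url, timeout=3)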

The server may be throttling the client; see https://stackoverflow.com/questions/39227820/requests-process-hangs

Try adding a "User-agent" header, or give up. If you find a solution, don't forget to let me know! :)

name: "LOL: new version"
url: "http://lol.qq.com/gicp/news/423/2/1334/1.html"
encoding: gbk
headers:
  User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3756.400 QQBrowser/10.5.4043.400
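
To check outside urlwatch whether the header from the job above changes anything, something like the following might help. It is only a rough sketch: it uses urllib from the standard library instead of requests so it has no dependencies, the helper names build_request and fetch are made up for this example, and the 8 second timeout just mirrors the "read timeout=8" in the cron output. Whether this server actually responds differently with a browser User-agent is an assumption to test, not something confirmed here.

```python
import urllib.request

URL = "http://lol.qq.com/gicp/news/423/2/1334/1.html"

# Same browser-like value as in the urlwatch job above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 "
    "Core/1.70.3756.400 QQBrowser/10.5.4043.400"
)


def build_request(url, user_agent):
    """Build a GET request carrying the given User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=8.0):
    """Return the decoded page, or None on connection errors and timeouts."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # The job declares "encoding: gbk", so decode the body the same way.
            return response.read().decode("gbk", errors="replace")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print(f"request failed: {exc}")
        return None
```

Calling fetch(URL, BROWSER_UA) and then fetch(URL, "Python-urllib") and comparing the two results would show whether the header makes a difference. As with requests, the timeout applies to each blocking socket operation, so a server that trickles data slowly can still keep the call busy for much longer than 8 seconds.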