thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

How to get around client 403 error? #711

Closed: kimsin98 closed this issue 2 years ago

kimsin98 commented 2 years ago

A web page I'm watching for weekly updates recently started to block urlwatch with 403 errors. I tried editing the job to provide the request with the same cookie and user-agent as my browser, but I still can't get around the 403. Is there anything else I could try?

```yaml
name: "ABCD publications"
url: "https://abcdstudy.org/publications/"
cookies:
  {key}: {value}
headers:
  User-Agent: {Mozilla...}
```

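To check whether the block depends only on these two headers, one option is to replay the same request outside urlwatch. The following is a minimal sketch using Python's standard `urllib.request`; the function names `build_request` and `fetch_status` are hypothetical, and it assumes the job's cookies are sent as a single plain `Cookie` header.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str, cookies: dict[str, str]) -> urllib.request.Request:
    """Build a GET request carrying the same User-Agent and cookies as the urlwatch job."""
    # Cookies travel as one "Cookie" header of name=value pairs joined by "; ".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    headers = {"User-Agent": user_agent}
    if cookie_header:
        headers["Cookie"] = cookie_header
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_status(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Return the HTTP status code, treating 4xx/5xx responses as results rather than exceptions."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error statuses such as 403; its code is the status.
        return err.code
```

If `fetch_status(build_request(...))` still returns 403 with the browser's exact User-Agent and cookies, the site is most likely checking something other than these headers.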
thp commented 2 years ago

It depends on how they block you; the User-Agent is only one possible method.
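Because different blocking methods leave different traces, it can help to look at what the 403 response body actually says. The sketch below is purely illustrative: `classify_block` and its marker phrases are hypothetical heuristics, not something urlwatch provides, and real block pages are worded differently from site to site.

```python
# Hypothetical marker phrases; a real block page may match none of these.
JS_MARKERS = ("enable javascript", "javascript is required", "checking your browser")
CAPTCHA_MARKERS = ("captcha",)


def classify_block(status: int, body: str) -> str:
    """Roughly guess why a response was blocked, based only on its status and body text."""
    if status != 403:
        return "not a 403"
    text = body.lower()
    # A body asking the client to run JavaScript suggests a browser engine is needed.
    if any(marker in text for marker in JS_MARKERS):
        return "javascript challenge"
    # A CAPTCHA page usually cannot be passed automatically at all.
    if any(marker in text for marker in CAPTCHA_MARKERS):
        return "captcha"
    # Anything else: IP-based blocking, header checks, rate limits, and so on.
    return "other"
```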

The best approach is to contact the website's author or host and agree on a way of watching the page that is acceptable to them.

kimsin98 commented 2 years ago

It appears JavaScript was required, but Pyppeteer + Chromium kept getting stuck and crashing. Fortunately, I found a fork named webchanges that uses Playwright + Chrome, and that worked.
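For a page that only serves its content after JavaScript runs, the request has to come from a real browser engine. The following is a minimal sketch of that idea using Playwright's synchronous Python API with a locally installed Google Chrome (`channel="chrome"`). It is not webchanges' actual code; the function name `fetch_rendered_html` is hypothetical, and it assumes `pip install playwright` plus an existing Chrome installation.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 60_000) -> str:
    """Load url in headless Chrome via Playwright and return the HTML after scripts have run."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # channel="chrome" uses the installed Google Chrome instead of Playwright's bundled Chromium.
        browser = p.chromium.launch(channel="chrome", headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles so script-inserted content is present.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Calling `fetch_rendered_html("https://abcdstudy.org/publications/")` would return the rendered page, which is roughly what a browser-based job compares between runs.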