thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

urlwatch 2.25-1 on Debian Stable 12.5 (navigate fails) #809

Open jpiszcz opened 3 months ago

jpiszcz commented 3 months ago

More and more websites require JavaScript before a diff can be obtained. I am currently on Debian stable 12.5 with urlwatch 2.25-1, where browser (navigate) jobs fail with the traceback below.

What is the proper way to fix this, and which option is best for tracking changes in pages that require JavaScript?

Also logged a bug with Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067470

$ urlwatch --test-filter 1
Exception while releasing resources for job: <browser navigate='https://support.wyze.com/hc/en-us/articles/360015979872-Service-Status-Known-Issues' name='Wyze Service Status & Known Issues' filter=['html2text', 'striplines']>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urlwatch/command.py", line 139, in test_filter
    raise job_state.exception
  File "/usr/lib/python3/dist-packages/urlwatch/handler.py", line 68, in __enter__
    self.job.main_thread_enter()
  File "/usr/lib/python3/dist-packages/urlwatch/jobs.py", line 406, in main_thread_enter
    from .browser import BrowserContext
  File "/usr/lib/python3/dist-packages/urlwatch/browser.py", line 42, in <module>
    class BrowserLoop(object):
  File "/usr/lib/python3/dist-packages/urlwatch/browser.py", line 49, in BrowserLoop
    @asyncio.coroutine
     ^^^^^^^^^^^^^^^^^
AttributeError: module 'asyncio' has no attribute 'coroutine'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urlwatch/handler.py", line 78, in __exit__
    self.job.main_thread_exit()
  File "/usr/lib/python3/dist-packages/urlwatch/jobs.py", line 410, in main_thread_exit
    self.ctx.close()
    ^^^^^^^^
AttributeError: 'BrowserJob' object has no attribute 'ctx'
Traceback (most recent call last):
  File "/usr/bin/urlwatch", line 33, in <module>
    sys.exit(load_entry_point('urlwatch==2.25', 'console_scripts', 'urlwatch')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/urlwatch/cli.py", line 112, in main
    urlwatch_command.run()
  File "/usr/lib/python3/dist-packages/urlwatch/command.py", line 431, in run
    self.handle_actions()
  File "/usr/lib/python3/dist-packages/urlwatch/command.py", line 231, in handle_actions
    sys.exit(self.test_filter(self.urlwatch_config.test_filter))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/urlwatch/command.py", line 139, in test_filter
    raise job_state.exception
  File "/usr/lib/python3/dist-packages/urlwatch/handler.py", line 68, in __enter__
    self.job.main_thread_enter()
  File "/usr/lib/python3/dist-packages/urlwatch/jobs.py", line 406, in main_thread_enter
    from .browser import BrowserContext
  File "/usr/lib/python3/dist-packages/urlwatch/browser.py", line 42, in <module>
    class BrowserLoop(object):
  File "/usr/lib/python3/dist-packages/urlwatch/browser.py", line 49, in BrowserLoop
    @asyncio.coroutine
     ^^^^^^^^^^^^^^^^^
AttributeError: module 'asyncio' has no attribute 'coroutine'. Did you mean: 'coroutines'?

mwerlen commented 3 months ago

Hi,

Your problem is linked to the Python 3.11 upgrade: asyncio.coroutine, which urlwatch 2.25 still uses in browser.py, was removed in Python 3.11. This has been fixed in urlwatch 2.27, as explained in the changelog.
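
For context, here is a minimal sketch of the incompatibility. It is not urlwatch's actual code, and the function name fetch_title is hypothetical. It checks whether the legacy decorator still exists in the running interpreter, then shows the native async def form that replaces generator-based coroutines:

```python
import asyncio
import sys

# asyncio.coroutine was deprecated in Python 3.8 and removed in 3.11,
# so this is False on Debian 12's Python 3.11.
has_legacy_decorator = hasattr(asyncio, "coroutine")


# Native coroutine syntax; works on every currently supported Python 3.
# (fetch_title is a hypothetical stand-in, not a urlwatch function.)
async def fetch_title():
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return "ok"


result = asyncio.run(fetch_title())
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"asyncio.coroutine available = {has_legacy_decorator}")
```

Any module that applies @asyncio.coroutine at import time, as browser.py does in the traceback above, fails on Python 3.11 and later before a job can start.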

You can either:

jpiszcz commented 3 months ago

Thank you! I pulled the latest urlwatch from GitHub and installed Playwright, and it seems to work now. However, for sites protected by Cloudflare or another CDN, is there an option in urlwatch to get past the verification page?

$ urlwatch
....
Verifying you are human. This may take a few seconds.support.wyze.com needs to review the security of your connection before proceeding.Verification successfulWaiting for support.wyze.com to respond...Enable JavaScript and cookies to continue
...
This may take a few seconds.camelcamelcamel.com needs to review the security of your connection before proceeding.Verification successfulWaiting for camelcamelcamel.com to respond...
...

Jamstah commented 3 months ago

Waiting is something raised in #763 - it would be good to be able to wait for a specific selector.
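
To illustrate what waiting for a selector could look like outside urlwatch, here is a rough sketch using Playwright's synchronous Python API directly. The function name fetch_after_selector and its parameters are hypothetical and this is not a urlwatch option. It assumes Playwright and a Chromium build are installed, and it will not necessarily get past a Cloudflare check; it only waits until the real content appears or the timeout expires.

```python
def fetch_after_selector(url: str, selector: str, timeout_ms: int = 30000) -> str:
    """Return the page HTML once an element matching `selector` is visible.

    Hypothetical helper for illustration; not part of urlwatch.
    """
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # Blocks until the selector is visible; raises Playwright's
            # TimeoutError if it does not appear within timeout_ms.
            page.wait_for_selector(selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

A call might look like fetch_after_selector("https://example.com/", "h1"), where the selector is chosen to match something that only exists on the fully loaded page and not on the interstitial.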