internetarchive / brozzler

brozzler - distributed browser-based web crawler
Apache License 2.0

Upgrade websocket-client dependency #281

Closed by vbanos 3 weeks ago

vbanos commented 3 months ago

We use a very old version of websocket-client (0.47.0, released on Feb 22, 2018). The library is actively developed, so it would be great to upgrade. Changelog: https://github.com/websocket-client/websocket-client/blob/master/ChangeLog

I tested every version from 0.48.0 to 0.59.0. Only versions 0.52.0, 0.58.0 and 0.59.0 worked correctly; all the others raised exceptions. I therefore suggest using 0.59.0 for now.

vbanos commented 3 months ago

I also tested versions 1.0.0 to 1.4.3. Captures worked, but at the end every version raised the following exception:

[2024-07-28 10:51:06,819: ERROR/ForkPoolWorker-2] error from callback <bound method WebsockReceiverThread._on_close of <WebsockReceiverThread(WebsockThread:47001, started daemon 139974406829824)>>: _on_close() takes 2 positional arguments but 4 were given
[2024-07-28 10:51:06,819: ERROR/ForkPoolWorker-2] exception from websocket receiver thread
Traceback (most recent call last):                                              
  File "/opt/spn/lib/python3.8/site-packages/websocket/_app.py", line 407, in _callback
    callback(self, *args)                                                       
TypeError: _on_close() takes 2 positional arguments but 4 were given            

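We need to change the Brozzler code before we can use version 1+: as the traceback shows, _on_close() is now called with two more positional arguments than it accepts.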
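A minimal sketch of the kind of change this implies, assuming the two extra positional arguments are the close status code and close message described in the websocket-client 1.x documentation. The class ReceiverSketch and the helper invoke_callback are hypothetical stand-ins for illustration, not Brozzler's actual WebsockReceiverThread or the library's internals. Accepting *args lets the same callback work whether or not the extra arguments are passed:

```python
import logging

logger = logging.getLogger(__name__)


class ReceiverSketch:
    """Hypothetical stand-in for a websocket receiver; not Brozzler's real class."""

    def __init__(self):
        self.is_open = True
        self.close_status_code = None
        self.close_msg = None

    def _on_close(self, websock, *args):
        # Assumption: websocket-client 1.x passes (close_status_code, close_msg)
        # after the app object. Pad with None so a call without the extra
        # arguments is handled by the same code path.
        close_status_code, close_msg = (args + (None, None))[:2]
        self.is_open = False
        self.close_status_code = close_status_code
        self.close_msg = close_msg
        logger.info(
            "websocket closed, status=%r message=%r", close_status_code, close_msg
        )


def invoke_callback(callback, app, *args):
    # Mirrors the `callback(self, *args)` call shown in the traceback above.
    callback(app, *args)
```

With this shape, invoke_callback(receiver._on_close, app) and invoke_callback(receiver._on_close, app, 1000, "bye") both succeed, so the callback would not need to change again if the pinned version moves between the two calling conventions.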

So I recommend upgrading to 0.59.0 as a first step, and making the code changes needed for newer versions later. Thank you.

vbanos commented 3 months ago

Also, a newer websocket-client will be needed to use the latest Chrome.

vbanos commented 3 weeks ago

Already done at https://github.com/internetarchive/brozzler/commit/b41393fac5353fe01154c90fc54ee532e149ec59