The following two errors occur frequently:
Crawler versions used (commit hashes): cb164a30758a10a6813e6a29fc13ccc1c952d5f0 and 42cea2dc4dba1e0331ab57eeef491d7413c5570f
Examples of pages to crawl that trigger these errors: [ 'https://www.shopify.com', 'https://www.mailchimp.com', 'https://www.xfinity.com', 'https://www.wellsfargo.com', 'https://www.vice.com', 'https://www.gamepedia.com', 'https://www.ltn.com.tw', 'https://www.kompas.com']
BrowserManager - INFO - BROWSER -363943182: EXECUTING COMMAND: ('GET', 'https://www.wellsfargo.com', 10, -3354964916300474)
Exception in thread Thread-197:
Traceback (most recent call last):
File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 89, in _handle_conn
msg = json.loads(msg.decode('utf-8'))
File "/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/anaconda3/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 789 (char 788)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/anaconda3/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 98, in _handle_conn
msg, traceback.format_exc(e)))
File "/anaconda3/lib/python3.7/traceback.py", line 167, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "/anaconda3/lib/python3.7/traceback.py", line 121, in format_exception
type(value), value, tb, limit=limit).format(chain=chain))
File "/anaconda3/lib/python3.7/traceback.py", line 508, in __init__
capture_locals=capture_locals)
File "/anaconda3/lib/python3.7/traceback.py", line 337, in extract
if limit >= 0:
TypeError: '>=' not supported between instances of 'JSONDecodeError' and 'int'
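The TypeError at the bottom of this trace looks like a secondary bug that masks the real decode error: in Python 3, traceback.format_exc() takes no exception argument (its signature is format_exc(limit=None, chain=True)), so the call traceback.format_exc(e) in SocketInterface.py line 98 passes the exception object as limit, which is then compared with >= inside traceback.extract. A minimal sketch of what I assume the error-logging path is meant to do (function name and message text here are illustrative, not the actual OpenWPM code):

import json
import traceback

def _handle_message(msg):
    # Sketch of the decode/error-logging path in SocketInterface._handle_conn.
    try:
        return json.loads(msg.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Calling traceback.format_exc() with no arguments formats the
        # exception currently being handled; passing the exception object,
        # as the current code does, raises the TypeError seen above.
        print("Error decoding message %r:\n%s" % (msg, traceback.format_exc()))
        return None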
The second error:
BrowserManager - INFO - BROWSER -363943182: EXECUTING COMMAND: ('GET', 'https://www.vice.com', 10, 2809173210811240)
Exception in thread Thread-194:
Traceback (most recent call last):
File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 89, in _handle_conn
msg = json.loads(msg.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 845: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/anaconda3/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 98, in _handle_conn
msg, traceback.format_exc(e)))
File "/anaconda3/lib/python3.7/traceback.py", line 167, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "/anaconda3/lib/python3.7/traceback.py", line 121, in format_exception
type(value), value, tb, limit=limit).format(chain=chain))
File "/anaconda3/lib/python3.7/traceback.py", line 508, in __init__
capture_locals=capture_locals)
File "/anaconda3/lib/python3.7/traceback.py", line 337, in extract
if limit >= 0:
TypeError: '>=' not supported between instances of 'UnicodeDecodeError' and 'int'
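Even with the logging call fixed, the underlying failures remain: one page produces a message containing a raw control character that json.loads rejects, and another produces bytes that are not valid UTF-8. Whether tolerating such messages is acceptable for the crawl data is an open question; as an assumption-laden sketch (errors='replace' and strict=False are my suggestions, not current OpenWPM behaviour), the decode could be made more forgiving like this:

import json

def _decode_message(msg_bytes):
    # Replace undecodable bytes instead of raising UnicodeDecodeError, and
    # allow raw control characters inside JSON strings (strict=False) so the
    # two crashes above are avoided. The decoded data may be slightly lossy.
    text = msg_bytes.decode('utf-8', errors='replace')
    return json.loads(text, strict=False)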