openwpm / OpenWPM

A web privacy measurement framework
https://openwpm.readthedocs.io

TypeError: '>=' not supported between instances of ____ and 'int' #407

Closed aliamcami closed 5 years ago

aliamcami commented 5 years ago

The following two errors occur frequently.

Versions of the crawler used: cb164a30758a10a6813e6a29fc13ccc1c952d5f0 and 42cea2dc4dba1e0331ab57eeef491d7413c5570f

Example pages to crawl that produce these errors: [ 'https://www.shopify.com', 'https://www.mailchimp.com', 'https://www.xfinity.com', 'https://www.wellsfargo.com', 'https://www.vice.com', 'https://www.gamepedia.com', 'https://www.ltn.com.tw', 'https://www.kompas.com']

BrowserManager       - INFO     - BROWSER -363943182: EXECUTING COMMAND: ('GET', 'https://www.wellsfargo.com', 10, -3354964916300474)
Exception in thread Thread-197:
Traceback (most recent call last):
  File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 89, in _handle_conn
    msg = json.loads(msg.decode('utf-8'))
  File "/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/anaconda3/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 789 (char 788)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 98, in _handle_conn
    msg, traceback.format_exc(e)))
  File "/anaconda3/lib/python3.7/traceback.py", line 167, in format_exc
    return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  File "/anaconda3/lib/python3.7/traceback.py", line 121, in format_exception
    type(value), value, tb, limit=limit).format(chain=chain))
  File "/anaconda3/lib/python3.7/traceback.py", line 508, in __init__
    capture_locals=capture_locals)
  File "/anaconda3/lib/python3.7/traceback.py", line 337, in extract
    if limit >= 0:
TypeError: '>=' not supported between instances of 'JSONDecodeError' and 'int'

And

BrowserManager       - INFO     - BROWSER -363943182: EXECUTING COMMAND: ('GET', 'https://www.vice.com', 10, 2809173210811240)
Exception in thread Thread-194:
Traceback (most recent call last):
  File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 89, in _handle_conn
    msg = json.loads(msg.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 845: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/anaconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/coliveira/Desktop/OpenWPM/automation/SocketInterface.py", line 98, in _handle_conn
    msg, traceback.format_exc(e)))
  File "/anaconda3/lib/python3.7/traceback.py", line 167, in format_exc
    return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  File "/anaconda3/lib/python3.7/traceback.py", line 121, in format_exception
    type(value), value, tb, limit=limit).format(chain=chain))
  File "/anaconda3/lib/python3.7/traceback.py", line 508, in __init__
    capture_locals=capture_locals)
  File "/anaconda3/lib/python3.7/traceback.py", line 337, in extract
    if limit >= 0:
TypeError: '>=' not supported between instances of 'UnicodeDecodeError' and 'int'

englehardt commented 5 years ago

Related to #255

englehardt commented 5 years ago

Confirmed fix in https://github.com/mozilla/OpenWPM/pull/442#issuecomment-518877860
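
For anyone hitting the same traceback: in both reports the secondary `TypeError` comes from `traceback.format_exc(e)`. In Python 3 the first positional parameter of `format_exc` is `limit`, so the exception object ends up in the `limit >= 0` comparison. The following is a minimal sketch of the safer pattern, not necessarily what the linked pull request changed; `decode_message` is a hypothetical helper and is not part of OpenWPM.

```python
import json
import traceback


def decode_message(raw: bytes):
    """Hypothetical helper: decode a UTF-8 JSON payload.

    Returns (message, None) on success, or (None, traceback_text) on failure.
    """
    try:
        return json.loads(raw.decode("utf-8")), None
    except (UnicodeDecodeError, json.JSONDecodeError):
        # format_exc() reads the active exception from sys.exc_info().
        # Its first positional parameter is `limit`, so the exception
        # object must not be passed in.
        return None, traceback.format_exc()


# A raw newline inside a JSON string is an invalid control character.
_, err = decode_message(b'{"k": "a\nb"}')
print(err.splitlines()[-1])

# 0xe1 starts a multi-byte sequence, but 'b' is not a valid continuation byte.
_, err = decode_message(b'{"k": "\xe1bc"}')
print(err.splitlines()[-1])
```

With this pattern the original `JSONDecodeError` or `UnicodeDecodeError` is what gets logged, which makes the underlying malformed-payload problem visible instead of masking it.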