openwpm / OpenWPM

A web privacy measurement framework
https://openwpm.readthedocs.io

Firefox writes out Unicode which breaks python #653

Open vringar opened 4 years ago

vringar commented 4 years ago

This is the traceback I got in the GCP console. I don't know why Python is assuming ASCII encoding, but maybe we can specify the encoding somewhere.

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/sentry_sdk/integrations/threading.py", line 69, in run
    reraise(*_capture_exception())
  File "/usr/local/lib/python3.6/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/sentry_sdk/integrations/threading.py", line 67, in run
    return old_run_func(self, *a, **kw)
  File "/opt/OpenWPM/automation/DeployBrowsers/selenium_firefox.py", line 74, in run
    for line in f:
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 183: ordinal not in range(128)
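
For context on the traceback above: in text mode, `open()` falls back to the locale's preferred encoding when no `encoding` argument is given, and under a C/POSIX locale on Python 3.6 that is ASCII. Byte 0xe2 is a typical lead byte of a three-byte UTF-8 sequence, such as a curly quote. A minimal sketch of reading a log file with an explicit encoding follows. The helper name `read_log_lines` and the demo file are made up for illustration and are not OpenWPM code.

```python
import os
import tempfile


def read_log_lines(path):
    """Yield lines from a log file, decoding as UTF-8 regardless of locale.

    Hypothetical helper, not part of OpenWPM.
    """
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")


# Demo file: one line with a UTF-8 curly apostrophe (bytes e2 80 99)
# and one line with a stray byte that is not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write("site\u2019s console message\n".encode("utf-8"))
    tmp.write(b"broken \xdd byte\n")
    demo_path = tmp.name

lines = list(read_log_lines(demo_path))
os.remove(demo_path)
```

Whether lossy replacement is acceptable here depends on how the log lines are used downstream. `errors="backslashreplace"` would preserve the raw byte values instead.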

vringar commented 4 years ago

Also, this message breaks us in the SocketInterface:

b'["callstacks",{"crawl_id":3815218321,"request_id":37,"call_stack":"\xddE\xf7\xbc\xd1\xb4\x88\x9f0@https://realtime-chart.mofi.xyz/a1352312363x1222897961is/nikkei_a2.js:845:16;null\\nnull@https://realtime-chart.mofi.xyz/a1352312363x1222897961is/nikkei_a2.js:92:11;null","visit_id":586938654453539}]'

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 67: invalid continuation byte
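
The SocketInterface failure can be reproduced in isolation. Byte 0xdd announces a two-byte UTF-8 sequence, but the next byte (`E`) is not a continuation byte, so strict decoding raises. Below is a rough sketch of a receiver that tolerates this. It assumes the message is JSON sent as raw bytes. `parse_message` is a hypothetical helper rather than the actual SocketInterface code, and the payload is a shortened, made-up variant of the one reported here.

```python
import json


def parse_message(raw: bytes):
    """Decode a JSON message, tolerating invalid UTF-8 (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Lossy fallback: each undecodable byte becomes U+FFFD.
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# Shortened, made-up payload shaped like the one in the report.
raw = (
    b'["callstacks",{"request_id":37,'
    b'"call_stack":"\xddE\xf7@https://example.test/a.js:845:16;null"}]'
)
table, record = parse_message(raw)
```

This only keeps the receiver from crashing. The invalid bytes are lost, so fixing the encoding on the sending side would still be the cleaner solution.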

birdsarah commented 4 years ago

Bleugh. I hate bugs like this. We’re about to standardize on Python 3.8, assuming my conda PR lands, so let’s double-check that this is still a problem there, although this is usually a Python 2-to-3 problem.

birdsarah commented 4 years ago

But also, web data is gross and can just break things.

englehardt commented 4 years ago

I suspect the cause is: https://github.com/mozilla/OpenWPM/blob/022e20a56e86636bef6435117af5328b9d79fea4/automation/Extension/firefox/feature.js/callstack-instrument.js#L17.

For most other things we use escapeString, e.g.: https://github.com/mozilla/OpenWPM/blob/022e20a56e86636bef6435117af5328b9d79fea4/automation/Extension/webext-instrumentation/src/background/http-instrument.ts#L284-L285

escapeString encodes to UTF-8: https://github.com/mozilla/OpenWPM/blob/022e20a56e86636bef6435117af5328b9d79fea4/automation/Extension/webext-instrumentation/src/lib/string-utils.ts#L1-L12