Open vringar opened 4 years ago
Also b'["callstacks",{"crawl_id":3815218321,"request_id":37,"call_stack":"\xddE\xf7\xbc\xd1\xb4\x88\x9f0@https://realtime-chart.mofi.xyz/a1352312363x1222897961is/nikkei_a2.js:845:16;null\\nnull@https://realtime-chart.mofi.xyz/a1352312363x1222897961is/nikkei_a2.js:92:11;null","visit_id":586938654453539}]'
breaks us in the SocketInterface with:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position 67: invalid continuation byte
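A minimal sketch of the failure, assuming the receiving side decodes the raw socket bytes as UTF-8 before parsing the JSON. The payload is a shortened stand-in for the one above with a placeholder URL, and `decode_message` is a hypothetical helper, not the actual SocketInterface code. It also shows that decoding with `errors="replace"` would keep the record at the cost of replacement characters in the call stack.

```python
import json

# Shortened stand-in for the payload above: the call_stack value starts with
# raw bytes that are not valid UTF-8 (0xdd must be followed by a continuation byte).
raw = (
    b'["callstacks",{"request_id":37,"call_stack":'
    b'"\xddE\xf7\xbc\xd1\xb4\x88\x9f0@https://example.test/a.js:845:16;null"}]'
)


def decode_message(data: bytes, strict: bool = True):
    """Hypothetical helper: decode bytes as UTF-8, then parse the JSON text."""
    text = data.decode("utf-8", errors="strict" if strict else "replace")
    return json.loads(text)


try:
    decode_message(raw)
except UnicodeDecodeError as e:
    # Same class of error as in the traceback above.
    print("strict decode failed:", e.reason)

# Lenient decoding swaps each undecodable byte for U+FFFD instead of raising.
record = decode_message(raw, strict=False)
print(record[0], repr(record[1]["call_stack"]))
```

Whether silently replacing bytes is acceptable is a separate question; it hides the fact that the extension sent something that was not UTF-8 in the first place.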
bleugh. i hate bugs like this. we're about to standardize on py3.8, assuming that my conda pr lands, so let's double check whether it's still a problem there - although this is usually a 2-to-3 problem.
but also web data is gross and can just break things.
I suspect the cause is: https://github.com/mozilla/OpenWPM/blob/022e20a56e86636bef6435117af5328b9d79fea4/automation/Extension/firefox/feature.js/callstack-instrument.js#L17.
For most other things we use escapeString, e.g.:
https://github.com/mozilla/OpenWPM/blob/022e20a56e86636bef6435117af5328b9d79fea4/automation/Extension/webext-instrumentation/src/background/http-instrument.ts#L284-L285
This encodes to utf8: https://github.com/mozilla/OpenWPM/blob/022e20a56e86636bef6435117af5328b9d79fea4/automation/Extension/webext-instrumentation/src/lib/string-utils.ts#L1-L12
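To illustrate why the missing escaping step could matter, here is a Python sketch of the suspected mechanism. It assumes (unverified) that the call stack string contains code points such as U+00DD and that, without escaping, it ends up on the socket as one byte per character. The string, the placeholder URL, and `can_decode_utf8` are made up for the example; it does not reproduce the extension's actual code path.

```python
import json

# Assumption: the call stack contains characters in the U+0080..U+00FF range.
call_stack = "\u00ddE\u00f7\u00bc@https://example.test/a.js:845:16;null"
message = json.dumps(["callstacks", {"call_stack": call_stack}], ensure_ascii=False)

# One byte per character (Latin-1 style): what an unescaped send might look like.
unescaped_bytes = message.encode("latin-1")
# UTF-8 encoded: what an escaping step that encodes to UTF-8 would produce.
escaped_bytes = message.encode("utf-8")


def can_decode_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(can_decode_utf8(unescaped_bytes))  # False
print(can_decode_utf8(escaped_bytes))    # True

# The UTF-8 variant round-trips back to the original string.
roundtrip = json.loads(escaped_bytes.decode("utf-8"))
print(roundtrip[1]["call_stack"] == call_stack)  # True
```

If this is what is happening, running the call stack through the same escaping as the other instrumented fields should make the receiver's UTF-8 decode succeed.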
This is the traceback I got in the GCP console. Idk why Python is assuming ASCII encoding, but maybe we can specify the encoding somewhere.