openwpm / OpenWPM

A web privacy measurement framework
https://openwpm.readthedocs.io

UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 522: invalid continuation byte #403

Closed (motin closed 5 years ago)

motin commented 5 years ago

There seems to be an issue with serializing and deserializing JSON between the instrumentation and the platform.

Log output:

BrowserManager       - INFO     - BROWSER -1312820091: EXECUTING COMMAND: ('GET', u'http://gamepedia.com', 10, -786562996376282L)
Error de-serializing message: ["http_requests",{"incognito":0,"crawl_id":-1312820091,"extension_session_uuid":"f04eefa6-30e4-4139-9074-e55ed8f5b076","event_ordinal":640,"window_id":1,"tab_id":1,"frame_id":4294967303,"request_id":"205","url":"https://www.facebook.com/tr/","method":"POST","time_stamp":"2019-07-15T18:42:11.176Z","referrer":"https://www.ltn.com.tw/","post_body":{"id":["124629834835104"],"ev":["Microdata"],"dl":["https://www.ltn.com.tw/"],"rl":[""],"if":["false"],"ts":["1563216130961"],"cd[DataLayer]":["[]"],"cd[Meta]":["{\"title\":\"�1B1�P1\",\"meta:description\":\"�1B1���Л́��^
oЛ,Kh0sB��|��1��˾����^�MB�
�B��^�^@
��'��\",\"meta:keywords\":\"�1B1, �1�P1, �1B1�P1, Liberty Times Net, LTN\"}"],"cd[OpenGraph]":["{\"og:title\":\"�1B1�P1\",\"og:type\":\"index\",\"og:url\":\"http://www.ltn.com.tw\",\"og:description\":\"�1B1���Л́��^
oЛ,Kh0sB��|��1��˾����^�MB�
�B��^�^@
��'��\",\"og:image\":\"assets/images/250_ltn.png\"}"],"cd[Schema.org]":["[]"],"cd[JSON-LD]":["[]"],"sw":["1366"],"sh":["768"],"v":["2.8.51"],"r":["stable"],"ec":["1"],"o":["30"],"fbp":["fb.2.1563216129718.524683915"],"it":["1563216128994"],"coo":["false"],"es":["automatic"],"rqm":["formPOST"]},"headers":"[[\"Host\",\"www.facebook.com\"],[\"User-Agent\",\"Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0\"],[\"Accept\",\"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\"],[\"Accept-Language\",\"en-US,en;q=0.5\"],[\"Accept-Encoding\",\"gzip, deflate, br\"],[\"Referer\",\"https://www.ltn.com.tw/\"],[\"Content-Type\",\"application/x-www-form-urlencoded\"],[\"Content-Length\",\"2236\"],[\"Connection\",\"keep-alive\"],[\"Cookie\",\"fr=0JxRdXKB00NxcvM39..BdLMkB...1.0.BdLMkB.\"],[\"Upgrade-Insecure-Requests\",\"1\"]]","is_XHR":0,"is_full_page":0,"is_frame_load":1,"triggering_origin":"https://www.ltn.com.tw","loading_origin":"https://www.ltn.com.tw","loading_href":"https://www.ltn.com.tw/","resource_type":"sub_frame","top_level_url":"https://www.ltn.com.tw/","parent_frame_id":0,"frame_ancestors":"[{\"frameId\":0,\"url\":\"https://www.ltn.com.tw/\"}]","visit_id":-1446733757290209}] 
 Traceback (most recent call last):
  File "/opt/OpenWPM/automation/SocketInterface.py", line 89, in _handle_conn
    msg = json.loads(msg.decode('utf-8'))
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 522: invalid continuation byte
Error de-serializing message: ["http_requests",{"incognito":0,"crawl_id":-1312820091,"extension_session_uuid":"c52d3df5-72ff-4792-b0c9-f8e60bb932d9","event_ordinal":320,"window_id":1,"tab_id":1,"frame_id":4294967305,"request_id":"88","url":"https://www.facebook.com/tr/","method":"POST","time_stamp":"2019-07-15T18:44:37.694Z","referrer":"https://www.gamepedia.com/","post_body":{"id":["1223056687740425"],"ev":["Microdata"],"dl":["https://www.gamepedia.com/"],"rl":[""],"if":["false"],"ts":["1563216277448"],"cd[DataLayer]":["[]"],"cd[Meta]":["{\"title\":\"Gamepedia \",\"meta:description\":\"Explore our wiki library, discover upcoming indie titles, and watch video tutorials that help you Know the Game.\"}"],"cd[OpenGraph]":["{\"og:description\":\"Explore our wiki library, discover upcoming indie titles, and watch video tutorials that help you Know the Game.\",\"og:image\":\"https://mercury-media.cursecdn.com/avatars/67/772/636923232549249224.jpeg\",\"og:locale\":\"en_US\",\"og:type\":\"website\",\"og:title\":\"Gamepedia\",\"og:url\":\"https://www.gamepedia.com/\",\"og:site_name\":\"Gamepedia\"}"],"cd[Schema.org]":["[{\"dimensions\":{\"h\":657,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/blogs/1969-all-about-auto-chess-a-starter-guide-to-gamings\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"If youve glanced at the most played games on Twitch in the past few months, youve likely noticed a few unfamiliar titles rising to the top of the ranks. Team Fight Tactics? Dota Underlords? What is all this stuff? In short: its auto chess, and its gamings newest genre.\\n\\n�\\n\\nIf you actually attempted to watch a stream of one of the above games being played, chances are you became even more confused. 
At any given moment theres a ton of action happening on screen at once, but no clear reason a\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"comradekoch\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":540,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1966-a-whos-who-of-twitch-streaming-celebrity-will\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"Next week, Riot sponsors their first ever tournament for their month-old�League of Legends game mode, Teamfight Tactics. The Teamfight Tactics Showdown will take place between July 17-18, and gather 64 popular streamers to duke it out for an opportunity at a piece of a $125,000 prize pool.\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"Jarrettjawn\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":479,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1967-experience-anarchy-in-streets-of-rogue\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"Streets of Rogue officially leaves early access today! Developed by Matt Dabrowski and published by tinyBuild, Streets of Rogue is a rogue-lite meets immersive sim where players will fight, sneak, and hack their way through randomly generated cities.�\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"OSWguild\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":531,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1965-conquer-ancient-italia-in-field-of-glory-empires\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"AGEOD teams up with another iconic name in strategy games, Slitherine, for Field of Glory: Empires (FOGE). 
This spin-off should not be confused as a sequel to the main Field of Glory series. Empires is its own more grand, high level take on battles in 300BC. As the Roman Republic, your goal is to conquer Italia through military might and cunning.\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"Jarrettjawn\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":500,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1964-prepare-to-ascend-to-godhood\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"Everyone has moments when they wish they were an omnipotent god capable of ruling the world as they see fit. The god game genre exists for those of us whose desire is a little more persistent, giving us an opportunity to stretch the divine muscle and lord over believers. The aptly named Godhood is heir to this long genre, stretching back a good three decades.\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"Tagaziel\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":603,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/blogs/1962-weekly-official-wiki-roundup-demonsarecrazy-war\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"What's up, faithful Gamepedia readers? How have you been doing? What games have you been playing recently? That's cool! Well, we hate to have to tear you away from your favorite titles this week, but we've got another installment of your absolute favorite weekly publication here at the old site: your Weekly Official Wiki Roundup! Whether you're into demonic games or flying all over the place with dragons, we've got something special for you today. 
As a wise man once said, stay awhile and listen.\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"MolotovCupcake\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":510,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1963-guardiancon-begins-today-has-raised-over-3-8m-for\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"GuardianCon begins today, but it has already had a huge impact on the gaming world by raising over $3.8 million for charity! GuardianCon was started by several large streamers with an original focus on Destiny, but has since grown to be a massive gathering of gamers!\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"OSWguild\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":571,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1961-master-voxelated-logistics-in-kubifaktorium\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"There is an interesting sub-genre arising out of the development sim space that puts automation front and center. The goal isn't just to harvest resources and expand, but to do so as efficiently as possible. At peak play, these sorts of games involved developing expansive supply chains that resemble Rube Goldberg machines on steroids. It would almost be relaxing, if it wasn't also so overwhelming. 
Kubifaktorium, the latest from developer Mirko Seithe, at least makes it look less threatening.\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"Jarrettjawn\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":510,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/blogs/1958-here-are-the-biggest-releases-this-summer\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"The June through September time-frame has notoriously been rough for video game releases. While this period of time has become less of a drought in recent years, there are still only a handful of big releases to look forward to this summer in the northern hemisphere. Here is what you can look forward to play!\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"OSWguild\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"},{\"dimensions\":{\"h\":510,\"w\":467},\"properties\":{\"url\":\"http://www.gamepedia.com/news/1960-games-done-quick-raises-over-3-million-with-speed\",\"datePublished\":\"\",\"interactionCount\":\"UserComments:0\",\"articleBody\":\"Summer Games Done Quick (SGDQ) has wrapped up after a week of around-the-clock streaming! 
There were 139 speed runs of games during the course of the week-long charity event that happens every year, and this year set a new record regarding the amount raised for charity!\"},\"subscopes\":[{\"dimensions\":{\"h\":0,\"w\":0},\"properties\":{\"name\":\"OSWguild\"},\"subscopes\":[],\"type\":\"http://schema.org/Person\"}],\"type\":\"http://schema.org/Article\"}]"],"cd[JSON-LD]":["[]"],"sw":["1366"],"sh":["768"],"v":["2.8.51"],"r":["stable"],"ec":["1"],"o":["30"],"fbp":["fb.1.1563216276905.1807542555"],"it":["1563216276410"],"coo":["false"],"es":["automatic"],"rqm":["formPOST"]},"headers":"[[\"Host\",\"www.facebook.com\"],[\"User-Agent\",\"Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0\"],[\"Accept\",\"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\"],[\"Accept-Language\",\"en-US,en;q=0.5\"],[\"Accept-Encoding\",\"gzip, deflate, br\"],[\"Referer\",\"https://www.gamepedia.com/\"],[\"Content-Type\",\"application/x-www-form-urlencoded\"],[\"Content-Length\",\"10622\"],[\"Connection\",\"keep-alive\"],[\"Cookie\",\"fr=0PmXj4nRb91Ekm0gt..BdLMmV...1.0.BdLMmV.\"],[\"Upgrade-Insecure-Requests\",\"1\"]]","is_XHR":0,"is_full_page":0,"is_frame_load":1,"triggering_origin":"https:/BrowserManager       - INFO     - BROWSER -1312820091: BrowserManager restart initiated. Clear profile? True
No handlers could be found for logger "pyvirtualdisplay.abstractdisplay"
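
For readers trying to reproduce the traceback above: a minimal sketch of the failure mode, written for Python 3 rather than the Python 2.7 in the log. The byte 0xea opens a three-byte UTF-8 sequence, so a strict decode raises as soon as the next byte is not a continuation byte. The helper `decode_message` and the sample payload are hypothetical, and the lenient fallback is only one possible workaround, not the fix referenced later in this thread.

```python
import json


def decode_message(raw: bytes):
    """Hypothetical helper: parse a UTF-8 JSON payload.

    Falls back to a lenient decode that swaps undecodable bytes for
    U+FFFD, so one bad byte does not discard the whole record.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# Made-up payload: 0xea is a lead byte, but the following '"' (0x22)
# is not a continuation byte, so a strict decode fails here.
bad = b'["http_requests", {"title": "\xea"}]'

try:
    json.loads(bad.decode("utf-8"))
except UnicodeDecodeError as err:
    print("strict decode failed:", err)

print(decode_message(bad))
```

The lenient path keeps the record at the cost of losing the original bytes, so it would only suit fields where an approximate string is acceptable.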
englehardt commented 5 years ago

Related to #255

englehardt commented 5 years ago

Confirmed fix in https://github.com/mozilla/OpenWPM/pull/442#issuecomment-518877860