openwpm / OpenWPM

A web privacy measurement framework
https://openwpm.readthedocs.io

Failure to save data for some websites #889

Closed by vringar 3 years ago

vringar commented 3 years ago

taehyung222 said in Matrix:

Hi, I observed that among the sites (from a list) that do not produce any output in OpenWPM, some do produce output if I crawl just that one website on its own. Examples: "http://zhanqi.tv", "http://rednet.cn", "http://shutterstock.com", "http://cnnic.cn", "http://skype.com", "https://typeform.com", "http://cdc.gov".

I would be grateful to know what could be done to get output from a greater proportion of the sites in a given list.

However, when running just these sites in isolation, I seemed to get structured data for them. This needs further investigation.

This we need to know about:

shivani-1521 commented 3 years ago

OpenWPM version: v0.13.0, on Ubuntu 20.04.1 LTS with Python 3.8.5 and Conda 4.9.2.

However, it appears this could be a false alarm, as there were timeout errors, which could be due to my network.

vringar commented 3 years ago

Okay, v0.13.0 didn't have any changes to data storage. Timeouts are unfortunately very common when running crawls. Consider having a look at openwpm-utils for the analysis tools we use when analyzing our crawls.
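To check which sites in a list hit timeouts or other failures, one option is to query the crawl's SQLite output directly. The following is only a rough sketch, not part of openwpm-utils: it assumes the database has a `site_visits` table with `visit_id` and `site_url` columns and a `crawl_history` table with `visit_id` and `command_status` columns, where `command_status` is `'ok'` on success. The helper name `failed_sites` is made up for illustration, and the schema should be verified against the OpenWPM version in use.

```python
import sqlite3
from collections import defaultdict


def failed_sites(conn):
    """Return {site_url: [non-ok statuses]} for visits with failed commands.

    Assumes the tables and columns named in the query exist
    (site_visits, crawl_history); check your own crawl database first.
    """
    query = """
        SELECT sv.site_url, ch.command_status
        FROM site_visits AS sv
        JOIN crawl_history AS ch ON ch.visit_id = sv.visit_id
        WHERE ch.command_status != 'ok'
    """
    failures = defaultdict(set)
    for site_url, status in conn.execute(query):
        failures[site_url].add(status)
    # Sort the statuses so the result is stable across runs.
    return {site: sorted(statuses) for site, statuses in failures.items()}
```

It could be called as `failed_sites(sqlite3.connect("crawl-data.sqlite"))`, where the file name is a placeholder for wherever the crawl wrote its database. Sites that show up with a timeout status are candidates for re-crawling in a separate run.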