privacy-tech-lab / privacy-pioneer-web-crawler

Web crawler for detecting websites' data collection and sharing practices at scale using Privacy Pioneer
https://privacytechlab.org/
MIT License

Find best amount of time to crawl per site #6

Closed danielgoldelman closed 9 months ago

danielgoldelman commented 11 months ago

We need to establish a cutoff time for how long we crawl each site. @jjeancharles and I will take the lead.

danielgoldelman commented 11 months ago

Reminder: we have 2 different timings we need to take into account.

  1. The amount of time the crawler should take per site
  2. The amount of time Privacy Pioneer (PP) takes to analyze the site data

danielgoldelman commented 9 months ago

We decided in a previous meeting to use a cutoff of 30 seconds per site. The crawler will wait 10 seconds beyond this cutoff (40 seconds per site in total) so that the data has enough time to transfer and any problems have time to surface before it moves on to the next URL.
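
For reference, here is a rough sketch of how the two timings could fit together. It is not the crawler's actual code: the names `ANALYSIS_CUTOFF_S`, `TRANSFER_BUFFER_S`, `dwell_time_s`, and `crawl` are hypothetical, the `visit` callback stands in for whatever loads a page, and waiting the full time even after a failed load is an assumption.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Values taken from the decision above; the names are hypothetical.
ANALYSIS_CUTOFF_S = 30  # cutoff agreed on in the meeting
TRANSFER_BUFFER_S = 10  # extra wait for data transfer and late problems


def dwell_time_s(cutoff_s: int = ANALYSIS_CUTOFF_S,
                 buffer_s: int = TRANSFER_BUFFER_S) -> int:
    """Total time the crawler stays on one site before moving on."""
    return cutoff_s + buffer_s


def crawl(urls: Iterable[str],
          visit: Callable[[str], None],
          sleep: Callable[[float], None] = time.sleep) -> List[Tuple[str, str]]:
    """Visit each URL, wait out the dwell time, then go to the next one.

    `visit` is a placeholder for the real page load (for example a browser
    driver call). `sleep` is injectable so the loop can be tested without
    actually waiting.
    """
    results = []
    for url in urls:
        try:
            visit(url)
            status = "ok"
        except Exception as exc:  # record the failure and keep crawling
            status = f"error: {exc}"
        # Assumption: wait the full dwell time even if the load failed,
        # so that late errors can still show up before the next URL.
        sleep(dwell_time_s())
        results.append((url, status))
    return results
```

Passing a fake `sleep` (such as a list's `append`) makes it easy to check that each site gets the full 40 seconds without running a real crawl.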