Closed by digitarald 2 years ago
@zalun can you take a look as well?
httparchive sample queries: https://www.igvita.com/2013/06/20/http-archive-bigquery-web-performance-answers/
httparchive
… The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized.
By design, it contains only request/response details and WebPagetest results.
Looks like neither is a good fit; they wouldn't even give us an edge in priming the database.
There's also http://webdevdata.org/, but it's just HTTP requests and HTML responses for home pages.
On top of that, webdevdata.org has only about 8000 entries.
That's ~80,000. The law of diminishing returns kicks in pretty quickly after that. Admittedly, our (webdevdata's) dataset is grossly out of date now, but a new run could be performed. I previously produced the following report on iOS "PWAs" from that old dataset.
Skimmed over both to assess what kind of data they could provide:
https://commoncrawl.org/
http://httparchive.org/
We need to do some more research, but httparchive might be a good alternative to having Prowac crawl sites itself.