asajeffrey closed this issue 5 years ago
I think it would be better to write an automated script to fetch archives for the Alexa top 500.
The problem is that there's a bit of hand-curating involved: I check the archives by eye before committing them. At some point we might want to scale up to full automation, but I'm not sure we're quite there yet.
Moz's top 500 would also be a good choice.
Alexa Internet publishes a list of the top 1,000,000 sites on the web, updated daily: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
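For anyone who wants to work out which sites still need archives, here is a rough sketch of reading that list. It assumes the zip contains a single member named `top-1m.csv` with one `rank,domain` pair per line and no header row; I haven't verified that against the current file. The names `top_sites` and `missing_sites` are made up for illustration, and downloading the zip is left out.

```python
import csv
import io
import zipfile
from itertools import islice


def top_sites(zip_bytes, n, member="top-1m.csv"):
    """Return the first n domains from a zipped CSV of 'rank,domain' rows.

    The member name and the two-column layout are assumptions.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = csv.reader(text)
            # Rows are assumed to be ordered by rank already.
            return [row[1] for row in islice(rows, n)]


def missing_sites(top, tracked):
    """Return the domains in `top` that are not yet tracked, keeping rank order."""
    tracked = set(tracked)
    return [domain for domain in top if domain not in tracked]
```

Calling `missing_sites(top_sites(data, 25), tracked)` with the downloaded bytes in `data` and the already-archived domains in `tracked` would give the candidates to record. The archives would still need checking by eye before being committed.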
Hello! I can work on some of these. I can start with #40 and #42.
FWIW, PRs have been submitted for all the domains listed here.
Yes, I built up a bit of a backlog there, but this is now done. Thanks everyone!
The Alexa top web sites are at https://www.alexa.com/topsites. Once #8 lands, we'll be tracking the top 10; it would be nice to have the top 25. The missing ones are:
There are instructions for playing and recording web archives for Servo at https://github.com/servo/servo-warc-tests/blob/master/README.md.
The list of archives used for Servo performance testing is at https://github.com/servo/servo-warc-tests/blob/master/ARCHIVES.
Please help out by recording web archives for us!
You can do that by picking one of the issues and assigning it to yourself.