Open SebastianZimmeck opened 1 month ago
Once the testing is done (#9), we create a release (#23), and start the crawl per this issue.
1. Countries
The countries to crawl are:
2. Sites
The top 525 sites for each country are listed in this repo.
In addition to each country's top 525 sites, we also crawl the United States top 525 list for each country as a general list.
This will lead to 19*525 = 9,975 crawled sites in total.
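As a quick sanity check on the total above, here is a minimal Python sketch of the arithmetic. The function name `total_crawled_sites` and the reading of 19 as the number of list crawls are assumptions for illustration; only the figures 19 and 525 come from this issue.

```python
def total_crawled_sites(num_list_crawls: int, sites_per_list: int) -> int:
    """Return the total number of sites visited across all list crawls."""
    return num_list_crawls * sites_per_list


# Figures taken from this issue. Treating 19 as the number of
# list crawls is an assumption, not something the issue spells out.
NUM_LIST_CRAWLS = 19
SITES_PER_LIST = 525

total = total_crawled_sites(NUM_LIST_CRAWLS, SITES_PER_LIST)
print(f"{total:,}")  # prints "9,975"
```

If the number of lists or the list length changes, only the two constants need updating.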
3. Google Cloud VM
@atlasharry identified the Google Cloud VMs for each country:
@atlasharry will take the lead on the crawl and organize how others can help (however it makes sense).