vaskoz / dailycodingproblem-go

problems from dailycodingproblem.com solved in Go
MIT License
95 stars 31 forks source link

Day 432 #881

Open vaskoz opened 4 years ago

vaskoz commented 4 years ago

Good morning! Here's your coding interview problem for today.

This problem was asked by Google.

Design a system to crawl and copy all of Wikipedia using a distributed network of machines.

More specifically, suppose your server has access to a set of client machines. Your client machines can execute code you have written to access Wikipedia pages, download and parse their data, and write the results to a database.

Some questions you may want to consider as part of your solution are:

How will you reach as many pages as possible? How can you keep track of pages that have already been visited? How will you deal with your client machines being blacklisted? How can you update your database when Wikipedia pages are added or updated?

vaskoz commented 4 years ago

Identical to Day 354: https://github.com/vaskoz/dailycodingproblem-go/issues/703