interview-preparation / what-we-do


[System Design and Scalability] interview questions #3 #122

Closed by rygh4775 5 years ago

rygh4775 commented 5 years ago

Web Crawler: If you were designing a web crawler, how would you avoid getting into infinite loops?

rygh4775 commented 5 years ago

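How an infinite loop might occur

A crawler treats the web as a graph: pages are nodes and links are edges. Links can form cycles (page A links to page B, and B links back to A), so a crawler that follows every link without remembering what it has already crawled can revisit the same pages forever.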
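
As a minimal illustration, assume a tiny hypothetical link graph (`LINKS`) in which pages `a` and `b` link to each other. The sketch contrasts a crawler that follows every link blindly with one that keeps a set of visited pages. The `max_steps` cap is only there so the naive version returns at all.

```python
from collections import deque

# Hypothetical link graph: "a" and "b" link to each other, forming a cycle.
LINKS = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": [],
}


def naive_crawl(start, max_steps):
    """Follow every link with no memory of visited pages.

    max_steps is a safety cap for this demo only; without it the
    cycle a -> b -> a keeps the queue non-empty forever.
    Returns the number of pages fetched.
    """
    queue = deque([start])
    steps = 0
    while queue and steps < max_steps:
        page = queue.popleft()
        steps += 1
        queue.extend(LINKS[page])
    return steps


def crawl_with_visited(start):
    """Same breadth-first traversal, but skip any page already seen."""
    visited = set()
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        queue.extend(LINKS[page])
    return order


print(naive_crawl("a", max_steps=1000))  # 1000: the cap was hit, not the end of the graph
print(crawl_with_visited("a"))           # ['a', 'b', 'c']
```

A visited set keyed on the exact URL only catches exact repeats. The same content reachable under many different URLs still slips through, which is one reason to compare pages by a signature instead.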

Definitions

Solution

  1. Open the page and create a signature for it.
  2. Query the database to see whether anything with this signature has been crawled recently.
  3. If something with this signature has been crawled recently, insert this page back into the database at low priority.
  4. If not, crawl the page and insert its links into the database.
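
The four steps above can be sketched as follows. This is a minimal, single-process sketch under assumptions not in the original: the "database" is an in-memory priority queue (`heapq`) plus a dict mapping each signature to the tick when it was last crawled; the signature is a SHA-256 hash of the page content only; "recently" means within `RECENT_WINDOW` ticks; and `FAKE_WEB`, where `/` and `/?ref=1` serve identical content, is hypothetical.

```python
import hashlib
import heapq

# Hypothetical fake web: URL -> (content, outgoing links).
# "/" and "/?ref=1" are different URLs serving identical content.
FAKE_WEB = {
    "/": ("home", ["/a"]),
    "/a": ("page a", ["/?ref=1", "/"]),
    "/?ref=1": ("home", ["/a"]),
}

RECENT_WINDOW = 10  # assumption: "recently" means within the last 10 ticks
LOW_PRIORITY = 100  # assumption: added to priority; larger = later (min-heap)


def signature(content):
    """Assumed signature: a SHA-256 hash of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def crawl(seed, max_ticks):
    frontier = [(0, 0, seed)]  # (priority, insertion counter, url)
    counter = 0                # tie-breaker so equal priorities stay FIFO
    last_crawled = {}          # signature -> tick it was last crawled
    crawled = []
    for tick in range(max_ticks):
        if not frontier:
            break
        priority, _, url = heapq.heappop(frontier)
        content, links = FAKE_WEB[url]       # step 1: open the page...
        sig = signature(content)             # ...and create a signature
        seen_at = last_crawled.get(sig)      # step 2: query the "database"
        if seen_at is not None and tick - seen_at < RECENT_WINDOW:
            counter += 1                     # step 3: push back at low priority
            heapq.heappush(frontier, (priority + LOW_PRIORITY, counter, url))
            continue
        last_crawled[sig] = tick             # step 4: crawl, enqueue its links
        crawled.append(url)
        for link in links:
            counter += 1
            heapq.heappush(frontier, (priority + 1, counter, link))
    return crawled


print(crawl("/", max_ticks=10))  # ['/', '/a']
```

In the first ten ticks only `/` and `/a` are crawled: the URLs leading back to the home-page content (`/` again and `/?ref=1`) are opened, recognized by their signature, and pushed further back instead of being crawled. Because step 3 defers duplicates rather than dropping them, the frontier never empties on its own on this cyclic fake web, so `max_ticks` bounds the demo. Once the window expires, the duplicate is crawled again, which also acts as a simple recrawl policy.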