holmes-app / holmes-api

API for holmes website validation.
MIT License

Fix next job to be smarter on who gets what job #84

Closed heynemann closed 10 years ago

heynemann commented 10 years ago

A couple things to keep in mind for this:

a) Stop using randomization.

b) Restrict the number of workers on a given domain to that domain's maximum number of connections divided by a chosen factor, rounded up. For example, if http://mysite.com has a limit of 2 connections and we assume each page has 10 links on average, then 2 / 10 = 0.2, which rounds up to 1. That means only one worker should be working on http://mysite.com pages at a time.

c) Retrieve the next X pages in need of review, where X is the number of workers.

d) Use a lock in Redis to make sure each worker gets an appropriate job. If the lock can't be acquired, try the next page in need of review.
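To make the proposal concrete, here is a minimal sketch of how these points might fit together. It is not the holmes-api implementation: the names `max_workers_for_domain`, `get_next_job`, `InMemoryLockStore` and the `lock:page:` / `lock:domain:` key layout are all hypothetical. The in-memory store only imitates the "set if not exists" behaviour of Redis `SET ... NX` so the example runs without a server; a real deployment would presumably pass a Redis client instead and rely on the expiry to free locks held by dead workers.

```python
import math
from urllib.parse import urlparse


def max_workers_for_domain(connection_limit, avg_links_per_page=10):
    """Point (b): connection limit divided by average links, rounded up."""
    return max(1, math.ceil(connection_limit / avg_links_per_page))


class InMemoryLockStore:
    """Hypothetical stand-in for a Redis client (SET with NX only)."""

    def __init__(self):
        self._keys = {}

    def set(self, name, value, nx=False, ex=None):
        # `ex` (expiry in seconds) is accepted but ignored in this stand-in.
        if nx and name in self._keys:
            return None  # key already held: lock not acquired
        self._keys[name] = value
        return True

    def delete(self, name):
        self._keys.pop(name, None)


def get_next_job(pages, store, worker_id, domain_limits, lock_ttl=60):
    """Points (a), (c), (d): walk the ordered candidates, no randomization.

    `pages` is the ordered list of the next X URLs in need of review.
    Returns the first URL this worker manages to lock, or None.
    """
    for url in pages:
        page_key = "lock:page:%s" % url
        if not store.set(page_key, worker_id, nx=True, ex=lock_ttl):
            continue  # another worker has this page; try the next one

        # The page is ours; now claim one of the domain's worker slots.
        domain = urlparse(url).netloc
        slots = max_workers_for_domain(domain_limits.get(domain, 1))
        for slot in range(slots):
            slot_key = "lock:domain:%s:%d" % (domain, slot)
            if store.set(slot_key, worker_id, nx=True, ex=lock_ttl):
                return url

        # Domain already has its maximum number of workers: release the page.
        store.delete(page_key)
    return None


store = InMemoryLockStore()
limits = {"mysite.com": 2, "other.com": 2}
pages = ["http://mysite.com/a", "http://mysite.com/b", "http://other.com/x"]

first = get_next_job(pages, store, "worker-1", limits)
second = get_next_job(pages, store, "worker-2", limits)
third = get_next_job(pages, store, "worker-3", limits)
```

With a limit of 2 connections per domain, each domain gets a single worker slot, so the second worker skips both `mysite.com` pages and falls through to `other.com`, and the third worker finds nothing available. A worker would also need to release its page and domain locks when it finishes, which this sketch leaves out.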