opinionated / scheduler

throttle requests #2

Open ConnorFoody opened 9 years ago

ConnorFoody commented 9 years ago

Issue also mentioned in scraper here.

We should make request timing look more human and make sure we don't ping a site too often. We can do both by having the scheduler adjust the schedules, treating the provided times as suggested priorities rather than exact times. A potential example of a schedule change:

{0, 5, 10, 15, 20} --> {0, 3, 10, 11, 20}

This should be kept separate from the actual scheduler.
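To make the idea concrete, here is a rough sketch of what the schedule change could look like as a standalone function that knows nothing about the scheduler itself. The name `jitter_schedule` and the `max_shift` and `seed` parameters are made up for illustration; nothing like this exists in the repo yet.

```python
import random


def jitter_schedule(times, max_shift, seed=None):
    """Treat the given times as suggestions and nudge each one randomly.

    times: suggested request times (assumed non-negative numbers).
    max_shift: largest amount a single time may move (hypothetical knob).
    seed: optional seed so a run can be reproduced.
    """
    rng = random.Random(seed)
    jittered = []
    for t in times:
        shifted = t + rng.uniform(-max_shift, max_shift)
        # Never schedule before time zero.
        jittered.append(max(0.0, shifted))
    # Return the times in order; shifts may swap close neighbours.
    return sorted(jittered)


if __name__ == "__main__":
    print(jitter_schedule([0, 5, 10, 15, 20], max_shift=2, seed=42))
```

Keeping this as a pure function on a list of times means the scheduler only has to call it once before it queues anything.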

ConnorFoody commented 8 years ago

Sam did the client-side throttle. Someone still needs to do the server-side throttle (this may be more difficult).

ConnorFoody commented 8 years ago

Someone will need to add a "get target" to the schedulable.
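A minimal sketch of why the target matters, assuming the schedulable exposes something like a hypothetical `get_target()` that returns the site it will hit. The `TargetThrottle` class and its `min_interval` parameter are invented for illustration, and the sketch assumes requests are reserved in time order.

```python
class TargetThrottle:
    """Keeps requests to the same target at least min_interval apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_granted = {}  # target -> time of the last granted request

    def reserve(self, target, requested_time):
        """Return the earliest allowed time at or after requested_time.

        target would come from the hypothetical get_target() on the
        schedulable; here it is just a string key.
        """
        last = self._last_granted.get(target)
        if last is None:
            granted = requested_time
        else:
            granted = max(requested_time, last + self.min_interval)
        self._last_granted[target] = granted
        return granted


if __name__ == "__main__":
    throttle = TargetThrottle(min_interval=5)
    for target, when in [("site-a", 0), ("site-a", 2), ("site-b", 2)]:
        print(target, when, "->", throttle.reserve(target, when))
```

Requests to different targets do not delay each other, which is the part that cannot be done without knowing the target.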

ConnorFoody commented 8 years ago

I think I wrote down the name of the algo that does this, but I don't remember the name. This seems somewhat related.

I don't think we will need to come up with our own algo (or at least not the bulk of the thinking), but we will probably need to write our own code.

More relevant Wikipedia page here

ConnorFoody commented 8 years ago

Could we do a multi-pass greedy algo?

First pass: work out the needed density, i.e. the approximate number of articles from RSS per unit time, using the number of articles and the time range of the articles.

Second pass: space everything evenly, e.g. {1, 2, 3, 4} with a time frame of 10 --> {1, 4, 7, 10}.

Third pass: clump and randomize according to some "humanness" stat, e.g. {1, 4, 7, 10} --> {1, 3, 4, 10}.

I'm not sure how good a result we could get here, or how we would "fix" a bad stat.
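Here is a rough sketch of the three passes, just to see whether the idea hangs together. Everything in it is an assumption: the function names, using the variation of the gaps between requests as the "humanness" stat, and "fixing" a bad stat by simply redrawing a bounded number of times and keeping the best attempt.

```python
import random
import statistics


def needed_density(num_articles, time_range):
    """First pass: articles per unit time."""
    return num_articles / time_range


def spread_evenly(count, start, end):
    """Second pass: place count requests evenly between start and end."""
    if count == 1:
        return [start]
    step = (end - start) / (count - 1)
    return [start + i * step for i in range(count)]


def gap_variation(times):
    """Stand-in "humanness" stat: std dev of the gaps divided by their mean.

    Evenly spaced times score 0; clumpier schedules score higher.
    """
    if len(times) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean if mean else 0.0


def clump(times, strength, rng):
    """Third pass: move each interior time toward one of its neighbours.

    strength is in [0, 1]; the endpoints stay where they are.
    """
    if len(times) < 3:
        return list(times)
    moved = []
    for prev, cur, nxt in zip(times, times[1:], times[2:]):
        shift = rng.uniform(-strength, strength)
        gap = (cur - prev) if shift < 0 else (nxt - cur)
        moved.append(cur + shift * gap)
    return [times[0]] + sorted(moved) + [times[-1]]


def humanize(count, start, end, strength=0.8, min_variation=0.2,
             attempts=20, seed=None):
    """Run passes two and three, redrawing while the stat is too low."""
    rng = random.Random(seed)
    even = spread_evenly(count, start, end)
    best = even
    for _ in range(attempts):
        candidate = clump(even, strength, rng)
        if gap_variation(candidate) > gap_variation(best):
            best = candidate
        if gap_variation(best) >= min_variation:
            break
    return best


if __name__ == "__main__":
    print(needed_density(num_articles=4, time_range=10))
    print(spread_evenly(4, 1, 10))
    print(humanize(4, 1, 10, seed=3))
```

The redraw loop is the crudest possible "fix" for a bad stat. It keeps the number of requests and the endpoints fixed, so the density from the first pass is not changed by the randomizing.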