Closed waynerobinson closed 6 years ago
It only starts at the top (initial_queue) when the queue itself is empty.
https://github.com/sergiotapia/magnetissimo/blob/master/lib/crawler/thepiratebay.ex#L29
If it has items in the queue, it processes those items first. And that 5 seconds comment is outdated. Shame! 🔔 I need to update it to reflect the real time between processing.
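To make the queue behaviour concrete, here is a rough Python sketch of what I mean (the real crawler is Elixir, so this is not the project's code). The names `next_item`, `page-1`, `torrent-a` and so on are made up for illustration, and it assumes `initial_queue` is never empty.

```python
from collections import deque


def next_item(queue, initial_queue):
    """Pop the next item to process.

    The queue is refilled from initial_queue only when it is empty,
    so pending items are always handled before the crawl restarts.
    Returns (item, restarted).
    """
    restarted = False
    if not queue:
        # Nothing left to process: start again from the top.
        queue.extend(initial_queue)
        restarted = True
    return queue.popleft(), restarted


# Hypothetical data: one pending item, two seed pages.
initial_queue = ["page-1", "page-2"]
queue = deque(["torrent-a"])

order = [next_item(queue, initial_queue) for _ in range(4)]
```

The pending item is drained first, and the seed pages only come back once the queue has run dry.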
I meant, does it just loop round and round the pages without a larger break in between? The queue contains a list of pages to scrape for torrents, correct?
If this server were to just run in the background, wouldn't it be attempting to download a page every 100ms for each of the sites the entire time it's operating?
Hey @waynerobinson circling back to this ticket, I changed the way we're scraping time-wise, and it's much more site-friendly. #72 should land soon.
Yup, we're flooding the websites much less now :)
Just curious, does this just continuously scrape all the sites pausing 100ms between each page, repeating again at the top?
I know the comment on https://github.com/sergiotapia/magnetissimo/blob/master/lib/crawler/thepiratebay.ex#L17 says 5 seconds, but it seems to be 1 * 1 * 100 == 100ms. Seems excessive to do this over and over without a longer break between complete crawls.
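For a sense of scale, here is a quick back-of-the-envelope calculation in Python. The helper `requests_per_hour` is just something I made up for this comment, and it ignores the time each request itself takes, so treat the result as an upper bound per site.

```python
def requests_per_hour(delay_ms):
    """Upper bound on requests per hour with a fixed pause between pages."""
    ms_per_hour = 60 * 60 * 1000
    return ms_per_hour // delay_ms


actual = requests_per_hour(1 * 1 * 100)   # the 100ms the code seems to use
documented = requests_per_hour(5 * 1000)  # the 5 seconds the comment claims
```

That works out to as many as 36,000 requests per hour at 100ms, versus 720 at 5 seconds.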