propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
MIT License

pagination doesn't respect sleep time #28

Closed: jeremybmerrill closed this issue 10 years ago

jeremybmerrill commented 10 years ago

for some reason.

It should.

esagara commented 10 years ago

Ah so I didn't realize I was posting in the wrong area. We are coming across this issue as well. From what I have been reading in the code, it looks like the sleep time will only work if stashing is disabled. That is the only time I can find resp_and_cache[:from_resource] being set to true as required here. At least that is my take from what I am reading in the get method. Would it make sense to write the sleep into the download_from_resource! method?
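To make the suggestion above concrete, here is a minimal sketch of putting the sleep next to the network request, so a stashed (cached) page never triggers a pause and a fresh download always does. This is not Upton's actual code: `SketchDownloader`, its `fetcher` and `sleeper` arguments, and the in-memory `@stash` hash are all hypothetical stand-ins. Only the method name `download_from_resource!` is borrowed from the discussion.

```ruby
# Hypothetical stand-in for a caching downloader; NOT Upton's real class.
class SketchDownloader
  attr_reader :naps

  def initialize(sleep_time_in_secs:, fetcher:, sleeper: ->(secs) { sleep(secs) })
    @sleep_time_in_secs = sleep_time_in_secs
    @fetcher = fetcher # callable that performs the real request (assumed)
    @sleeper = sleeper # injectable so the pause can be observed in tests
    @stash = {}        # in-memory stand-in for an on-disk stash
    @naps = 0          # how many times we paused
  end

  # Returns the body for url, preferring the stash.
  def get(url)
    return @stash[url] if @stash.key?(url) # cache hit: no request, no sleep

    @stash[url] = download_from_resource!(url)
  end

  private

  # The pause lives beside the request, so every network hit is throttled
  # regardless of which flags the caller does or does not set.
  def download_from_resource!(url)
    body = @fetcher.call(url)
    @sleeper.call(@sleep_time_in_secs)
    @naps += 1
    body
  end
end

# Usage with a fake fetcher and a zero-second pause.
downloader = SketchDownloader.new(
  sleep_time_in_secs: 0,
  fetcher: ->(url) { "<html>#{url}</html>" }
)
downloader.get("http://example.com/a")
downloader.get("http://example.com/a") # served from the stash
downloader.naps # => 1
```

The design point is that the caller no longer has to signal whether the response came from the resource; the only code path that touches the network is also the only one that sleeps.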

jeremybmerrill commented 10 years ago

Lol no worries, it's whatever.

jeremybmerrill commented 10 years ago

Ah figured it out

fix tk shortly

jeremybmerrill commented 10 years ago

https://github.com/propublica/upton/commit/bccd0192fa51e6272255dc342b2d13bdba10683c#diff-9347901060f5e801766116fbb544a20dL80

The boolean here was flipped; literally telling it not to sleep when it should have told it to sleep. Silly Jeremy.

esagara commented 10 years ago

Hey, I am running an array of urls through the scraper with no pagination. Sleep time is being ignored. I am working on subclassing some stuff to work around this, but I thought you should know.
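One way to check whether the pause is actually happening is to record a timestamp each time a page is handled and look at the gaps. The sketch below is only an illustration: `GapRecorder` is a hypothetical helper, not part of Upton, and the loop at the bottom is a stand-in for a real scraper run. It assumes the scraper hands pages to your code one at a time, so that `tick` can be called once per page.

```ruby
# Hypothetical helper: records when each page is handled and reports the gaps.
class GapRecorder
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock # injectable so the check can be tested without waiting
    @stamps = []
  end

  # Call once per scraped page.
  def tick
    @stamps << @clock.call
  end

  # Seconds elapsed between consecutive pages.
  def gaps
    @stamps.each_cons(2).map { |earlier, later| later - earlier }
  end

  # True when every gap is at least min_gap_secs.
  # Note: with fewer than two ticks there are no gaps, so this is trivially true.
  def throttled?(min_gap_secs)
    gaps.all? { |gap| gap >= min_gap_secs }
  end
end

# Stand-in loop; in a real check, tick would be called from the per-page code.
recorder = GapRecorder.new
%w[http://example.com/1 http://example.com/2 http://example.com/3].each do |_url|
  recorder.tick
  sleep(0.01) # stand-in for the scraper's own pause
end
puts recorder.gaps.inspect
```

If the gaps come out close to zero when the sleep time is set to several seconds, and the pages were not already stashed, the pause is being skipped.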

jeremybmerrill commented 10 years ago

Hmm. Can you make a gist or something so I can test it?

Also, can you make sure you're pulling the latest master from github? I didn't push an updated gem yet.

esagara commented 10 years ago

I pulled the master after you updated. Will send you a gist shortly.

On Thu, Dec 19, 2013 at 11:38 AM, Jeremy B. Merrill <notifications@github.com> wrote:

> Hmm. Can you make a gist or something so I can test it?
>
> Also, can you make sure you're pulling the latest master from github? I didn't push an updated gem yet.
