propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
MIT License

problem scraping index page (Scraping 0 instances) #36

Open okliv opened 10 years ago

okliv commented 10 years ago

Hi!

If I try to scrape this page, alma-ata.alm.slando.kz, for the links matching h3.large a.link so I can follow them next:

  scraper = Upton::Scraper.new('http://alma-ata.alm.slando.kz/','h3.large a.link')

All I get with scraper.verbose = true is:

Stashing disabled. Will download from the internet.
Downloading from http://alma-ata.alm.slando.kz/ 
Downloaded http://alma-ata.alm.slando.kz/
sleeping 30 secs
Scraping 0 instances

But from the JS console on this page I see this:

> $('h3.large a.link').size()
> 30

Looks like there is an error somewhere.

drwl commented 10 years ago

I'm using upton (0.3.3), and running the same scraper works for me:

Stashing disabled. Will download from the internet.
Downloading from http://alma-ata.alm.slando.kz/
Downloaded http://alma-ata.alm.slando.kz/
sleeping 1 secs
Scraping 30 instances
Stashing enabled. Will try reading http://alma-ata.alm.olx.kz/obyavlenie/ogromnyy-televizor-po-skromnoy-tsene-ID4fwdh.html#4c7502b736;promoted data from cache.
Cache of http://alma-ata.alm.olx.kz/obyavlenie/ogromnyy-televizor-po-skromnoy-tsene-ID4fwdh.html#4c7502b736;promoted unavailable. Will download from the internet...