postmodern / spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
MIT License

Crawling a specific page #46

Closed (justaj closed 8 years ago)

justaj commented 8 years ago

Hey! I was wondering if there is a way to return all links found on a specific page. So far spidr has been great for crawling a whole site, but for my testing I'd like to be able to focus on one page.

Thanks

postmodern commented 8 years ago

You could use every_page { |page| ... } and only handle the page where page.url.path == '...' matches the page you are looking for. Alternatively, you could use visit_urls_like so the spider only visits that specific page.
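
For anyone landing here later, here is a rough sketch of how those two suggestions might be combined. The target URL and the helper names (target_page?, links_on_page) are made up for illustration and are not part of spidr. It assumes that a visit_urls_like block receives each candidate URL and that page.urls returns the absolute URLs found on the page, which may differ between spidr versions.

```ruby
require 'uri'

# Hypothetical target; substitute the page you actually want to inspect.
TARGET_URL = 'http://example.com/docs/index.html'

# Illustrative helper (not part of spidr): true when +url+ has the target path.
def target_page?(url, target_path)
  URI(url.to_s).path == target_path
end

# Illustrative helper (not part of spidr): starts a crawl at +target_url+,
# visits only URLs with the same path, and returns the links found there.
def links_on_page(target_url)
  require 'spidr' # loaded lazily so target_page? works without the gem

  target_path = URI(target_url).path
  found = []

  Spidr.start_at(target_url) do |spider|
    # Assumption: the block is called with each candidate URL and must
    # return true for the URL to be visited.
    spider.visit_urls_like { |url| target_page?(url, target_path) }

    spider.every_page do |page|
      # Belt and braces: skip anything that is not the target page.
      next unless target_page?(page.url, target_path)

      found.concat(page.urls.map(&:to_s))
    end
  end

  found.uniq
end
```

Calling links_on_page(TARGET_URL) should then return the link URLs as strings. Note that the filter compares paths only, so the same path with a different query string would also be visited.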

postmodern commented 8 years ago

Oh, you could also request the page yourself via Net::HTTP, Mechanize, RestClient, etc. and manually create a Spidr::Page object from the response.
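
A minimal sketch of that manual approach, in case it helps. It assumes Spidr::Page.new accepts the page URL and a Net::HTTP response (worth checking against the spidr version you have installed). The helper names fetch_page and page_links are illustrative, not part of spidr.

```ruby
require 'net/http'
require 'uri'

# Illustrative helper (not part of spidr): fetches +url+ with Net::HTTP and
# wraps the response in a Spidr::Page. Redirects are not followed here.
def fetch_page(url)
  require 'spidr' # loaded lazily so page_links works without the gem

  uri = URI(url)
  response = Net::HTTP.get_response(uri)

  # Assumption: the constructor takes (url, response).
  Spidr::Page.new(uri, response)
end

# Illustrative helper: returns the unique link URLs of a page as strings.
# Works with any object that responds to #urls.
def page_links(page)
  page.urls.map(&:to_s).uniq
end
```

With these in place, page_links(fetch_page('http://example.com/')) would list the links on that single page without starting a crawl at all.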