sathish316 / scrapify

ScrApify is a library for building APIs by scraping static sites and using the data as models or JSON APIs. It powers APIfy, which creates JSON APIs from any HTML or Wikipedia page.
http://apify.heroku.com/resources

find by id should crawl detailed content #21

Closed sathish316 closed 11 years ago

sathish316 commented 12 years ago

find by id currently returns the same content as find all.

find by id should optionally crawl the detail page of each record and return the additional content found there.

Example:

class IMDB
  include Scrapify::Base
  html "http://imdb.com/top250"
  attribute :rank, css: ".rank"
  attribute :title, css: ".title"
  attribute :url, xpath: "//a[@class='movie']/@href"

  crawl using: :url do |page|
    page.attribute :release_date, css: ".release_date"
    page.attribute :storyline, css: ".storyline"
    page.attribute :full_cast, css: ".full_cast" do |e|
      e.children.css('.name').map(&:text)
    end
  end
end
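
To make the requested behaviour concrete, here is a minimal sketch of how find by id could differ from find all. It is not Scrapify's implementation: `DetailCrawlSketch`, the `crawl:` keyword, and the `FETCHER` lambda are hypothetical names, and in-memory hashes stand in for the scraped index and detail pages so that the sketch runs without network access.

```ruby
# Hypothetical sketch: none of these names come from Scrapify itself.
class DetailCrawlSketch
  # list_rows:    hashes scraped from the index page, each with :id and :url
  # fetch_detail: callable that takes a URL and returns a hash of detail attributes
  def initialize(list_rows, fetch_detail)
    @list_rows = list_rows
    @fetch_detail = fetch_detail
  end

  # Listing stays cheap: index-page attributes only, no extra requests.
  def all
    @list_rows
  end

  # Looks up one record; follows its :url only when crawl: true,
  # then merges the detail attributes into the index-page attributes.
  def find(id, crawl: false)
    row = @list_rows.find { |r| r[:id] == id }
    return nil if row.nil?
    return row unless crawl

    row.merge(@fetch_detail.call(row[:url]))
  end
end

# Stand-in data instead of real HTTP requests (assumed shapes).
ROWS = [
  { id: 1, title: "Movie A", url: "/movies/a" },
  { id: 2, title: "Movie B", url: "/movies/b" }
].freeze

DETAILS = {
  "/movies/a" => { storyline: "Plot of A" },
  "/movies/b" => { storyline: "Plot of B" }
}.freeze

FETCHED = [] # records which detail pages were requested

FETCHER = lambda do |url|
  FETCHED << url
  DETAILS.fetch(url)
end

MOVIES = DetailCrawlSketch.new(ROWS, FETCHER)
```

With this shape, `MOVIES.find(1)` returns only the index-page attributes, while `MOVIES.find(1, crawl: true)` requests one detail page and adds `:storyline`. Keeping the crawl optional means listing all records never triggers one request per record.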