Open jeremybmerrill opened 10 years ago
Implemented for a future release (1.0.0) in https://github.com/propublica/upton/commit/31cbf413583816c138f9228eed3688333096cd9b
Will be minimally breaking, since missing methods on Page are passed through to Nokogiri::HTML.
Maybe I should implement this even-less-breakingly in 0.4.0 by still passing the instance_index, instance_url, etc. attrs through to blk.call?
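For illustration, a rough sketch of how the pass-through and the less-breaking yield could fit together. This is not the code from the commit above: the `yield_page` helper, the constructor arguments, and the `url` reader are assumptions, and a plain String stands in for the parsed Nokogiri document so the sketch runs without the gem.

```ruby
# Hypothetical sketch, not Upton's actual implementation.
class Page
  attr_reader :url, :instance_index

  # `parsed` stands in for the parsed Nokogiri document (assumption).
  def initialize(parsed, url, instance_index)
    @parsed = parsed
    @url = url
    @instance_index = instance_index
  end

  # Forward any method Page does not define to the parsed document,
  # so existing blocks that treat the yielded value as parsed HTML keep working.
  def method_missing(name, *args, &blk)
    if @parsed.respond_to?(name)
      @parsed.public_send(name, *args, &blk)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @parsed.respond_to?(name, include_private) || super
  end
end

# Hypothetical helper: yield the page first, then the legacy attrs.
# Blocks that only take one parameter silently ignore the extras.
def yield_page(page, &blk)
  blk.call(page, page.url, page.instance_index)
end

page = Page.new("<p>hi</p>", "http://example.com/1", 0)
yield_page(page) { |pg, instance_url, instance_index| [pg.length, instance_url, instance_index] }
```

One caveat with this approach: a lambda passed as the block enforces its arity, so a one-argument lambda would raise an ArgumentError when handed the extra attrs, while an ordinary block would not.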
A ScrapedPage object is what would be yielded out of Scraper#scrape instead of the HTML, the URL, the instance page's index, etc. This ScrapedPage object -- which might inherit from Nokogiri::HTML -- would contain the raw HTML, the parsed HTML, the URL, the index page from which the instance page was linked (if present), a reference to the index page's ScrapedPage object, and the instance page's index (i.e. ordinal count) among the pages linked to from the index page.
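As a minimal sketch of the data such an object would carry, assuming plain attribute readers rather than inheritance from Nokogiri. The attribute names (`raw_html`, `parsed_html`, `index_url`, `index_page`, `instance_index`) and the `from_index?` helper are made up for illustration, and the parsed HTML is passed in by the caller so the example does not depend on the gem.

```ruby
# Hypothetical sketch of the proposed object; all names are assumptions.
class ScrapedPage
  attr_reader :raw_html, :parsed_html, :url,
              :index_url, :index_page, :instance_index

  def initialize(raw_html:, url:, parsed_html: nil,
                 index_url: nil, index_page: nil, instance_index: nil)
    @raw_html = raw_html              # the page body as fetched
    @parsed_html = parsed_html        # in Upton this would be the Nokogiri document
    @url = url                        # where the page was fetched from
    @index_url = index_url            # URL of the index page that linked here, if any
    @index_page = index_page          # the index page's own ScrapedPage, if any
    @instance_index = instance_index  # ordinal position among the index page's links
  end

  # True when this page was reached via an index page.
  def from_index?
    !@index_page.nil?
  end
end

index = ScrapedPage.new(raw_html: "<a href='/1'>one</a>",
                        url: "http://example.com/index")
instance = ScrapedPage.new(raw_html: "<p>one</p>",
                           url: "http://example.com/1",
                           index_url: index.url,
                           index_page: index,
                           instance_index: 0)
```

A block passed to Scraper#scrape could then read `instance.url` or `instance.instance_index` off the single yielded object instead of taking them as separate arguments.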
This would be a breaking change, so it is further away from being implemented in stable Upton.