Create ScrapedPage object

propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

MIT License


A ScrapedPage object is what Scraper#scrape would yield, instead of separately yielding the HTML, the URL, the instance page's index, etc.

This ScrapedPage object, which might inherit from Nokogiri::HTML, would contain the raw HTML, the parsed HTML, the URL, the index page from which the instance page was linked (if present), a reference to that index page's ScrapedPage object, and the instance page's index (i.e. its ordinal position among the pages linked to from the index page).
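As a rough illustration of the shape described above, here is a minimal sketch. It is not Upton's actual API: the class layout, the keyword arguments, the `index_url` helper, and the injectable `parser` are all assumptions made for the example. It also wraps the parsed document rather than inheriting from a Nokogiri class, which is only one of the options the proposal leaves open.

```ruby
# Hypothetical sketch of the proposed ScrapedPage; names and signatures
# are illustrative assumptions, not part of Upton.
class ScrapedPage
  attr_reader :raw_html, :url, :index_page, :instance_index

  # Default parser: loads Nokogiri lazily, so the gem is only required
  # when #parsed_html is called without a custom parser.
  DEFAULT_PARSER = lambda do |html|
    require 'nokogiri'
    Nokogiri::HTML(html)
  end

  # raw_html:       the page body as fetched
  # url:            the URL the page was fetched from
  # index_page:     the ScrapedPage of the index page that linked here (nil if none)
  # instance_index: ordinal position among the pages linked from the index page
  # parser:         any callable that turns an HTML string into a parsed document
  def initialize(raw_html:, url:, index_page: nil, instance_index: nil,
                 parser: DEFAULT_PARSER)
    @raw_html = raw_html
    @url = url
    @index_page = index_page
    @instance_index = instance_index
    @parser = parser
  end

  # Parse on first access and cache the result.
  def parsed_html
    @parsed_html ||= @parser.call(@raw_html)
  end

  # URL of the index page, or nil when the page was not reached via an index.
  def index_url
    index_page&.url
  end
end

# Example: an index page and the first instance page linked from it.
index = ScrapedPage.new(raw_html: "<a href='/a'>A</a>", url: "http://example.com/")
page  = ScrapedPage.new(raw_html: "<h1>A</h1>", url: "http://example.com/a",
                        index_page: index, instance_index: 0)
puts page.index_url
```

With an object like this, a block passed to Scraper#scrape would take a single argument and read `page.url`, `page.parsed_html`, and so on from it, which is why the change would break existing blocks that expect several positional arguments.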

This would be a breaking change, so it is further from being implemented in stable Upton.


Create ScrapedPage object #32