Open algodave opened 9 years ago
Wow, I have to confess I hadn't foreseen so many issues when doing this with Capybara. Darn global variables...
Well, we seem to have 2 options:
Mapper::Redfin.map(
  SherlockHomes::Scraper::Redfin.find(property_url)
)
and then instead of step 1 being scrape everything, and step 2 being to map everything, it'd be separate steps for each provider.
{
  search_redfin: { success: :search_trulia },
  search_trulia: { success: :search_zillow }
}
I think option 2 would mean less code, because you don't have to extract the data into another structure, but I'm ok with either approach
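A minimal sketch of how a per-provider step table like the one above could be walked. Everything here is an assumption for illustration: the `STEPS` and `HANDLERS` constants, the `run_pipeline` helper, the terminal `search_zillow` entry and the placeholder URL are hypothetical, and the handlers are stand-ins for "scrape one provider, then map it straight away".

```ruby
# Hypothetical step table: each step names the step to run after it succeeds.
STEPS = {
  search_redfin: { success: :search_trulia },
  search_trulia: { success: :search_zillow },
  search_zillow: {} # assumed terminal step, nothing follows it
}.freeze

# Stand-in handlers. In the real pipeline each one would scrape a single
# provider and map the result immediately.
HANDLERS = {
  search_redfin: ->(url) { { provider: :redfin, url: url } },
  search_trulia: ->(url) { { provider: :trulia, url: url } },
  search_zillow: ->(url) { { provider: :zillow, url: url } }
}.freeze

# Walk the table from first_step, collecting each handler's result.
# A handler returning nil counts as a failure and stops the walk.
def run_pipeline(first_step, url)
  results = []
  step = first_step
  while step
    result = HANDLERS.fetch(step).call(url)
    break if result.nil?

    results << result
    step = STEPS.fetch(step)[:success]
  end
  results
end

PIPELINE_RESULTS = run_pipeline(:search_redfin, "https://example.com/property/1")
```

Because each provider is scraped and mapped within its own step, no mapper ever runs after a later provider has taken over the browser session.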
@jeffdeville option 2 was pretty straightforward, I just shared it
@jeffdeville @safeforge Let's discuss the following.
SitePrism doesn't fetch an element's value until the corresponding method is invoked (it is lazy). This prevents us from using our scrapers as they're currently defined, that is, with the SitePrism DSL only. In the Pipeline, the Redfin scraper is invoked first and the Trulia scraper second, so by the time the Redfin mapper tries to read an element's text (e.g. `basic_info.floors.text`) it finds the Trulia page in the Capybara session.

What I suggest as a solution is making our scrapers stateful: let's extract the `text`s we need from the Capybara elements right after the page is loaded.

- For each `element`, we should have a `String` instance variable holding its `text` value
- For each `section`, we should have a `Hash` instance variable holding its `element`s' values
- `attr_reader`s to make them available to callers

Looking forward to your feedback!
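To make the proposal concrete, here is a rough sketch in plain Ruby, with no Capybara or SitePrism involved. `FakeSession`, `LazyScraper` and `StatefulScraper` are hypothetical stand-ins, and the page hashes are made-up data. The session holds one shared "current page", the lazy scraper reads from it on every call, and the stateful scraper copies what it needs when it is built.

```ruby
# Stand-in for the shared Capybara session: a single global "current page".
class FakeSession
  attr_accessor :current_page

  def visit(page)
    self.current_page = page
  end
end

# Lazy scraper: resolves the value against the session each time it is asked,
# mimicking an element that is only looked up on access.
class LazyScraper
  def initialize(session)
    @session = session
  end

  def floors
    @session.current_page.fetch(:floors)
  end
end

# Stateful scraper: snapshots the values right after the page is loaded.
class StatefulScraper
  attr_reader :floors, :basic_info

  def initialize(session)
    page = session.current_page
    # String snapshot of a single element
    @floors = page.fetch(:floors).dup
    # Hash snapshot of a whole section
    @basic_info = { floors: @floors, beds: page.fetch(:beds).dup }.freeze
  end
end

# Made-up page contents, for illustration only.
REDFIN_PAGE = { floors: "2", beds: "3" }.freeze
TRULIA_PAGE = { floors: "1", beds: "4" }.freeze

session = FakeSession.new
session.visit(REDFIN_PAGE)
lazy = LazyScraper.new(session)
stateful = StatefulScraper.new(session)

session.visit(TRULIA_PAGE) # the next scraper reuses the same session

LAZY_FLOORS = lazy.floors         # read after the session moved on
STATEFUL_FLOORS = stateful.floors # snapshot taken before it moved on
STATEFUL_INFO = stateful.basic_info
```

In this sketch the lazy reader returns the Trulia value once the session has moved on, while the stateful one still holds the Redfin value captured at load time.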