Maybe you need html_nodes? html_node only returns a single node.
On Sunday, October 12, 2014, notesofdabbler notifications@github.com wrote:
I am not sure what I am doing wrong but when I scrape a page from realtor.com, I am not getting the full list with rvest. I have listed my R code below.
#
# Scrape data on house listings from realtor.com
#
# set working directory
setwd("~/notesofdabbler/Rspace/dayoh_housing/")
# load libraries
library(rvest)
library(XML)
#
# Search URL with following filter applied
# 3+ bedrooms, 2+ baths, 1800+ sqft, 0-20 years old
# using XML library
housedoc = htmlTreeParse(srchurl, useInternalNodes = TRUE)
ns_id = getNodeSet(housedoc, "//ul[@class='listing-summary']//li[@class='listing-location']//a[@href]")
id = sapply(ns_id, function(x) xmlAttrs(x)["href"])
id
# using rvest library
housedoc = html(srchurl)
houselist = housedoc %>% html_node(".listing-summary")
id = houselist %>% html_node(".listing-location a") %>% html_attr("href")
id
The actual run version of the code with output is here http://notesofdabbler.github.io/other/scrape_housingdata.html.
— Reply to this email directly or view it on GitHub https://github.com/hadley/rvest/issues/21.
Thanks very much. Changing to html_nodes fixed it. Just for reference, I have put the corrected version of the code here.
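For anyone who lands here with the same symptom, here is a minimal sketch of the difference between html_node and html_nodes. It is not the corrected script mentioned above. The inline HTML is a made-up stand-in for the search results page (only the class names come from the question), and it uses read_html(), which I am assuming is available in your rvest version as the replacement for html().

```r
library(rvest)

# Hypothetical stand-in for the search results page: the class names mirror
# the selectors in the question, but the listings themselves are invented.
page <- read_html('
<ul class="listing-summary">
  <li class="listing-location"><a href="/house/1">House 1</a></li>
</ul>
<ul class="listing-summary">
  <li class="listing-location"><a href="/house/2">House 2</a></li>
</ul>
<ul class="listing-summary">
  <li class="listing-location"><a href="/house/3">House 3</a></li>
</ul>')

# html_node() keeps only the first match at each step,
# so a single href comes back.
first_only <- page %>%
  html_node(".listing-summary") %>%
  html_node(".listing-location a") %>%
  html_attr("href")

# html_nodes() keeps every match at each step,
# so one href per listing comes back.
all_links <- page %>%
  html_nodes(".listing-summary") %>%
  html_nodes(".listing-location a") %>%
  html_attr("href")

print(first_only)
print(all_links)
```

With this toy page, first_only holds one link and all_links holds three, which matches the "not getting the full list" behaviour described in the question.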