pull in html data with extra java reveal?

omegahat / XML

The XML package for R


Hi Jo

I believe the data for all 530 polls is not present directly as markup in the HTML, so you won't find it that way. Instead, that content is constructed dynamically from data contained in a script node. The following is one specific way of getting at it, which could be generalized if necessary.

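library(XML)
library(RJSONIO)

url = "http://projects.fivethirtyeight.com/2016-election-forecast/national-polls/"
doc = htmlParse(url)

# Find the script node whose JavaScript mentions race.model.
sc = getNodeSet(doc, "//script[contains(., 'race.model')]")
# getNodeSet() returns a list of nodes; take the text of the first match.
js = xmlValue(sc[[1]])

# Keep only the JSON assigned to race.stateData, dropping the surrounding JavaScript.
jsobj = gsub(".*race.stateData = (.*);race.pathPrefix.*", "\\1", js)

data = fromJSON(jsobj)
names(data)
length(data$polls) # The 534 we want.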
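
If it helps, here is a rough sketch of how the steps above might be generalized. The helper name `extractBetween` and the inline HTML string are made up for illustration and are not part of the XML package or of the real page; the sketch assumes the JSON sits between two known literal markers in the script text. It uses fixed-string matching rather than a regular expression, so it picks the first occurrence of each marker, whereas the greedy `gsub()` pattern above picks the last.

```r
library(XML)
library(RJSONIO)

# Hypothetical helper: return the text between two literal markers.
# Uses the first occurrence of `start` and the first `end` after it.
extractBetween = function(txt, start, end) {
    i = regexpr(start, txt, fixed = TRUE)
    if (i < 0) stop("start marker not found")
    rest = substring(txt, i + nchar(start))
    j = regexpr(end, rest, fixed = TRUE)
    if (j < 0) stop("end marker not found")
    substring(rest, 1, j - 1)
}

# Made-up stand-in for the real page, so the example runs offline.
html = '<html><body><script>var race = {};race.stateData = {"polls": [{"id": 1}, {"id": 2}]};race.pathPrefix = "/x";</script></body></html>'

doc = htmlParse(html, asText = TRUE)
sc = getNodeSet(doc, "//script[contains(., 'race.stateData')]")
js = xmlValue(sc[[1]])

data = fromJSON(extractBetween(js, "race.stateData = ", ";race.pathPrefix"))
names(data)
length(data$polls)
```

For the real page you would pass the parsed document from `htmlParse(url)` instead of the inline string; the two marker strings are the only parts specific to this site.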

pull in html data with extra java reveal? #8