rOpenGov / RPublica

ProPublica API Client
http://ropengov.github.io/RPublica/

Switch to jsonlite for better output #1

Closed: jeroen closed this issue 8 years ago

jeroen commented 10 years ago

A journalism student was trying to grab some data from ProPublica using RPublica and came to me for help. It seems that the np_search function returns data as deeply nested lists, which are very difficult for most novice R users to handle. Some other packages on rOpenGov seem to suffer from similar problems.

The jsonlite package is designed specifically to solve this problem. For example, try:

library(jsonlite)
fromJSON("http://projects.propublica.org/forensics/geos.json")

This returns a data frame, from which the user can immediately proceed to modeling and visualization. No advanced data manipulation skills are required. Combining pages is also quite easy using jsonlite and plyr. An example:

#requires jsonlite >= 0.9.9
library(jsonlite)
stopifnot(packageVersion("jsonlite") >= "0.9.9")

filings <- list()
for(i in 0:10){
  mydata <- fromJSON(paste0("http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc&page=", i), flatten=TRUE)
  message("Retrieving page ", mydata$cur_page, " of ", mydata$num_pages)
  if(mydata$cur_page == mydata$num_pages)
    break
  filings[[i+1]] <- mydata$filings
}

#combine all into one 
library(plyr)
alldata <- rbind.fill(filings)

#check output
nrow(alldata)
colnames(alldata)

Again, this simply returns a data frame with filings, ready to go straight to the good stuff.
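To make the difference concrete without hitting the network, here is a minimal offline sketch. The JSON payload and its field names (name, revenue, address) are made up for illustration and are not taken from the ProPublica API; only fromJSON() and its simplifyVector and flatten arguments come from jsonlite.

```r
library(jsonlite)

# Hypothetical sample payload; the field names are invented for this
# illustration and do not reflect the real ProPublica response.
json <- '[
  {"name": "Org A", "revenue": 100, "address": {"city": "Austin", "state": "TX"}},
  {"name": "Org B", "revenue": 250, "address": {"city": "Boston", "state": "MA"}}
]'

# With simplification switched off, the result is a list of lists,
# which is the kind of structure novice users struggle with.
nested <- fromJSON(json, simplifyVector = FALSE)
str(nested, max.level = 1)

# The default simplification gives a data frame, but the "address"
# column is itself a nested data frame.
records <- fromJSON(json)
names(records)

# flatten = TRUE expands nested data frames into ordinary columns.
flat <- fromJSON(json, flatten = TRUE)
names(flat)
```

The flattened version is usually the easiest one to pass on to modeling or plotting functions, since every column is a plain vector.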

It would be quite easy to rewrite the RPublica package to use jsonlite and plyr so that functions return data frames rather than complex nested structures. I think this might make the package much more powerful and useful to a wider audience, such as my journalism student. Please let me know if you need any help with this; I'd be happy to assist.

leeper commented 10 years ago

Thanks for this. I've pushed an edit that switches to jsonlite. I like how that simplifies many of the response objects.

I don't think we need to add a plyr dependency since that change alone simplifies most of the response objects, but if you want to put together a pull request for a README example that includes some manipulation with plyr (like what you've shown here), I'd be happy to include it.
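For what it's worth, pages can probably be combined without plyr at all. The sketch below assumes a jsonlite release that exports rbind_pages() (I believe older releases exposed it as rbind.pages()); the two small data frames are hypothetical stand-ins for pages returned by the API.

```r
library(jsonlite)

# Two hypothetical pages of results with partly different columns.
page1 <- data.frame(name = c("Org A", "Org B"), revenue = c(100, 250),
                    stringsAsFactors = FALSE)
page2 <- data.frame(name = "Org C", city = "Chicago",
                    stringsAsFactors = FALSE)

# rbind_pages() fills columns that are missing from a page with NA,
# much like plyr::rbind.fill() does.
alldata <- rbind_pages(list(page1, page2))

# check output
nrow(alldata)
colnames(alldata)
```

This keeps the dependency list to jsonlite alone, at the cost of relying on a helper that may not be available in very old jsonlite versions.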

Thomas J. Leeper http://www.thomasleeper.com

In reply to Jeroen Ooms, Thu, Jul 24, 2014 at 7:25 AM.

antagomir commented 10 years ago

Excellent. When time allows, I also need to check whether we could make better use of this in other rOpenGov packages.

ouzor commented 10 years ago

Looks good indeed, thanks!

In reply to Leo Lahti, Thu, Jul 24, 2014 at 11:47 AM.