Closed jeroen closed 8 years ago
Thanks for this. I've pushed an edit that switches to jsonlite. I like how that simplifies many of the response objects.
I don't think we need to add a plyr dependency since that change alone simplifies most of the response objects, but if you want to put together a pull request for a README example that includes some manipulation with plyr (like what you've shown here), I'd be happy to include it.
Thomas J. Leeper http://www.thomasleeper.com
On Thu, Jul 24, 2014 at 7:25 AM, Jeroen Ooms notifications@github.com wrote:
A journalism student was trying to grab some data from ProPublica using RPublica and came to me for help. It seems that the np_search function returns data as deeply nested lists, which are very difficult for most novice R users to handle. Some other packages on rOpenGov seem to suffer from similar problems.
The jsonlite package is designed specifically to solve this problem. For example, try:
library(jsonlite)
fromJSON("http://projects.propublica.org/forensics/geos.json")
This returns a data frame, from which the user can immediately proceed to modeling and visualization. No advanced data manipulation skills are required. Combining pages is also quite easy using jsonlite and plyr. An example:
# requires jsonlite >= 0.9.9
library(jsonlite)

filings <- list()
for (i in 0:10) {
  mydata <- fromJSON(paste0("http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc&page=", i), flatten = TRUE)
  message("Retrieving page ", mydata$cur_page, " of ", mydata$num_pages)
  if (mydata$cur_page == mydata$num_pages) break
  filings[[i + 1]] <- mydata$filings
}

# combine all into one
library(plyr)
alldata <- rbind.fill(filings)

# check output
nrow(alldata)
colnames(alldata)
Again, this simply returns a data frame with filings, ready to go straight to the good stuff.
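To see what flatten = TRUE and rbind.fill each contribute without hitting the network, here is a minimal offline sketch. The two JSON strings are made-up stand-ins for API pages (the field names id, org, name, state and city are assumptions for illustration, not the real ProPublica schema).

```r
library(jsonlite)
library(plyr)

# Two hypothetical result pages; the second record carries an extra nested field.
page1 <- '{"filings": [{"id": 1, "org": {"name": "A", "state": "NY"}},
                       {"id": 2, "org": {"name": "B", "state": "CA"}}]}'
page2 <- '{"filings": [{"id": 3, "org": {"name": "C", "state": "TX", "city": "Austin"}}]}'

# flatten = TRUE turns the nested "org" object into plain columns (org.name, ...).
pages <- lapply(list(page1, page2), function(txt) fromJSON(txt, flatten = TRUE)$filings)

# rbind.fill stacks the pages and fills columns missing from a page with NA.
alldata <- rbind.fill(pages)
```

The pages do not need identical columns: rows from the first page get NA in org.city, which is why rbind.fill is used here rather than plain rbind.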
It would be quite easy to rewrite the RPublica package to use jsonlite and plyr so that functions return data frames, rather than complex nested structures. I think this might make the package much more powerful and useful to a wider audience, such as my journalism student. Please let me know if you need any help with this, I'd be happy to assist.
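As a rough sketch of what such a rewrite could look like, the helper below wraps the paging logic in one function that returns a data frame. The name search_filings and its arguments are hypothetical and not part of RPublica; the URL is the one from the example above. The fetch argument is an assumption added so the paging logic can be tried without network access, and the sketch assumes every page returns a non-empty filings data frame.

```r
library(jsonlite)
library(plyr)

# Hypothetical helper (not RPublica's API): fetch several result pages
# and return them combined as a single data frame.
search_filings <- function(pages = 0:2,
                           fetch = function(url) fromJSON(url, flatten = TRUE)) {
  base <- "http://projects.propublica.org/nonprofits/api/v1/search.json?order=revenue&sort_order=desc&page="
  # One data frame of filings per requested page.
  results <- lapply(pages, function(p) fetch(paste0(base, p))$filings)
  # Stack the pages, filling columns that are missing on some pages with NA.
  rbind.fill(results)
}
```

Calling search_filings(0:4) would then hand the user one data frame directly, instead of a nested list per page.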
— Reply to this email directly or view it on GitHub https://github.com/rOpenGov/RPublica/issues/1.
Excellent, when the time allows I also need to check if we could utilize this better in other rOpenGov packages.
Looks good indeed, thanks!
On Thu, Jul 24, 2014 at 11:47 AM, Leo Lahti notifications@github.com wrote:
Excellent, when the time allows I also need to check if we could utilize this better in other rOpenGov packages.