pdxrlang / aggregate

aggregate - a pdxrlang subgroup meetup
10 stars 2 forks source link

Solution: Converting data frame to JSON #9

Open wjones127 opened 8 years ago

wjones127 commented 8 years ago

I found a working solution to Wes's problem of converting a dataframe into the JSON format he wanted.

The approach differs from Wes's mainly in that I just do the nesting of subquestions for every question, and later prune out the subquestions that have NA as their id. His approach was pretty close, but there is a bit of a frustrating limitation to the nest() function: it can't be applied twice in a row to the same data frame. You can see in the nest_subquestions function that I had to create separate data frames with the nested subquestions and options, and then merge them together; this couldn't be done in the same data frame without getting an error.

library(dplyr)
library(tidyr)
library(jsonlite)
library(purrr)

dummyDF <- data_frame(sectionId = c(rep(1,9),rep(2,3)),
                                            questionId = c(rep(1,3),rep(2,6),rep(3,3)),
                                            subquestionId = c(rep(NA,3),rep("2a",3),rep("2b",3),rep(NA,3)),
                                            deptManagerQId = c(rep("m1",3),rep("m2",3),rep("m3",3),rep("m4",3)),
                                            deptEmployeeQId = c(rep("e1",3),rep("e3",3),rep("e4",3),rep("e7",3)),
                                            optionId = rep(c(1,2,3),4),
                                            text = rep(c("yes","neutral","no"),4))

# Define nesting functions
# i.e. for each set of questions in a section...
nest_subquestions <- function(questions)  {
    with_subquestions <- questions %>% 
        group_by(questionId) %>%
        nest(subquestionId, deptManagerQId, deptEmployeeQId, .key=subquestions)  %>%
        mutate(subquestions = subquestions)

    with_options <- questions %>%
        group_by(questionId) %>%
        nest(optionId, text, .key=options)

    left_join(with_subquestions, with_options, by="questionId")
}

# Define pruning function
remove_null <- function(subquestions) {
    print(is.na(subquestions$subquestionId[1]))
    if (is.na(subquestions$subquestionId[1])) {
        subquestions <- NA
    }
}
# For each set of questions...
prune_subquestions <- function(questions) {
    questions %>%
        mutate(subquestions = ifelse(sapply(subquestions, has_null_id),
                                                                 NA,
                                                                 subquestions))
}
# For each question...
has_null_id <- function(subquestions) {
    is.na(subquestions$subquestionId[1])
}

list2 <- dummyDF %>% nest(-sectionId, .key=questions) %>% 
    mutate(questions = map(questions, nest_subquestions),
                 questions = map(questions, prune_subquestions)) %>%
    list(sections = .)

list2 %>% toJSON(pretty = T, na = c("null")) 
ismayc commented 8 years ago

Beautiful! Well done, @wjones127.

smithjd commented 8 years ago

I'm looking at your code @wjones127 , and wondering whether you found that using the map function from purr was essential to your solution or whether it was more of an experiment? Wes was using the more traditional seq_along...

wjones127 commented 8 years ago

Yeah I had purrr in there because I though I might need some of it's functionality, but I think in the final code all the map calls could be replaced with lapply.