mattroumaya / surveymonkey

Access your SurveyMonkey data directly from R!
https://mattroumaya.github.io/surveymonkey/

Extracting Page ID #105

Closed wnfaulkner closed 2 years ago

wnfaulkner commented 2 years ago

Hi @mattroumaya ,

Back at it again! Thank you for your help a few months ago. We were able to get data parsed, organized, and sent off to some interested parties who ended up using it to put pressure on some local courts in Louisiana to get their act together with regard to providing interpreters.

New issue:

Notes:

  1. I have spent a couple hours on this, but wouldn't say I'm at the end of the road yet, so unless you have a good idea of how to address this efficiently on your end, feel free to ignore.
  2. I have no experience dealing with JSON in [R] and the initial posts from Googling are not too helpful.

Goal: My client's surveys contain data from observing court proceedings. Observers record data at the court session level (e.g. judge who presided, date of session, etc.) and in many cases data at the individual level as well (e.g. for each defendant - were they represented by an attorney, how much bail/bond did the judge assign, etc.). The individual-level data does not come with an id for each individual, so I need to create one. The data for each individual, however, is recorded on a single survey 'page,' so my immediate goal is to be able to extract some sort of page.id when parsing responses.

Question: From digging into your functions, I can tell there is a lot of nesting going on and at some level I can print page ids:

for (i in 1:survey.json.i$page_count) {
  print(survey.json.i$pages[[i]]$id)
}

Where in the dependencies for parse_survey would you add some code to extract the page.id alongside responses?

Again, feel free to ignore this if you don't have a good sense of the answer already - I can and will keep digging!

Thanks!

mattroumaya commented 2 years ago

Great to hear some background on your projects, @wnfaulkner!

I don't think I totally understand the end goal here, and apologies if I'm missing something obvious.

It seems like this is the API endpoint you're looking to grab: https://developer.surveymonkey.com/api/v3/#api-endpoints-get-surveys-id-pages

Is there any way you could mock up a table to show what the data.frame would ideally look like? I think this should be a fairly straightforward addition, but just want to make sure I understand exactly what you're looking for.

wnfaulkner commented 2 years ago

Hi @mattroumaya,

OK, going to try to clarify below, but again, I need to do a little more learning about how to manipulate & parse json objects inside of [R], so if things aren't clear we can just leave it for now.

FYI I prefer data in long format so I'm using parse_respondent_list(), as opposed to the wide format output by parse_survey().

Here's example data without page.id:

responses.example.noid.df <- data.frame(
  survey.id = rep("a",10),
  respondent.id = c(1,1,1,1,1,2,2,2,2,2),
  variable = c("judge.name","defendant.name","defendant.bail.amount","defendant.name","defendant.bail.amount",
               "judge.name","defendant.name","defendant.bail.amount","defendant.name","defendant.bail.amount"),
  value = c("shigart","george",1000,NA,2000,"williams","calvin",NA,NA,2000)
)

The goal is to extract the $pages[[i]]$id value from the json object created by fetch_survey_obj() so that my output will have a page.id variable (from which I can then form a defendant.id variable):

responses.example.withid.df <- data.frame(
  survey.id = rep("a",10),
  respondent.id = c(1,1,1,1,1,2,2,2,2,2),
  variable = c("judge.name","defendant.name","defendant.bail.amount","defendant.name","defendant.bail.amount",
               "judge.name","defendant.name","defendant.bail.amount","defendant.name","defendant.bail.amount"),
  value = c("shigart","george",1000,NA,2000,"williams","calvin",NA,NA,2000),
  page.id = rep(c(1,2,2,3,3),2)
)

Here the page.id variable indicates that page 1 asks the judge name for the entire session observed, and then there are two pages to record key info on each defendant (there are obviously many more of these 'by-defendant' pages in the actual survey). I've purposefully made the data messy as it often comes to us.

I could use the repeating patterns of variable names to split things into pages because I know which variables go with which pages, but this seems like a clumsy and brittle approach given that the variable names are different and occur in different orders across survey versions.

Also, because the actual data is pretty big, my cleaning code runs better if I remove all blank responses, which makes certain patterns of blank responses even harder to interpret:

responses.example.noid.df[!is.na(responses.example.noid.df$value),]

Here you can see that without the page.id, it seems impossible to tell that the final bail amount ($2000) was for an unnamed 2nd defendant, not the first defendant 'calvin'. Once again, thanks for your time and consideration!!
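To make the ambiguity concrete, here is a minimal base-R sketch that rebuilds the example data with the desired page.id column and then drops the blank responses. The data frame name `responses` and the subset `kept` are only illustrative; the values are the ones from the example above.

```r
# Example data from above, including the desired page.id column
responses <- data.frame(
  survey.id = rep("a", 10),
  respondent.id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  variable = rep(c("judge.name", "defendant.name", "defendant.bail.amount",
                   "defendant.name", "defendant.bail.amount"), 2),
  value = c("shigart", "george", 1000, NA, 2000,
            "williams", "calvin", NA, NA, 2000),
  page.id = rep(c(1, 2, 2, 3, 3), 2)
)

# Drop blank responses, as in the cleaning step
kept <- responses[!is.na(responses$value), ]

# For respondent 2, page.id still separates 'calvin' from the later bail amount
print(kept[kept$respondent.id == 2, c("variable", "value", "page.id")])
```

After filtering, respondent 2 has only three rows left, and the page.id column is the only thing showing that the bail amount belongs to a different page than 'calvin'.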

mattroumaya commented 2 years ago

Thanks for the additional detail, @wnfaulkner!! It sounds like you're doing very meaningful work so I'm really happy to help with this (if I can! 😃)

Below is a function that I hope is close. It'll give you a bunch of extra columns, but you can just select the ones you need. Let me know if this helps or if it's off the mark.

You would run this by doing survey <- sm_with_id(123456789)


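library(magrittr)  # provides the %>% pipe used below

sm_with_id <- function(id) {

  # Long-format responses for the survey
  responses <- surveymonkey:::get_responses(id) %>%
    surveymonkey:::parse_respondent_list()

  # Fetch the survey object once; it is needed for both the questions and the pages
  survey <- surveymonkey::fetch_survey_obj(id)

  questions <- survey %>%
    surveymonkey:::parse_all_questions()

  # Build one row per page: the page position followed by its question ids
  purrr::map(survey$pages, function(x) {
    position <- x[['position']]
    ids <- purrr::map(survey$pages[[position]][['questions']], ~.x[['id']])
    dplyr::bind_cols(position, ids)
  }) %>%
    dplyr::bind_rows() %>%
    dplyr::rename(page = 1) %>%
    # Reshape to one row per page/question pair, then attach responses and question text
    tidyr::pivot_longer(-page) %>%
    dplyr::select(page, question_id = value) %>%
    na.omit() %>%
    dplyr::left_join(responses) %>%
    dplyr::left_join(questions)
}

mattroumaya commented 2 years ago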

Any luck with the function above @wnfaulkner?

wnfaulkner commented 2 years ago

Hi @mattroumaya ,

I'm so sorry for the delayed reply. Since our last communication, I've switched computers, jobs, and been sick. I'm still working with my client but it may take me a little while to be able to test fully. When I gave it a quick try last week, things were looking good, but let me put some time in this week and I will give you a more thorough reply. Once again, thanks for the help and the patience!

mattroumaya commented 2 years ago

@wnfaulkner no need to apologize! Hope you are feeling better, and best of luck with your new job! I'll keep this open until you have a chance to review, definitely no rush at all.

wnfaulkner commented 2 years ago

Hi @mattroumaya,

Quick update: the function works. With a little tweaking, I've been able to use it to get a table of question.ids and page.ids that I can join with my responses table to create a unique defendant.id for each defendant.
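A rough sketch of that join in base R, for anyone following along. The tables `page.lookup` and `responses`, their column names, and the rule that page 1 is the session-level page are all assumptions made up for illustration; they are not the actual output of sm_with_id().

```r
# Hypothetical lookup table: one row per question, with the page it sits on
page.lookup <- data.frame(
  question.id = c("q1", "q2", "q3", "q4", "q5"),
  page.id     = c(1, 2, 2, 3, 3)
)

# Hypothetical long-format responses with blank answers already removed
responses <- data.frame(
  respondent.id = c(1, 1, 1, 1, 2, 2, 2),
  question.id   = c("q1", "q2", "q3", "q5", "q1", "q2", "q5"),
  value         = c("shigart", "george", "1000", "2000",
                    "williams", "calvin", "2000")
)

# Left join: keep every response and attach the page of its question
joined <- merge(responses, page.lookup, by = "question.id", all.x = TRUE)
joined <- joined[order(joined$respondent.id, joined$page.id), ]

# Assumption: page 1 is session-level, every other page is one defendant,
# so respondent + page identifies a defendant
joined$defendant.id <- ifelse(
  joined$page.id == 1,
  NA,
  paste(joined$respondent.id, joined$page.id, sep = "-")
)

print(joined)
```

Because the key is built from respondent.id and page.id, it stays stable even when some answers on a page are blank and have been dropped.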

I'm working on expanding now to be able to apply/map the function across multiple surveys. The function does take some time to execute, but I technically only need to run it once for each survey and then can store the result in a config/aux table in a google sheet.
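The run-once-and-store idea could look roughly like the sketch below. Everything here is a stand-in: `get_page_table()` is a hypothetical placeholder for the slow per-survey call (it returns a fixed toy table so the example runs without API access), the survey ids are invented, and a local CSV file takes the place of the Google Sheet.

```r
# Hypothetical stand-in for the slow per-survey call; returns a toy table
get_page_table <- function(survey.id) {
  data.frame(
    survey.id   = survey.id,
    question.id = paste0(survey.id, "_q", 1:3),
    page.id     = c(1, 2, 2),
    stringsAsFactors = FALSE
  )
}

# Read the stored table if it exists; otherwise build it once and save it
cached_page_table <- function(survey.id, cache.dir) {
  cache.file <- file.path(cache.dir, paste0("pages_", survey.id, ".csv"))
  if (file.exists(cache.file)) {
    return(read.csv(cache.file, stringsAsFactors = FALSE))
  }
  page.table <- get_page_table(survey.id)
  write.csv(page.table, cache.file, row.names = FALSE)
  page.table
}

survey.ids <- c("survey_a", "survey_b")  # hypothetical survey ids
cache.dir <- tempdir()

# Apply across surveys and stack the results into one lookup table
all.pages <- do.call(
  rbind,
  lapply(survey.ids, cached_page_table, cache.dir = cache.dir)
)

print(all.pages)
```

On a second run the tables come straight from the stored files, so the slow call happens only once per survey.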

I may have more questions as I go, but I think you can close this issue for now! I'd be happy to send you a copy of the report (on New Orleans courts) once it's finished if you'd like? Just let me know where to send it!

And of course, HUGE thanks once again for the help!!

Best, @wnfaulkner

mattroumaya commented 2 years ago

Excellent, @wnfaulkner! Thanks for the update!

I'd definitely love to see what you're working on - feel free to send a copy of the report to me at matthewroumaya@gmail.com.

Closing this for now but as always, please feel free to open an issue if anything comes up.