mattroumaya / surveymonkey

Access your SurveyMonkey data directly from R!
https://mattroumaya.github.io/surveymonkey/
Other
42 stars 10 forks source link

Add support for star ranking question type #57

Closed sfirke closed 2 years ago

sfirke commented 4 years ago

How does that one currently work?

paulditterline commented 2 years ago

When I pull in a survey that has a star rating question type, the response is coded as a factor with one level, which is "" - or blank.

In other words, star questions are presented as completely blank factor columns.

mattroumaya commented 2 years ago

Hi @paulditterline, thanks for the info! Are you needing support for star rating questions? I can take a look at this sometime this week or weekend.

paulditterline commented 2 years ago

I'd love that. I'd be happy to try and help as well if you can point me towards the right function.

On Wed, Dec 1, 2021 at 11:03 AM Matt Roumaya @.***> wrote:

Hi @paulditterline https://github.com/paulditterline, thanks for the info! Are you needing support for star rating questions? I can take a look at this sometime this week or weekend.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tntp/surveymonkey/issues/57#issuecomment-983787011, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACMR6UYDTCWH4BZHSVBDZFDUOZBLRANCNFSM4LOR6INQ .

mattroumaya commented 2 years ago

@paulditterline I just merged a PR that will hopefully support star rating questions. The issue was in the assemble.r function (which is used for parse_survey). My assumption is that your rating question did not use row labels, but I'm hoping you are able to test it out and confirm if this is now supported or if we need to dig into it more.

paulditterline commented 2 years ago

Hi @mattroumaya - thanks for the quick update, very much appreciate your help and time.

I re-installed the package using devtools::install_github("tntp/surveymonkey"), then loaded the package and called in my survey with fetch_survey_object() and then parse_survey(). The two star questions are still showing up as all blank in the data frame, while all other data is present.

mattroumaya commented 2 years ago

@paulditterline ah! sorry it's not working!

Is there any way you could provide some setup info about how your rating question is programmed in SurveyMonkey? I would follow your directions and create a similar item in my account, create some fake responses, and then pull the results which would help to replicate and identify the issue.

paulditterline commented 2 years ago

Hi @mattroumaya - the question type is called "star rating", as seen in this screenshot:

image

Not sure if it matters, but we have it set on the "star" icon, as see in this screenshot:

image

Anything else I can do to help?

mattroumaya commented 2 years ago

I have a test question set up as below, and am able to pull the results with:

123456789 %>% 
  fetch_survey_obj() %>% 
  parse_survey()

In the case of the screenshot below, the column is a factor with 5 levels: 1,2,3,4, and 5. Are there any other options you've changed for the rating type, or is it identical to the screenshot below?

Sorry for all of the questions! It's challenging to troubleshoot when we're working within the SurveyMonkey UI and not able to easily share exact specifications.

image

paulditterline commented 2 years ago

I don't see any differences in mine:

image

Mine doesn't show the "scale" options - but I think that's because I already have responses so I can't edit that field.

I double checked that I was using the right survey ID number as well.

The fields show up blank, as you can see in my viewer screenshot:

image

while other fields come through as normal:

image

And the responses to these questions do show up in survey monkey:

image

And when I manually download responses as a .csv or excel, the responses are there as well.

Reinstalling the package should have given me the updated function, right? Not sure what else could be going on 🤔

mattroumaya commented 2 years ago

Reinstalling definitely should have given you the updated function. Here are two suggestions that will hopefully get you up and running:

  1. Try to reinstall again:
devtools::install_github('tntp/surveymonkey')
library(surveymonkey)
  1. Replace parse_survey() with a function that you run and store in your Global Environment called parse_survey2() (below). This is just making the updated parse_survey() function available for you to use, so you can ensure that you're running the updated function. In this case, you could test:
123456789 %>% 
  fetch_survey_obj() %>% 
  parse_survey()

Copy and paste everything below into your console or a separate R script to run

parse_survey2 <- function(surv_obj, oauth_token = getOption('sm_oauth_token'), ...){
  if(surv_obj$response_count == 0){
    warning("No responses were returned for this survey.  Has anyone responded yet?")
    return(data.frame(survey_id = as.numeric(surv_obj$id)))
  }

  respondents <- get_responses(surv_obj$id, oauth_token = oauth_token, ...)

  # Save response status to join later
  vals <- c("id", "response_status")
  response_status_list <- lapply(respondents, "[", vals)
  status <- do.call(rbind.data.frame, response_status_list)

  responses <- respondents %>%
    surveymonkey:::parse_respondent_list()

  question_combos <- surveymonkey:::parse_all_questions(surv_obj)

  # this join order matters - putting q_combos on left yields the right ordering of columns in final result
  # the joining variables vary depending on question types present, so can't hard-code. Thus squash message
  x <- suppressMessages(dplyr::full_join(question_combos, responses))

  # There should not be duplicate rows here, but putting this here in case of oddities like #27
  assertthat::assert_that(sum(duplicated(dplyr::select_if(x, is.atomic))) == 0,
                          msg = paste0("There are duplicated rows in the responses, maybe a situation like #27 - ", file_bug_report_msg()))

  # questions with only simple answer types might not have some referenced columns, #46
  add_if_not_present <- c(choice_id = NA_character_, choice_position = NA_integer_)
  x <- x %>%
    tibble::add_column(!!!add_if_not_present[!names(add_if_not_present) %in% names(.)])

  # 'type' and 'required' are created when question_type == 'demographic'
  # Drop them because it causes issues with duplicated rows per respondent_id
  # Reference Issue #27, Issue #62
  x$type <- NULL
  x$required <- NULL

  # Issue #73 - API added choice_metadata for certain question types.
  # Need to investigate further, but as of 11/2021, the addition is preventing parse_survey() from working.
  x$choice_metadata <- NULL

  # Issue #57 - Star Ranking
  # - If rating labels are not used, choice_text appears blank.
  # - Need to recode so that choice_text is choice_position
  x <- x %>%
    mutate(choice_text = case_when(choice_text == "" & question_type == "matrix" & question_subtype == "rating" ~ as.character(choice_position),
                                   TRUE ~ choice_text))

  #If question type = Multiple Choice, include choice text + ID in the combined new columns

  x$q_unique_id <- apply(
    x %>%
      dplyr::select(question_id, row_id, col_id, other_id),
    1,
    function(x) paste(stats::na.omit(x), collapse="_")
  )
  x$q_unique_id[x$question_type == "multiple_choice" | x$question_subtype == "multi" & is.na(x$other_id)] <- paste(
    x$q_unique_id[x$question_type == "multiple_choice" | x$question_subtype == "multi" & is.na(x$other_id)],
    x$choice_id[x$question_type == "multiple_choice" | x$question_subtype == "multi" & is.na(x$other_id)],
    sep = "_")

  x$combined_q_heading <- apply(
    x %>%
      dplyr::select(heading, row_text, col_text, other_text),
    1,
    function(x) paste(stats::na.omit(x), collapse= " - ")
  )

  x <- x %>%
    dplyr::mutate(combined_q_heading = dplyr::case_when(question_type == "multiple_choice" & is.na(other_text) ~ paste(combined_q_heading, choice_text, sep = " - "),
                                                        question_type != "open_ended" & question_subtype == "multi" & is.na(other_text) ~ paste(combined_q_heading, choice_text, sep = " - "),
                                                        TRUE ~ combined_q_heading))

  # combine open-response text and choice text into a single field to populate the eventual table
  x$answer <- dplyr::coalesce(x$response_text, x$choice_text)
  assertthat::assert_that(sum(!is.na(x$answer)) == (sum(!is.na(x$response_text)) + sum(!is.na(x$choice_text))),
                          msg = paste0("Uh oh, we failed to account for a combination of open-response text - ", file_bug_report_msg()))
  static_vars <- setdiff(names(x), c("heading", "question_id", "question_type", "question_subtype",
                                     "choice_position", "choice_text", "quiz_options", "choice_id",
                                     "other_id", "other_text", "row_text", "row_id", "description",
                                     "col_text", "response_text", "col_id", "q_unique_id",
                                     "combined_q_heading", "answer"))

  final_x <- x %>%
    dplyr::select(tidyselect::all_of(static_vars), combined_q_heading, answer, q_unique_id)

  qid_text_crosswalk <- final_x %>%
    dplyr::distinct(q_unique_id, combined_q_heading) %>%
    dplyr::mutate(unique_text = de_duplicate_names(combined_q_heading))

  # did a full_join above to make sure that all questions [q_unique_ids] are present in result even if no one answered them
  # but that means the spread will fail b/c there's more than one response per q_unique_id for response_id == NA
  # Adjust for that to spread, then filter that out after spread
  final_x_real <- final_x %>%
    dplyr::filter(!is.na(response_id))

  final_x_dummy <- final_x %>%
    dplyr::filter(is.na(response_id)) %>%
    dplyr::distinct(q_unique_id)

  final_x <- dplyr::bind_rows(final_x_real, final_x_dummy)

  # spread wide
  # get column order to reset to after spread makes alphabetical
  col_names <- c(names(final_x)[!(names(final_x) %in% c("combined_q_heading","answer", "q_unique_id"))], qid_text_crosswalk$unique_text)

  out <- final_x %>%
    dplyr::select(-combined_q_heading) %>%
    dplyr::mutate(q_unique_id = factor(q_unique_id, levels = qid_text_crosswalk$q_unique_id)) %>% # to spread unrepresented levels
    tidyr::pivot_wider(names_from = q_unique_id, values_from = answer) %>%
    dplyr::filter(!is.na(response_id))

  # Takes spread-out results data.frame and turns multiple choice cols into factors.  GH issue #12
  # Doing this within the main function so it can see crosswalk

  master_qs <- x %>%
    dplyr::distinct(q_unique_id, choice_id, question_id, choice_position, choice_text)

  # set a vector as a factor, if it has answer choices associated with its question id
  set_factor_levels <- function(vec, q_id){

    # fetch possible answer choices given a question's text
    get_factor_levels <- function(q_id){
      master_qs %>%
        dplyr::filter(q_unique_id == q_id, !is.na(choice_id)) %>%
        dplyr::arrange(choice_position) %>% # appears to always come from API in order but don't want to assume
        dplyr::pull(choice_text) %>%
        unique() # in case they loaded the same value twice as answer choices, #48
    }

    name_set <- get_factor_levels(q_id)
    if(length(name_set) == 0){
      return(vec)
    } else {
      factor(vec, levels = name_set)
    }
  }
  out <- purrr::map2_dfc(out, names(out), set_factor_levels)

  # reset to text names instead of numbers
  # and then re-order to correct columns
  names(out)[(length(static_vars) + 1):length(names(out))] <- qid_text_crosswalk$unique_text[match(names(out)[(length(static_vars) + 1):length(names(out))],qid_text_crosswalk$q_unique_id)]
  out <- out[, col_names]
  out <- out %>%
    dplyr::arrange(dplyr::desc(response_id)) %>%
    dplyr::rename(respondent_id = response_id)

  # Join response status
  out <- out %>%
    dplyr::left_join(.,status, by = c("respondent_id" = "id")) %>%
    dplyr::select(survey_id, collector_id, respondent_id, date_created, date_modified, response_status, everything())
  out
}

# Helper function for de-duplicating identical Q names
# Input: the vector of names
# Adapted from janitor::make_clean_names()
de_duplicate_names <- function(x){
  dupe_count <- vapply(seq_along(x), function(i) {
    sum(x[i] == x[1:i])
  }, integer(1))
  x[dupe_count > 1] <- paste(x[dupe_count >
                                 1], dupe_count[dupe_count > 1], sep = "_")
  x
}
paulditterline commented 2 years ago

Was trying this in a fresh session and got this error after running parse_survey():

image

Now I'm wondering if my error is package / load order related? Error is not present once I library(tidyverse).

paulditterline commented 2 years ago

OK - not exactly sure what happened, but the star questions are now coming in!

mattroumaya commented 2 years ago

Awesome! I believe the issue above should be fixed if I reference dplyr (dplyr::mutate() instead of just mutate). I'll fix that up now, and hopefully star ranking question types are fully supported.

Thank you for all of your assistance troubleshooting this!

paulditterline commented 2 years ago

My pleasure - thank you for all your work on this package. It's incredibly helpful. Cheers!

mattroumaya commented 2 years ago

Merged PR #85 to close this issue.