Closed by kevintroy 3 years ago
I just tried out various configurations with `breakout_sets = FALSE` and I can't seem to find this problem with my example surveys. Can you add a bit more detail and maybe an example if possible, so we can see if we can handle this better?
It occurred to me that this might vary by question type -- the question in the data set I was looking at originally was a Multiple Answer Grid, which is not quite the same as a Multiple Choice Multiple Response question.
So, I've created a test survey that uses every Qualtrics question type that should be impacted by `breakout_sets`. I'll populate this with some data and post examples of what I'm seeing in a few days.
I've delved into this further, and it's because of this combination of things:
- With `breakout_sets = FALSE` and `label = FALSE`, the Qualtrics API will return comma-delimited strings of numbers in a single column (for example, if the first, second, and third options are checked off, the API will return "1,2,3").
- `readr::read_csv` ignores these commas, for what I'm sure are reasons.

We can see the readr behavior without making any API calls:
library(tidyverse)
test_frame <- tibble(numeric = c(1, 2, 3),
                     comma_delimited = c("1,2,3", "3,2,1", "2,1,3"),
                     semi_delimited = c("1;2;3", "3;2;1", "2;1;3"),
                     pipe_delimited = c("1|2|3", "3|2|1", "2|1|3"))
test_frame %>%
  write_csv("test.csv")
read_csv("test.csv")
#>
#> -- Column specification --------------------------------------------------------
#> cols(
#> numeric = col_double(),
#> comma_delimited = col_number(),
#> semi_delimited = col_character(),
#> pipe_delimited = col_character()
#> )
#> # A tibble: 3 x 4
#> numeric comma_delimited semi_delimited pipe_delimited
#> <dbl> <dbl> <chr> <chr>
#> 1 1 123 1;2;3 1|2|3
#> 2 2 321 3;2;1 3|2|1
#> 3 3 213 2;1;3 2|1|3
Created on 2021-04-12 by the reprex package (v1.0.0)
Note that the `comma_delimited` column is read in as a double.
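The same collapse can be reproduced on a bare vector, which may make the cause easier to see. This is a minimal sketch using `readr::parse_number()`, the parser that corresponds to `col_number()`; the input values are simply copied from the reprex above.

```r
library(readr)

# parse_number() is the parser behind col_number(). Under the default locale
# the comma is the grouping mark, so it is dropped instead of being kept as
# a separator between choice codes.
collapsed <- parse_number(c("1,2,3", "3,2,1", "2,1,3"))
collapsed
#> [1] 123 321 213
```

So once a column is guessed as a number, the separators are gone and the original choices cannot be recovered from the parsed value.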
One possible "fix" for this would be for `fetch_survey()` to generate a warning when both `breakout_sets = FALSE` and `label = FALSE` are set? The user would then be able to use a `col_types` specification to make sure the delimited column is read in correctly.
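To illustrate the `col_types` route, here is a minimal sketch that works on a local CSV rather than the API. The file is a temporary stand-in built from the same values as the reprex above, and the follow-up `strsplit()` step is just one assumed way of getting back to individual choices.

```r
library(readr)

# Stand-in for a survey export: one multiple-response column holding
# comma-delimited choice codes, as in the reprex above.
test_frame <- data.frame(
  numeric = c(1, 2, 3),
  comma_delimited = c("1,2,3", "3,2,1", "2,1,3"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write_csv(test_frame, path)

# Force the delimited column to character so readr does not guess a number;
# columns not named in cols() are still guessed as usual.
fixed <- read_csv(
  path,
  col_types = cols(comma_delimited = col_character())
)

# Split each response into its individual choice codes.
choices <- strsplit(fixed$comma_delimited, ",", fixed = TRUE)
```

With the column kept as character, `fixed$comma_delimited` still holds "1,2,3" and `choices[[1]]` holds the three codes separately. Whether your installed `fetch_survey()` accepts a `col_types` argument directly is worth checking in `?fetch_survey`.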
Oh boy, separating those values by COMMAS in a COMMA-SEPARATED file 😩
Am I correct in understanding that it isn't all combinations of `breakout_sets = FALSE` and `label = FALSE` that are problematic, but just for certain question types? Or is it all question types for this combination?
I checked all the question types where `breakout_sets` is relevant, and the format/behavior is consistent across them all.
Here is the new warning for when folks use `breakout_sets = FALSE` together with `label = FALSE`:
library(qualtRics)
fetch_survey("SV_5BJRo2RGHajIlOB",
             label = FALSE,
             breakout_sets = FALSE,
             convert = FALSE,
             force_request = TRUE)
#> Warning: Use caution with `breakout_sets = FALSE` plus `label = FALSE`
#> * Results will likely be incorrectly guessed and read in as numeric
#> * Use a `col_types` specification to override
#> | | | 0% | |========================================================= | 82% | |======================================================================| 100%
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_double(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> IPAddress = col_character(),
#> RecordedDate = col_datetime(format = ""),
#> ResponseId = col_character(),
#> RecipientLastName = col_logical(),
#> RecipientFirstName = col_logical(),
#> RecipientEmail = col_logical(),
#> ExternalReference = col_logical(),
#> DistributionChannel = col_character(),
#> UserLanguage = col_character(),
#> Q1_DO = col_character(),
#> FL_6_DO = col_character()
#> )
#> ℹ Use `spec()` for the full column specifications.
#> # A tibble: 122 x 38
#> StartDate EndDate Status IPAddress Progress
#> <dttm> <dttm> <dbl> <chr> <dbl>
#> 1 2020-03-29 20:47:24 2020-03-29 20:48:23 1 <NA> 100
#> 2 2020-03-29 20:50:02 2020-03-29 20:50:02 2 <NA> 100
#> 3 2020-03-29 20:50:02 2020-03-29 20:50:02 2 <NA> 100
#> 4 2020-03-29 20:50:02 2020-03-29 20:50:02 2 <NA> 100
#> 5 2020-03-29 20:50:03 2020-03-29 20:50:03 2 <NA> 100
#> 6 2020-03-29 20:50:03 2020-03-29 20:50:03 2 <NA> 100
#> 7 2020-03-29 20:50:03 2020-03-29 20:50:03 2 <NA> 100
#> 8 2020-03-29 20:50:03 2020-03-29 20:50:03 2 <NA> 100
#> 9 2020-03-29 20:50:03 2020-03-29 20:50:03 2 <NA> 100
#> 10 2020-03-29 20:50:03 2020-03-29 20:50:03 2 <NA> 100
#> # … with 112 more rows, and 33 more variables: Duration (in seconds) <dbl>,
#> # Finished <dbl>, RecordedDate <dttm>, ResponseId <chr>,
#> # RecipientLastName <lgl>, RecipientFirstName <lgl>, RecipientEmail <lgl>,
#> # ExternalReference <lgl>, LocationLatitude <dbl>, LocationLongitude <dbl>,
#> # DistributionChannel <chr>, UserLanguage <chr>, Q1002 <dbl>, Q1006 <dbl>,
#> # Q1007 <dbl>, Q1_1 <dbl>, Q1_2 <dbl>, Q1_3 <dbl>, Q1_4 <dbl>, Q1_5 <dbl>,
#> # Q1_DO <chr>, Q200 <dbl>, Q300 <dbl>, Q201 <dbl>, Q301 <dbl>, Q202 <dbl>,
#> # Q302 <dbl>, Q203 <dbl>, Q303 <dbl>, Q204 <dbl>, Q304 <dbl>,
#> # SolutionRevision <dbl>, FL_6_DO <chr>
Created on 2021-04-21 by the reprex package (v2.0.0)
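For anyone curious what a check like this could look like, here is a minimal base R sketch. The helper name `warn_unparsed_sets()` is hypothetical and this is not the code used inside qualtRics; it only mirrors the message text shown above.

```r
# Hypothetical helper, not part of qualtRics: warn when both arguments are
# FALSE, since that combination yields comma-delimited codes that readr is
# likely to guess as numbers.
warn_unparsed_sets <- function(label, breakout_sets) {
  if (!label && !breakout_sets) {
    warning(
      "Use caution with `breakout_sets = FALSE` plus `label = FALSE`\n",
      "* Results will likely be incorrectly guessed and read in as numeric\n",
      "* Use a `col_types` specification to override",
      call. = FALSE
    )
  }
  invisible(NULL)
}
```

Calling `warn_unparsed_sets(label = FALSE, breakout_sets = FALSE)` raises the warning; any other combination returns silently.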
I don't often `fetch_survey(breakout_sets = FALSE)`, but when I do, the multiple response questions are downloading as "pure" strings of numbers without delimiters, which then get read into the tibble as numeric data. For example, if both the second and fifth options are checked, the data will show the number 25.
By contrast, the Qualtrics GUI "export data" will return a CSV with comma-delimited strings, which is what I was expecting. Similarly, any "Display Order" variables pulled down by `fetch_survey(breakout_sets = FALSE)` come down as pipe-delimited strings of numbers, e.g. "3|1|2".
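Since the pipe-delimited Display Order strings arrive as character, they can be split after import. A minimal base R sketch, with made-up values standing in for a Display Order column:

```r
# Made-up Display Order values in the pipe-delimited format described above.
display_order <- c("3|1|2", "1|2|3")

# fixed = TRUE treats "|" as a literal character, not as regex alternation.
order_list <- lapply(strsplit(display_order, "|", fixed = TRUE), as.integer)
```

Here `order_list[[1]]` is the integer vector 3, 1, 2, i.e. the order in which the options were shown to that respondent.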
I have a feeling this is (another) case of inconsistent Qualtrics APIs, but thought I'd bring it up. Possibly related: https://github.com/ropensci/qualtRics/issues/144