ropensci / qualtRics

Download ⬇️ Qualtrics survey data directly into R!
https://docs.ropensci.org/qualtRics

fetch_survey(include_display_order = TRUE) does not retrieve display order #151

Closed nthun closed 4 years ago

nthun commented 4 years ago

I tried to use fetch_survey() to download survey responses with randomized blocks. When I download the data manually, the display order is included in the dataset. fetch_survey() has a parameter, include_display_order = TRUE, that is supposed to control whether the display order is retrieved, but it does not seem to work. I cannot give you a reprex because it would require my API key, but this should show what the problem is.

with_do <- qualtRics::fetch_survey(surveyID = "[my survey]",
                                   include_display_order = TRUE,
                                   force_request = TRUE)

without_do <- qualtRics::fetch_survey(surveyID = "[my survey]",
                                      include_display_order = FALSE,
                                      force_request = TRUE)

identical(with_do, without_do)
[1] TRUE

Is this a bug, or am I missing something?

juliasilge commented 4 years ago

I believe this is working correctly; I am not seeing a problem with the surveys I have access to. Check out the results for this example survey, which has randomization in some of the questions:

library(qualtRics)

with_do <- fetch_survey("SV_XXXXX",
                        include_display_order = TRUE,
                        force_request = TRUE)
#>   |======================================================================| 100%
#> Parsed with column specification:
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   IPAddress = col_logical(),
#>   Progress = col_double(),
#>   `Duration (in seconds)` = col_double(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   RecipientLastName = col_logical(),
#>   RecipientFirstName = col_logical(),
#>   RecipientEmail = col_logical(),
#>   ExternalReference = col_logical(),
#>   LocationLatitude = col_double(),
#>   LocationLongitude = col_double(),
#>   Q1007 = col_double(),
#>   Q1_DO_1 = col_double(),
#>   Q1_DO_2 = col_double(),
#>   Q1_DO_3 = col_double(),
#>   Q1_DO_4 = col_double(),
#>   Q1_DO_5 = col_double(),
#>   SolutionRevision = col_double()
#>   # ... with 5 more columns
#> )
#> See spec(...) for full column specifications.

with_do
#> # A tibble: 121 x 46
#>    StartDate           EndDate             Status IPAddress Progress
#>    <dttm>              <dttm>              <chr>  <lgl>        <dbl>
#>  1 2020-03-29 20:47:24 2020-03-29 20:48:23 Surve… NA             100
#>  2 2020-03-29 20:50:02 2020-03-29 20:50:02 Surve… NA             100
#>  3 2020-03-29 20:50:02 2020-03-29 20:50:02 Surve… NA             100
#>  4 2020-03-29 20:50:02 2020-03-29 20:50:02 Surve… NA             100
#>  5 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  6 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  7 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  8 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  9 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#> 10 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#> # … with 111 more rows, and 41 more variables: `Duration (in seconds)` <dbl>,
#> #   Finished <lgl>, RecordedDate <dttm>, ResponseId <chr>,
#> #   RecipientLastName <lgl>, RecipientFirstName <lgl>, RecipientEmail <lgl>,
#> #   ExternalReference <lgl>, LocationLatitude <dbl>, LocationLongitude <dbl>,
#> #   DistributionChannel <chr>, UserLanguage <chr>, Q1002 <ord>, Q1006 <ord>,
#> #   Q1007 <dbl>, Q1_1 <chr>, Q1_2 <chr>, Q1_3 <chr>, Q1_4 <chr>, Q1_5 <chr>,
#> #   Q1_DO_1 <dbl>, Q1_DO_2 <dbl>, Q1_DO_3 <dbl>, Q1_DO_4 <dbl>, Q1_DO_5 <dbl>,
#> #   Q200 <ord>, Q300 <ord>, Q201 <ord>, Q301 <ord>, Q202 <ord>, Q302 <ord>,
#> #   Q203 <ord>, Q303 <ord>, Q204 <ord>, Q304 <ord>, SolutionRevision <dbl>,
#> #   FL_6_DO_FL_7 <dbl>, FL_6_DO_FL_8 <dbl>, FL_6_DO_FL_9 <dbl>,
#> #   FL_6_DO_FL_10 <dbl>, FL_6_DO_FL_11 <dbl>

without_do <- fetch_survey("SV_XXXXX",
                           include_display_order = FALSE,
                           force_request = TRUE)
#>   |======================================================================| 100%
#> Parsed with column specification:
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   IPAddress = col_logical(),
#>   Progress = col_double(),
#>   `Duration (in seconds)` = col_double(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   RecipientLastName = col_logical(),
#>   RecipientFirstName = col_logical(),
#>   RecipientEmail = col_logical(),
#>   ExternalReference = col_logical(),
#>   LocationLatitude = col_double(),
#>   LocationLongitude = col_double(),
#>   Q1007 = col_double(),
#>   SolutionRevision = col_double()
#> )
#> See spec(...) for full column specifications.

without_do
#> # A tibble: 121 x 36
#>    StartDate           EndDate             Status IPAddress Progress
#>    <dttm>              <dttm>              <chr>  <lgl>        <dbl>
#>  1 2020-03-29 20:47:24 2020-03-29 20:48:23 Surve… NA             100
#>  2 2020-03-29 20:50:02 2020-03-29 20:50:02 Surve… NA             100
#>  3 2020-03-29 20:50:02 2020-03-29 20:50:02 Surve… NA             100
#>  4 2020-03-29 20:50:02 2020-03-29 20:50:02 Surve… NA             100
#>  5 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  6 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  7 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  8 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#>  9 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#> 10 2020-03-29 20:50:03 2020-03-29 20:50:03 Surve… NA             100
#> # … with 111 more rows, and 31 more variables: `Duration (in seconds)` <dbl>,
#> #   Finished <lgl>, RecordedDate <dttm>, ResponseId <chr>,
#> #   RecipientLastName <lgl>, RecipientFirstName <lgl>, RecipientEmail <lgl>,
#> #   ExternalReference <lgl>, LocationLatitude <dbl>, LocationLongitude <dbl>,
#> #   DistributionChannel <chr>, UserLanguage <chr>, Q1002 <ord>, Q1006 <ord>,
#> #   Q1007 <dbl>, Q1_1 <chr>, Q1_2 <chr>, Q1_3 <chr>, Q1_4 <chr>, Q1_5 <chr>,
#> #   Q200 <ord>, Q300 <ord>, Q201 <ord>, Q301 <ord>, Q202 <ord>, Q302 <ord>,
#> #   Q203 <ord>, Q303 <ord>, Q204 <ord>, Q304 <ord>, SolutionRevision <dbl>

Created on 2020-03-29 by the reprex package (v0.3.0)

Notice the columns called FL_6_DO_FL_8 and similar, along with Q1_DO_1 through Q1_DO_5? Those contain the display order that each survey respondent saw for the randomized elements. When include_display_order is FALSE, those columns are not there, which is why the first tibble has 46 columns and the second has 36. Is it possible you are using a survey that doesn't have randomization?
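One quick way to check whether display-order columns came back is to compare column names rather than relying on identical(). The following is only a minimal sketch: the two data frames are toy stand-ins built by hand (no API call is made), and matching on "_DO_" is an assumption based on the column names shown in the output above, not a documented guarantee.

```r
# Toy stand-ins for the two fetch_survey() results (hypothetical data).
with_do <- data.frame(
  ResponseId   = c("R_1", "R_2"),
  Q1_1         = c("a", "b"),
  Q1_DO_1      = c(2, 1),
  FL_6_DO_FL_7 = c(1, 3)
)
without_do <- with_do[, c("ResponseId", "Q1_1")]

# Columns that appear only when display order is requested.
do_only <- setdiff(names(with_do), names(without_do))

# Columns whose names contain "_DO_" (naming pattern assumed from the output).
do_cols <- grep("_DO_", names(with_do), value = TRUE, fixed = TRUE)

print(do_only)
print(do_cols)
```

If both vectors come back empty for a real survey, the likely explanation is that the survey has no randomization to report, rather than that the argument is being ignored.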

juliasilge commented 4 years ago

Let me know if you still have problems with this @nthun! 🙌

nthun commented 4 years ago

Hi @juliasilge ,

Thanks for the reply! It is super awkward, but it turned out that my collaborator removed the randomization from the questionnaire without letting me know 🤦‍♂
Now that the randomization is back on, the function works as intended. Sorry for the false alarm, and thanks a lot for maintaining this package! ❤️