Hmmmm, can you explain what you are looking for that isn't available via `extract_colmap()`?
library(qualtRics)
my_survey <- fetch_survey(surveyID = "SV_56icaa9YAafpAqx")
#> | | | 0% | |======================================================================| 100%
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_character(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Progress = col_double(),
#> `Duration (in seconds)` = col_double(),
#> Finished = col_logical(),
#> RecordedDate = col_datetime(format = ""),
#> Q1.2_10_TEXT = col_logical(),
#> Q3.13 = col_double(),
#> SolutionRevision = col_double(),
#> `Q3.8 - Parent Topics` = col_logical(),
#> `Q3.8 - Sentiment Polarity` = col_double(),
#> `Q3.8 - Sentiment Score` = col_double(),
#> `Q3.8 - Topic Sentiment Label` = col_logical(),
#> `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
extract_colmap(my_survey)
#> # A tibble: 34 × 7
#> qname description main sub Impor…¹ timeZ…² choic…³
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 StartDate Start Date Star… "" startD… Americ… <NA>
#> 2 EndDate End Date End … "" endDate Americ… <NA>
#> 3 Status Response Type Resp… "" status <NA> <NA>
#> 4 Progress Progress Prog… "" progre… <NA> <NA>
#> 5 Duration (in seconds) Duration (in secon… Dura… "" durati… <NA> <NA>
#> 6 Finished Finished Fini… "" finish… <NA> <NA>
#> 7 RecordedDate Recorded Date Reco… "" record… Americ… <NA>
#> 8 ResponseId Response ID Resp… "" _recor… <NA> <NA>
#> 9 DistributionChannel Distribution Chann… Dist… "" distri… <NA> <NA>
#> 10 UserLanguage User Language User… "" userLa… <NA> <NA>
#> # … with 24 more rows, and abbreviated variable names ¹ImportId, ²timeZone,
#> # ³choiceId
Created on 2022-11-08 with reprex v2.0.2
That function is designed to be a straightforward way for users to have access to all the metadata mapping to each column.
I actually use `extract_colmap()` to extract the required information in the function I am proposing, but that information is hard to access and the output is not in the right format.
Specifically, I want the column names of the original data frame to map onto the individual items. In the current case, as in your example, the tibble's column names are `qname`, `description`, `main`, `sub`, etc., and the actual column names sit in the `qname` column, so there is no quick and easy way to match item names to item wording.
What I am proposing is therefore a quality-of-life convenience function that would reproduce what you get from CSVs downloaded manually from Qualtrics.
Is this the result you are wanting to use?
library(tidyverse)
library(qualtRics)
my_survey <- fetch_survey(surveyID = "SV_56icaa9YAafpAqx")
#> | | | 0% | |======================================================================| 100%
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_character(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Progress = col_double(),
#> `Duration (in seconds)` = col_double(),
#> Finished = col_logical(),
#> RecordedDate = col_datetime(format = ""),
#> Q1.2_10_TEXT = col_logical(),
#> Q3.13 = col_double(),
#> SolutionRevision = col_double(),
#> `Q3.8 - Parent Topics` = col_logical(),
#> `Q3.8 - Sentiment Polarity` = col_double(),
#> `Q3.8 - Sentiment Score` = col_double(),
#> `Q3.8 - Topic Sentiment Label` = col_logical(),
#> `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
extract_colmap(my_survey) %>%
select(qname, description) %>%
pivot_wider(names_from = qname, values_from = description)
#> # A tibble: 1 × 34
#> Start…¹ EndDate Status Progr…² Durat…³ Finis…⁴ Recor…⁵ Respo…⁶ Distr…⁷ UserL…⁸
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Start … End Da… Respo… Progre… Durati… Finish… Record… Respon… Distri… User L…
#> # … with 24 more variables: Q1.2 <chr>, Q1.2_10_TEXT <chr>, Q2.1 <chr>,
#> # Q3.13_NPS_GROUP <chr>, Q3.13 <chr>, Q3.2 <chr>, Q3.3 <chr>, Q3.4 <chr>,
#> # Q3.7 <chr>, Q3.8 <chr>, Q37 <chr>, Q35_1 <chr>, Q35_2 <chr>, Q35_7 <chr>,
#> # Q35_7_TEXT <chr>, Q4.1 <chr>, SolutionRevision <chr>,
#> # `Q3.8 - Parent Topics` <chr>, `Q3.8 - Sentiment Polarity` <chr>,
#> # `Q3.8 - Sentiment Score` <chr>, `Q3.8 - Sentiment` <chr>,
#> # `Q3.8 - Topic Sentiment Label` <chr>, …
Created on 2022-11-08 with reprex v2.0.2
Yes! Thank you (I didn't see you were already importing tidyr, so that works and is prettier code). So what would you call such a function? `extract_description()`?
This seems like a very specific, not-too-general use case (needing the column names in a wide format vs. a more flexible tidy format), so I don't think we'll add a new function to maintain. Instead, would you be interested in contributing this approach to the documentation by adding the 3-liner that gets what you are interested in to the `extract_colmap()` docs?
library(dplyr) # select()
library(tidyr) # pivot_wider()
extract_colmap(my_survey) %>%
select(qname, description) %>%
pivot_wider(names_from = qname, values_from = description)
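For users who would rather avoid loading tidyr and dplyr, here is a minimal base-R sketch of the same reshaping. The small `colmap` data frame is a hypothetical stand-in for `extract_colmap(my_survey)`: it reproduces only the `qname` and `description` columns, and its values are illustrative rather than real survey output.

```r
# Hypothetical stand-in for extract_colmap(my_survey): only the two columns
# used below are reproduced, and the values are illustrative.
colmap <- data.frame(
  qname = c("StartDate", "Duration (in seconds)", "Q3.7"),
  description = c(
    "Start Date",
    "Duration (in seconds)",
    "Did having a little salami help you to be more effective at your job?"
  ),
  stringsAsFactors = FALSE
)

# Name each description by its column name, then spread the named list into
# a one-row data frame. check.names = FALSE keeps names such as
# "Duration (in seconds)" exactly as they appear in the survey data.
labels_wide <- data.frame(
  as.list(setNames(colmap$description, colmap$qname)),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

labels_wide$Q3.7
labels_wide[["Duration (in seconds)"]]
```

The result supports `$` access the same way the tidyr version does; the only design difference is that nothing beyond base R has to be attached.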
> This seems like a very specific, not-too-general use case
I would have to disagree here. My colleagues and I use this all the time, and I'm sure many more people would use it if they knew about or thought of this feature. I therefore do not think it is a very specific, not-too-general use case. Let me try to show why.
For every question type I can think of, getting the description always provides the most information. Below, we compare getting the “description”, the “main”, or the “sub” component for four question types: (1) a matrix table (e.g., for a specific questionnaire), (2) a default Qualtrics column (start date), (3) an open-ended ID question, and (4) a pick, group, and rank question.
library(dplyr)
library(qualtRics)
extract_questions <- function(respdata, section = "description") {
respdata %>%
extract_colmap %>%
select(qname, all_of(section)) %>%
tidyr::pivot_wider(names_from = qname, values_from = all_of(section))
}
my_survey <- fetch_survey(surveyID = "SV_3DV3mLKBinRY0DQ")
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_double(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Status = col_character(),
#> IPAddress = col_character(),
#> Finished = col_logical(),
#> RecordedDate = col_datetime(format = ""),
#> ResponseId = col_character(),
#> RecipientLastName = col_logical(),
#> RecipientFirstName = col_logical(),
#> RecipientEmail = col_logical(),
#> ExternalReference = col_logical(),
#> DistributionChannel = col_character(),
#> UserLanguage = col_character(),
#> Consent = col_character(),
#> script.subjectid = col_character(),
#> BSCS_1 = col_character(),
#> BSCS_2 = col_character(),
#> BSCS_3 = col_character(),
#> BSCS_4 = col_character(),
#> BSCS_5 = col_character()
#> # ... with 121 more columns
#> )
#> ℹ Use `spec()` for the full column specifications.
# Get all relevant information
labels.data <- extract_questions(my_survey)
labels.data$BAQ_10
#> [1] "INSTRUCTIONS: Using the scale provided, indicate how uncharacteristic or characteristic each of the following statements is in describing you. - Other people always seem to get the breaks."
labels.data$StartDate
#> [1] "Start Date"
labels.data$script.subjectid
#> [1] "Please enter your Mechanical Turk Worker ID (only numbers and letters are allowed, no special characters or spaces):"
labels.data$priming.contr.10_0_1_RANK
#> [1] "Choose from these words - Ranks - Drag words here - honey - Rank"
# Everything is there :)
# Get only instructions
labels.data <- extract_questions(my_survey, section = "main")
labels.data$BAQ_10
#> [1] "INSTRUCTIONS: Using the scale provided, indicate how uncharacteristic or characteristic each of the following statements is in describing you."
labels.data$StartDate
#> [1] "Start Date"
labels.data$script.subjectid
#> [1] "Please enter your Mechanical Turk Worker ID (only numbers and letters are allowed, no special characters or spaces):"
labels.data$priming.contr.10_0_1_RANK
#> [1] "Choose from these words"
# Instructions are there but we are missing the specific item wordings for the BAQ and priming questions :(
# Get only item
labels.data <- extract_questions(my_survey, section = "sub")
labels.data$BAQ_10
#> [1] "Other people always seem to get the breaks."
labels.data$StartDate
#> [1] ""
labels.data$script.subjectid
#> [1] ""
labels.data$priming.contr.10_0_1_RANK
#> [1] "Ranks - Drag words here - honey - Rank"
# Specific item wordings for the BAQ and priming questions are there but we are missing the instructions as well as
# the column names for start date and id questions :(
Created on 2022-11-09 with reprex v2.0.2
In psychology, where questionnaires are omnipresent, we often need to quickly see which column name is associated with which specific question (item wording), for PCA, EFA, CFA, SEM, or other purposes, so this is definitely a feature we need (i.e., in wide format). In my experience, item labels are the one thing colleagues coming to R from SPSS miss.
Furthermore, the tidy format may be flexible for you, but not everyone is familiar with data wrangling (certainly not in my network), so they would not think of this format as flexible, nor know how to reach this result by themselves (plus it requires manually loading tidyr and perhaps dplyr, an extra step for them, rather than having the function take care of it). And even if the workaround is documented, it is considerably less accessible to those users, who are also less likely to read or understand the documentation. I would prefer to make it as easy as possible for these users with a simple, dedicated function.
If you really will not consider adding this function to your package, I am happy to contribute this approach to the documentation as you suggest. However, I think there is a real community need for this (even if people might not know they need it), so I am also willing to add this convenience function to my own package, rempsyc, if there are no other options. Alternatively, we could first run a short survey of your user base (e.g., on Twitter or with colleagues) to find out whether they would actually be interested in this feature :)
Have you tried using the support that qualtRics already has for quickly accessing item wording using sjlabelled? You can call these functions on a whole survey dataframe or on an individual column:
library(qualtRics)
my_survey <- fetch_survey(surveyID = "SV_56icaa9YAafpAqx")
#> | | | 0% | |======================================================================| 100%
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_character(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Progress = col_double(),
#> `Duration (in seconds)` = col_double(),
#> Finished = col_logical(),
#> RecordedDate = col_datetime(format = ""),
#> Q1.2_10_TEXT = col_logical(),
#> Q3.13 = col_double(),
#> SolutionRevision = col_double(),
#> `Q3.8 - Parent Topics` = col_logical(),
#> `Q3.8 - Sentiment Polarity` = col_double(),
#> `Q3.8 - Sentiment Score` = col_double(),
#> `Q3.8 - Topic Sentiment Label` = col_logical(),
#> `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
sjlabelled::get_label(my_survey$Q3.7)
#> Q3.7
#> "Did having a little salami help you to be more effective at your job?"
sjlabelled::get_labels(my_survey$Q3.7)
#> [1] "Definitely yes" "Probably yes" "Might or might not"
#> [4] "Probably not" "Definitely not" NA
Created on 2022-11-11 with reprex v2.0.2
The qualtRics package already imports sjlabelled so this is already set up, ready to go.
@rempsyc, I'm a psychologist too, so I get where you're coming from about desires of people from the field, and why having item labels readily available is valuable. At the same time, I think we're coming at this from the wrong direction. More importantly, I think the crux of what you seem to be aiming for here--a convenient way to either inspect the text of a specific question or get a vector of item text--already exists.
The response data frame actually DOES have label attributes, just not in the place you looked. Rather than a single attribute applied to the data frame itself, each element (variable) has an attached attribute called "label". In doing it this way, we're sticking with what's emerged as the standard approach in R, thereby allowing qualtRics to work with packages focused on labeling functionality and/or transferring data between formats, like sjlabelled and haven.
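As a minimal illustration of what per-variable label attributes look like, independent of a live Qualtrics connection, here is a base-R sketch. The toy data frame, its column names, and its labels are hypothetical; the sketch only assumes, as described above, that each column stores its wording in an attribute named "label". For real survey data, `sjlabelled::get_label()` does this more conveniently.

```r
# Toy response data; column names and wordings are illustrative only.
responses <- data.frame(
  StartDate = c("2022-11-01", "2022-11-02"),
  BAQ_10 = c(3, 5),
  stringsAsFactors = FALSE
)

# Attach a "label" attribute to each column (one attribute per variable,
# not one attribute on the data frame as a whole).
attr(responses$StartDate, "label") <- "Start Date"
attr(responses$BAQ_10, "label") <- "Other people always seem to get the breaks."

# Collect the labels into a named character vector, one entry per column.
# Columns without a label get NA instead of NULL.
item_labels <- vapply(
  responses,
  function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  },
  character(1)
)

item_labels["BAQ_10"]

# Selecting columns keeps each column intact, so its label comes along.
kept <- responses["BAQ_10"]
attr(kept$BAQ_10, "label")
```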
As for the core inspection functionality, sjlabelled already offers good tools for that. Specifically, the function `get_label()` can do a lot:
require(qualtRics)
#> Loading required package: qualtRics
require(sjlabelled)
#> Loading required package: sjlabelled
require(tidyverse)
#> Loading required package: tidyverse
# Data from a survey used for testing Qualtrics API:
suppressMessages(
testsurvey <-
fetch_survey(surveyID = "SV_0pK7FIIGNNM0sNn", force_request = TRUE)
)
#> | | | 0% | |======================================================================| 100%
# get_label() for obtaining all variable labels:
testsurvey |>
get_label() |>
head(23) |> tail(6) # truncating output since this is just illustrative
#> cond1_textbox
#> "This is a text box to fill in if you got Condition 1:"
#> cond1_likeq
#> "Do you like this question? If no, explain. - Selected Choice"
#> cond1_likeq_2_TEXT
#> "Do you like this question? If no, explain. - No - Text"
#> if_likeq_yes
#> "[if yes] - shows only if explicit \"yes\""
#> if_likeq_no
#> "[if no] - shows only if explicit \"no\""
#> if_likeq_notyes
#> "[if not yes] - should display for \"no\" or no answer"
# For a single variable, with NSE support (named!):
testsurvey |>
get_label(cond1_textbox)
#> cond1_textbox
#> "This is a text box to fill in if you got Condition 1:"
# For labels from multiple variables (still named):
testsurvey |>
get_label(cond1_textbox, cond1_likeq)
#> cond1_textbox
#> "This is a text box to fill in if you got Condition 1:"
#> cond1_likeq
#> "Do you like this question? If no, explain. - Selected Choice"
# Use select() functionality to see particular subsets based on name:
testsurvey |>
# All items from a particular question matrix, dropping associated display order variables:
select(starts_with("SAMAT") & !contains("DO")) |>
get_label()
#> SAMAT_rcra_alice
#> "What about these people? What do you think of them? - Alice"
#> SAMAT_rcra_bob
#> "What about these people? What do you think of them? - Bob"
#> SAMAT_rcra_other
#> "What about these people? What do you think of them? - Someone Else"
# get_label() also supports basic select functions
# (just not yet more complex queries like the previous):
testsurvey |>
get_label(starts_with("cond1"))
#> cond1_textbox
#> "This is a text box to fill in if you got Condition 1:"
#> cond1_likeq
#> "Do you like this question? If no, explain. - Selected Choice"
#> cond1_likeq_2_TEXT
#> "Do you like this question? If no, explain. - No - Text"
# Works on individual labelled variables too (still keeping names):
testsurvey |>
pull(cond1_textbox) |>
get_label()
#> cond1_textbox
#> "This is a text box to fill in if you got Condition 1:"
# If a 1-row data frame is preferred, it's easy enough to generate:
testsurvey |>
get_label() |>
bind_rows()
#> # A tibble: 1 × 63
#> StartDate EndDate Status IPAddress Progress `Duration (in …` Finished
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Start Date End Date Response Type IP Addre… Progress Duration (in se… Finished
#> # … with 56 more variables: RecordedDate <chr>, ResponseId <chr>,
#> # RecipientLastName <chr>, RecipientFirstName <chr>, RecipientEmail <chr>,
#> # ExternalReference <chr>, LocationLatitude <chr>, LocationLongitude <chr>,
#> # DistributionChannel <chr>, UserLanguage <chr>, cond1_textbox <chr>,
#> # cond1_likeq <chr>, cond1_likeq_2_TEXT <chr>, if_likeq_yes <chr>,
#> # if_likeq_no <chr>, if_likeq_notyes <chr>,
#> # `timingquestion_First Click` <chr>, `timingquestion_Last Click` <chr>, …
Created on 2022-11-11 by the reprex package (v2.0.1)
One thing I like about the `get_label()` approach is that label attributes attached to the variables themselves follow those variables into subsets, new data frames, etc., which generally helps support the wider range of things you can do in R compared with some other software suites.
Plus, this means that often you won't even need a separate object holding your labels; just query the response data itself when you need something.
One thing you mentioned in your proposed function but that isn't covered above is getting named versions of only the "main" or "sub" components of the column map. Even there, the basic `dplyr::pull()` function can handle this (again using `bind_rows()` if you want that 1-row data frame so that `$` works as it does in your examples):
require(qualtRics)
#> Loading required package: qualtRics
require(sjlabelled)
#> Loading required package: sjlabelled
require(tidyverse)
#> Loading required package: tidyverse
# Load Qualtrics API testing survey:
suppressMessages(
testsurvey <-
fetch_survey(surveyID = "SV_0pK7FIIGNNM0sNn", force_request = TRUE)
)
#> | | | 0% | |======================================================================| 100%
# All standard labels in 1-row DF (equivalent to the last example above)
onerow_dataframe <-
testsurvey |>
extract_colmap() |>
pull(description, qname) |>
bind_rows()
# Same thing with main labels only:
onerow_dataframe_main <-
testsurvey |>
extract_colmap() |>
pull(main, qname) |>
bind_rows()
# Sub labels only, plus (one way to get) a subset
# (Prob only useful for something like CFA diagram labels SPECIFICALLY when
# a Qualtrics matrix was used to collect the data.)
testsurvey |>
extract_colmap() |>
pull(sub, qname) |>
bind_rows() |>
select(starts_with("SAMAT") & !contains("DO")) |>
unlist()
#> SAMAT_rcra_alice SAMAT_rcra_bob SAMAT_rcra_other
#> "Alice" "Bob" "Someone Else"
Created on 2022-11-11 by the reprex package (v2.0.1)
At this depth, though, I agree with @juliasilge that we're in pretty niche territory, where adding (and subsequently maintaining) a convenience function may not pay off. Considering the whole user base, I'm guessing it will be rare to need something in this vicinity that can't be served at least as well by `get_label()` or similar. Even when there is a need, it's likely to be specific enough to that user's analyses that any general-purpose convenience function would still need further tweaking, so users may as well build something bespoke with the better-known tidyverse tools.
As for the column map itself, I think it's really there for purposes other than what you're imagining. For example, I used it when building a full-scale programmatic documentation system for a large longitudinal study containing 100+ unique surveys. Again, for more quotidian needs during interactive analysis, I think sjlabelled covers things better than anything we might add.
(Oh, and I'm generally ignoring `survey_questions()` here because the function is outdated: it relies on an older API endpoint, so things don't always match up. That will be addressed eventually, but it's a pretty big project.)
@rempsyc, it's clear you're coming in with a fairly detailed perspective, so I wanted to give you a similarly detailed response regarding why things are built as they are currently. If you still have further thoughts/comments, though, please do let us know. We'll keep the issue open for a bit.
(@juliasilge, now that I've written some of these examples up, I could see dropping something like them into a vignette about examining your data, and maybe referencing it in the `fetch_survey()` help.)
@juliasilge your comment came in while I was writing mine--sorry for the redundancy! But yes, we both agree that's a good approach.
I did not realize that sjlabelled was compatible with qualtRics; I should have thought of it (sorry!). Given that `get_label()` does what I want and works well, and that sjlabelled is already a required dependency of qualtRics, this is a satisfactory solution. Thank you! And thank you as well for these detailed responses. I guess I was expecting to be able to do this basic operation without relying on an external package, though in retrospect I think that's fine. (Would it be worth re-exporting this function in qualtRics, though? Maybe not, just a passing thought.)
I agree it would be awesome to promote this approach more explicitly in the documentation or the vignettes so you can refer people like me there :)
Thank you so much for the detailed info @jmobrien! 🙌
I do think that adding more in the docs about this would be helpful; I'll open a separate issue. 👍
When downloading data from Qualtrics manually as a CSV, the question/item wording (which I'm calling "labels") appears in the first row. So I would create a separate data frame containing just the labels, which let me quickly see which column name is associated with which item wording.
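A minimal base-R sketch of that manual workflow: it assumes a CSV laid out as described here (a header row of column names, then one row of item wording, then the responses). The inline CSV text, column names, and wordings are made up for illustration. Some Qualtrics exports may include additional header rows, which would need to be dropped the same way.

```r
# Inline CSV mimicking the layout described above (illustrative values only).
csv_text <- 'StartDate,BAQ_10,Q3.7
Start Date,Other people always seem to get the breaks.,Did having a little salami help?
2022-11-01,3,Definitely yes
2022-11-02,5,Probably not'

export <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

# First data row = item wording; keep it as a separate one-row data frame.
labels_df <- export[1, , drop = FALSE]
rownames(labels_df) <- NULL

# Remaining rows = responses. The wording row forced every column to be read
# as text, so re-guess the column types after dropping it.
responses <- export[-1, , drop = FALSE]
rownames(responses) <- NULL
responses[] <- lapply(responses, type.convert, as.is = TRUE)

labels_df$BAQ_10
str(responses)
```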
(Unfortunately, I'm not sure what the recommended approach for a reprex is when working with an API and qualtRics, but here's a non-reprex demo:)
I was hoping I could do something similar using the qualtRics package. However, `survey_questions()` does not extract the individual questions, nor does it return them in the right form. For example, the BAQ should have 13 items, but there is only a single row for it. The question "labels" are there, though, and can be seen when examining the data in the Viewer. Normally, for example when importing SPSS data, one can easily access the item labels through attributes and build a data frame like the one above. But in the case of qualtRics, item labels are conspicuously absent from the attributes:
Or rather, they are present, but hidden within `column_map`, and once again not in the right format. I have tried to find a way to get them from `extract_colmap()`, but the process is far from as straightforward as it used to be for users. Therefore, I propose the following function, `extract_questions()` (or `extract_labels()` or `extract_items()`, or whatever fits best with the other function names), to accomplish this in a user-friendly way:
Would you like me to submit a PR for this?