ropensci / qualtRics

Download ⬇️ Qualtrics survey data directly into R!
https://docs.ropensci.org/qualtRics

New suggested function: `extract_questions()` #290

Closed rempsyc closed 1 year ago

rempsyc commented 1 year ago

When downloading data from Qualtrics manually, as csv, the questions/items wording (I'm calling those "labels") would appear on the first row. So I would create a separate data frame just of labels, which allowed me to quickly know which column name is associated with what item wording.

(Unfortunately, I'm not sure what is the recommended approach for a reprex when working with an API and qualtRics, but here's a non-reprex demo:)

> data_full <- read.csv("file.csv", header = TRUE)
> 
> # Get question labels
> labels.data <- data_full[1, ]
> 
> labels.data$BAQ_4
[1] "INSTRUCTIONS: Using the scale provided, indicate how uncharacteristic or characteristic each of the following statements is in describing you. - I tell my friends openly when I disagree with them."
> 
> # Remove irrelevant rows
> library(dplyr)
> data <- data_full %>% 
+   slice(-(1:2))

I was hoping that I could do something similar using the qualtRics package. However, survey_questions does not extract the individual questions, and does not do it in the right form either.

> library(qualtRics)
> survey_questions(survey1.id)
# A tibble: 30 × 4
   qid   qname               question                                                          force…¹
   <chr> <chr>               <chr>                                                             <lgl>  
 1 QID3  Consent             "<b>CONSENT FORM\n</b><br><br><b>Investigation of cognition and … FALSE  
 2 QID87 script.subjectid    "Please enter your Mechanical Turk Worker ID (only numbers and l… TRUE   
 3 QID9  procedures          "<span style=\"font-weight: 700;\">SURVEY PROCEDURE<br><br><span… FALSE  
 4 QID1  BSCS                "<span class=\"fontstyle0\">INSTRUCTIONS: Using the scale provid… FALSE  
 5 QID2  BAQ                 "<span class=\"fontstyle0\">INSTRUCTIONS: Using the scale provid… FALSE  
 6 QID73 priming.contr.instr "<strong>The next exercice is a scrambled sentence task.</strong… FALSE  
 7 QID74 priming.contr.1     "Choose from these words"                                         FALSE  
 8 QID75 priming.contr.2     "Choose from these words"                                         FALSE  
 9 QID76 priming.contr.3     "Choose from these words"                                         FALSE  
10 QID77 priming.contr.4     "Choose from these words"                                         FALSE  
# … with 20 more rows, and abbreviated variable name ¹​force_resp
# ℹ Use `print(n = ...)` to see more rows

For example, for the BAQ, there should be 13 items, but there is only a single row for it. The question "labels" are there though and can be seen when using the Viewer to examine the data. Normally, for example when importing SPSS data, one can easily access the item labels through attributes and make a data frame like above. But in the case of qualtRics, item labels are markedly absent from attributes:

> str(attributes(data_qualtRics))
List of 5
 $ row.names : int [1:839] 1 2 3 4 5 6 7 8 9 10 ...
 $ names     : chr [1:169] "StartDate" "EndDate" "Status" "IPAddress" ...
 $ problems  :<externalptr> 
 $ class     : chr [1:4] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
 $ column_map: tibble [169 × 7] (S3: tbl_df/tbl/data.frame)
  ..$ qname      : chr [1:169] "StartDate" "EndDate" "Status" "IPAddress" ...
  ..$ description: chr [1:169] "Start Date" "End Date" "Response Type" "IP Address" ...
  ..$ main       : chr [1:169] "Start Date" "End Date" "Response Type" "IP Address" ...
  ..$ sub        : chr [1:169] "" "" "" "" ...
  ..$ ImportId   : chr [1:169] "startDate" "endDate" "status" "ipAddress" ...
  ..$ timeZone   : chr [1:169] "Z" "Z" NA NA ...
  ..$ choiceId   : logi [1:169] NA NA NA NA NA NA ...

Or rather, they are present, but hidden within column_map, and once again, not in the right format. I have tried to find a way to get it from extract_colmap but the process is far from being as straightforward as it used to be for users. Therefore, I propose the following function, extract_questions (or extract_labels or extract_items, or whatever else fits better with the other function names), to accomplish this purpose in a user-friendly way:

> extract_questions <- function(respdata) {
+   extract_colmap(respdata) %>%
+     t %>%
+     as.data.frame %>%
+     setNames(names(respdata)) %>%
+     slice(2)
+ }
> 
> labels.data2 <- extract_questions(data_qualtRics)
> labels.data2$BAQ_4
[1] "INSTRUCTIONS: Using the scale provided, indicate how uncharacteristic or characteristic each of the following statements is in describing you. - I tell my friends openly when I disagree with them."

Would you like me to submit a PR for this?

juliasilge commented 1 year ago

Hmmmm, can you explain what you are looking for that isn't available via extract_colmap()?

library(qualtRics)
my_survey <- fetch_survey(surveyID = "SV_56icaa9YAafpAqx")
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   Progress = col_double(),
#>   `Duration (in seconds)` = col_double(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   Q1.2_10_TEXT = col_logical(),
#>   Q3.13 = col_double(),
#>   SolutionRevision = col_double(),
#>   `Q3.8 - Parent Topics` = col_logical(),
#>   `Q3.8 - Sentiment Polarity` = col_double(),
#>   `Q3.8 - Sentiment Score` = col_double(),
#>   `Q3.8 - Topic Sentiment Label` = col_logical(),
#>   `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
extract_colmap(my_survey)
#> # A tibble: 34 × 7
#>    qname                 description         main  sub   Impor…¹ timeZ…² choic…³
#>    <chr>                 <chr>               <chr> <chr> <chr>   <chr>   <chr>  
#>  1 StartDate             Start Date          Star… ""    startD… Americ… <NA>   
#>  2 EndDate               End Date            End … ""    endDate Americ… <NA>   
#>  3 Status                Response Type       Resp… ""    status  <NA>    <NA>   
#>  4 Progress              Progress            Prog… ""    progre… <NA>    <NA>   
#>  5 Duration (in seconds) Duration (in secon… Dura… ""    durati… <NA>    <NA>   
#>  6 Finished              Finished            Fini… ""    finish… <NA>    <NA>   
#>  7 RecordedDate          Recorded Date       Reco… ""    record… Americ… <NA>   
#>  8 ResponseId            Response ID         Resp… ""    _recor… <NA>    <NA>   
#>  9 DistributionChannel   Distribution Chann… Dist… ""    distri… <NA>    <NA>   
#> 10 UserLanguage          User Language       User… ""    userLa… <NA>    <NA>   
#> # … with 24 more rows, and abbreviated variable names ¹​ImportId, ²​timeZone,
#> #   ³​choiceId

Created on 2022-11-08 with reprex v2.0.2

That function is designed to be a straightforward way for users to have access to all the metadata mapping to each column.

rempsyc commented 1 year ago

I actually use extract_colmap to extract the required information in the above function I am proposing, but it is hard to access and the output is not in the right format.

Specifically, I want the column names from the original data frame to match the individual items. In the current case, as in your example, the tibble column names are, qname, description, main, sub, etc., and the actual column names are in the qname column, so it doesn't allow quick and easy access to the matching between the item names and the item wording.

What I am proposing is therefore a quality-of-life convenience function that would match the outcome when using the csvs manually downloaded from Qualtrics.

juliasilge commented 1 year ago

Is this the result you are wanting to use?

library(tidyverse)
library(qualtRics)
my_survey <- fetch_survey(surveyID = "SV_56icaa9YAafpAqx")
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   Progress = col_double(),
#>   `Duration (in seconds)` = col_double(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   Q1.2_10_TEXT = col_logical(),
#>   Q3.13 = col_double(),
#>   SolutionRevision = col_double(),
#>   `Q3.8 - Parent Topics` = col_logical(),
#>   `Q3.8 - Sentiment Polarity` = col_double(),
#>   `Q3.8 - Sentiment Score` = col_double(),
#>   `Q3.8 - Topic Sentiment Label` = col_logical(),
#>   `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
extract_colmap(my_survey) %>% 
  select(qname, description) %>% 
  pivot_wider(names_from = qname, values_from = description)
#> # A tibble: 1 × 34
#>   Start…¹ EndDate Status Progr…² Durat…³ Finis…⁴ Recor…⁵ Respo…⁶ Distr…⁷ UserL…⁸
#>   <chr>   <chr>   <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#> 1 Start … End Da… Respo… Progre… Durati… Finish… Record… Respon… Distri… User L…
#> # … with 24 more variables: Q1.2 <chr>, Q1.2_10_TEXT <chr>, Q2.1 <chr>,
#> #   Q3.13_NPS_GROUP <chr>, Q3.13 <chr>, Q3.2 <chr>, Q3.3 <chr>, Q3.4 <chr>,
#> #   Q3.7 <chr>, Q3.8 <chr>, Q37 <chr>, Q35_1 <chr>, Q35_2 <chr>, Q35_7 <chr>,
#> #   Q35_7_TEXT <chr>, Q4.1 <chr>, SolutionRevision <chr>,
#> #   `Q3.8 - Parent Topics` <chr>, `Q3.8 - Sentiment Polarity` <chr>,
#> #   `Q3.8 - Sentiment Score` <chr>, `Q3.8 - Sentiment` <chr>,
#> #   `Q3.8 - Topic Sentiment Label` <chr>, …

Created on 2022-11-08 with reprex v2.0.2
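
For readers without API credentials, the same wide reshaping can be tried on a hand-made stand-in for the column map. This is only a sketch using base R instead of tidyr; the `colmap` data frame and its three rows are invented for illustration, and only the `qname` and `description` column names are taken from the output above.

```r
# Hypothetical stand-in for extract_colmap() output (values invented).
colmap <- data.frame(
  qname = c("StartDate", "BAQ_1", "BAQ_2"),
  description = c("Start Date",
                  "Instructions - Item one",
                  "Instructions - Item two"),
  stringsAsFactors = FALSE
)

# Name each description by its column name, then spread into one row.
labels_vec <- setNames(colmap$description, colmap$qname)
labels_wide <- data.frame(as.list(labels_vec),
                          check.names = FALSE,
                          stringsAsFactors = FALSE)

# Look up the wording for one column, as with the manually downloaded csv.
labels_wide$BAQ_2
```

The named vector `labels_vec` is often enough on its own; the one-row data frame is only needed when `$` access by column name is wanted.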

rempsyc commented 1 year ago

Yes! Thank you (didn't see you were already importing tidyr, so that works and is prettier code). So how would you call such a function? extract_description()?

juliasilge commented 1 year ago

This seems like a very specific, not-too-general use case (needing the column names in a wide format vs. a more flexible tidy format) so I don't think we'll add a new function to maintain. Instead, would you be interested in contributing this approach to the documentation, adding the 3-liner to get what you are interested in to the extract_colmap() docs?

library(dplyr)
library(tidyr)

extract_colmap(my_survey) %>% 
  select(qname, description) %>% 
  pivot_wider(names_from = qname, values_from = description)

rempsyc commented 1 year ago

"This seems like a very specific, not-too-general use case"

I would have to disagree here. My colleagues and I use this all the time, and I'm sure a lot more people would use it if they knew or thought about this feature. I therefore do not think it is a very specific, not-too-general use case. Let me attempt to show why.

For all question types I can think of, getting the description will always provide the most information. For example, we compare getting, respectively, the “description”, the “main”, or the “sub” for four question types: (1) matrix table (e.g., for a specific questionnaire), (2) default Qualtrics column (start date), (3) open-ended ID question, and (4) a pick, group, and rank question.

library(dplyr)
library(qualtRics)

extract_questions <- function(respdata, section = "description") {
  respdata %>% 
    extract_colmap %>% 
    select(qname, all_of(section)) %>% 
    tidyr::pivot_wider(names_from = qname, values_from = all_of(section))
}

my_survey <- fetch_survey(surveyID = "SV_3DV3mLKBinRY0DQ")
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   .default = col_double(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   Status = col_character(),
#>   IPAddress = col_character(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   ResponseId = col_character(),
#>   RecipientLastName = col_logical(),
#>   RecipientFirstName = col_logical(),
#>   RecipientEmail = col_logical(),
#>   ExternalReference = col_logical(),
#>   DistributionChannel = col_character(),
#>   UserLanguage = col_character(),
#>   Consent = col_character(),
#>   script.subjectid = col_character(),
#>   BSCS_1 = col_character(),
#>   BSCS_2 = col_character(),
#>   BSCS_3 = col_character(),
#>   BSCS_4 = col_character(),
#>   BSCS_5 = col_character()
#>   # ... with 121 more columns
#> )
#> ℹ Use `spec()` for the full column specifications.

# Get all relevant information
labels.data <- extract_questions(my_survey)
labels.data$BAQ_10
#> [1] "INSTRUCTIONS: Using the scale provided, indicate how uncharacteristic or characteristic each of the following 
# statements is in describing you. - Other people always seem to get the breaks."
labels.data$StartDate
#> [1] "Start Date"
labels.data$script.subjectid
#> [1] "Please enter your Mechanical Turk Worker ID (only numbers and letters are allowed, no special characters or 
# spaces):"
labels.data$priming.contr.10_0_1_RANK
#> [1] "Choose from these words - Ranks - Drag words here - honey - Rank"
# Everything is there :)

# Get only instructions
labels.data <- extract_questions(my_survey, section = "main")
labels.data$BAQ_10
#> [1] "INSTRUCTIONS: Using the scale provided, indicate how uncharacteristic or characteristic each of the following 
# statements is in describing you."
labels.data$StartDate
#> [1] "Start Date"
labels.data$script.subjectid
#> [1] "Please enter your Mechanical Turk Worker ID (only numbers and letters are allowed, no special characters or 
# spaces):"
labels.data$priming.contr.10_0_1_RANK
#> [1] "Choose from these words"
# Instructions are there but we are missing the specific item wordings for the BAQ and priming questions :(

# Get only item
labels.data <- extract_questions(my_survey, section = "sub")
labels.data$BAQ_10
#> [1] "Other people always seem to get the breaks."
labels.data$StartDate
#> [1] ""
labels.data$script.subjectid
#> [1] ""
labels.data$priming.contr.10_0_1_RANK
#> [1] "Ranks - Drag words here - honey - Rank"
# Specific item wordings for the BAQ and priming questions are there but we are missing the instructions as well as 
# the column names for start date and id questions :(

Created on 2022-11-09 with reprex v2.0.2
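
If shorter labels are wanted, one possible middle ground between the three sections compared above is to take "sub" where it exists and fall back to "main" otherwise. The sketch below is not part of qualtRics; the `colmap` rows are a hypothetical, abbreviated stand-in, and only the `qname`, `main`, and `sub` column names come from the column map shown earlier.

```r
# Hypothetical column map rows (wording abbreviated for illustration).
colmap <- data.frame(
  qname = c("StartDate", "BAQ_10"),
  main  = c("Start Date", "INSTRUCTIONS: rate each statement."),
  sub   = c("", "Other people always seem to get the breaks."),
  stringsAsFactors = FALSE
)

# Prefer the item wording; fall back to the main text when "sub" is empty.
short_label <- ifelse(nzchar(colmap$sub), colmap$sub, colmap$main)
short_labels <- setNames(short_label, colmap$qname)

short_labels[["StartDate"]]
short_labels[["BAQ_10"]]
```

This keeps a usable label for default columns such as the start date while dropping the repeated instructions from matrix items.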

In psychology, where the use of questionnaires is omnipresent, we often have to be able to quickly access which column name is associated with which specific question (item wording), for PCA, EFA, CFA, SEM, or other purposes, so this is definitely a feature we need (i.e., wide-format). From experience, item labels are the one thing colleagues coming to R from SPSS miss.

Furthermore, the tidy format is perhaps flexible to you, but not everyone is familiar with data wrangling (certainly not in my network), so they would definitely not think of this format as flexible, or of how they could reach this result by themselves (plus it requires manually loading tidyr and perhaps dplyr, an extra step for them, rather than having the function take care of it for them). And even if the workaround does exist in the documentation, it is considerably less accessible to those users, who are also less likely to read or understand the documentation. I would prefer to make it as easy as possible for these users by making it a simple and dedicated function.

If you will really not consider adding this function to your package, I am happy to contribute this approach to the documentation as you suggest. However, I think there is a real community need for this (even if people might not know they need it), so I am also willing to consider adding this convenience function to my own package, rempsyc, if there are no other options. Alternatively, we could also first do a short survey for your user base (e.g., on Twitter or with colleagues) to know whether they would actually be interested in this feature :)

juliasilge commented 1 year ago

Have you tried using the support that qualtRics already has for quickly accessing item wording using sjlabelled? You can call these functions on a whole survey dataframe or on an individual column:

library(qualtRics)
my_survey <- fetch_survey(surveyID = "SV_56icaa9YAafpAqx")
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   Progress = col_double(),
#>   `Duration (in seconds)` = col_double(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   Q1.2_10_TEXT = col_logical(),
#>   Q3.13 = col_double(),
#>   SolutionRevision = col_double(),
#>   `Q3.8 - Parent Topics` = col_logical(),
#>   `Q3.8 - Sentiment Polarity` = col_double(),
#>   `Q3.8 - Sentiment Score` = col_double(),
#>   `Q3.8 - Topic Sentiment Label` = col_logical(),
#>   `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.

sjlabelled::get_label(my_survey$Q3.7)
#>                                                                    Q3.7 
#> "Did having a little salami help you to be more effective at your job?"
sjlabelled::get_labels(my_survey$Q3.7)
#> [1] "Definitely yes"     "Probably yes"       "Might or might not"
#> [4] "Probably not"       "Definitely not"     NA

Created on 2022-11-11 with reprex v2.0.2

The qualtRics package already imports sjlabelled so this is already set up, ready to go.

jmobrien commented 1 year ago

@rempsyc, I'm a psychologist too, so I get where you're coming from about desires of people from the field, and why having item labels readily available is valuable. At the same time, I think we're coming at this from the wrong direction. More importantly, I think the crux of what you seem to be aiming for here--a convenient way to either inspect the text of a specific question or get a vector of item text--already exists.

The response data frame actually DOES have label attributes, just not in the place you looked. Rather than a single attribute applied to the data frame itself, each element (variable) has an attached attribute called "label". In doing it this way, we're sticking with what's emerged as the standard approach in R, thereby allowing qualtRics to work with packages focused on labeling functionality and/or transferring data between formats, like sjlabelled and haven.
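
To make the per-variable "label" attribute concrete, here is a minimal base-R sketch on toy data. The data frame, column names, and wordings are invented, and the labels are attached by hand purely to mimic the layout described above; it does not call qualtRics or sjlabelled.

```r
# Toy response data; labels are attached by hand for illustration only.
responses <- data.frame(BAQ_1 = c(3, 5), BAQ_2 = c(1, 4))
attr(responses$BAQ_1, "label") <- "Item one wording"
attr(responses$BAQ_2, "label") <- "Item two wording"

# The data frame itself carries no "label" attribute...
is.null(attr(responses, "label"))

# ...but each column does; collect them into a named character vector.
item_labels <- vapply(responses,
                      function(col) attr(col, "label"),
                      character(1))
item_labels
```

This is why str(attributes(data)) on the whole data frame does not show item wording: the labels live one level down, on the columns.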

As for the core inspection functionality, sjlabelled already offers good tools for that. Specifically, the function get_label() can do a lot:

require(qualtRics)
#> Loading required package: qualtRics
require(sjlabelled)
#> Loading required package: sjlabelled
require(tidyverse)
#> Loading required package: tidyverse

# Data from a survey used for testing Qualtrics API:
suppressMessages(
  testsurvey <-
    fetch_survey(surveyID = "SV_0pK7FIIGNNM0sNn", force_request = TRUE)
)
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%

# get_label() for obtaining all variable labels:
testsurvey |> 
  get_label() |> 
  head(23) |> tail(6) # truncating output since this is just illustrative
#>                                                  cond1_textbox 
#>        "This is a text box to fill in if you got Condition 1:" 
#>                                                    cond1_likeq 
#> "Do you like this question? If no, explain. - Selected Choice" 
#>                                             cond1_likeq_2_TEXT 
#>       "Do you like this question? If no, explain. - No - Text" 
#>                                                   if_likeq_yes 
#>                    "[if yes] - shows only if explicit \"yes\"" 
#>                                                    if_likeq_no 
#>                      "[if no] - shows only if explicit \"no\"" 
#>                                                if_likeq_notyes 
#>        "[if not yes] - should display for \"no\" or no answer"

# For a single variable, with NSE support (named!):
testsurvey |> 
  get_label(cond1_textbox) 
#>                                           cond1_textbox 
#> "This is a text box to fill in if you got Condition 1:"

# For labels from multiple variables (still named):
testsurvey |> 
  get_label(cond1_textbox, cond1_likeq)
#>                                                  cond1_textbox 
#>        "This is a text box to fill in if you got Condition 1:" 
#>                                                    cond1_likeq 
#> "Do you like this question? If no, explain. - Selected Choice"

# Use select() functionality to see particular subsets based on name:
testsurvey |> 
  #All items from a particular question matrix, dropping associated display order vbls:
  select(starts_with("SAMAT") & !contains("DO")) |> 
  get_label()
#>                                                     SAMAT_rcra_alice 
#>        "What about these people? What do you think of them? - Alice" 
#>                                                       SAMAT_rcra_bob 
#>          "What about these people? What do you think of them? - Bob" 
#>                                                     SAMAT_rcra_other 
#> "What about these people? What do you think of them? - Someone Else"

# get_label() also supports basic select functions
# (just not yet more complex queries like the previous):
testsurvey |> 
  get_label(starts_with("cond1"))
#>                                                  cond1_textbox 
#>        "This is a text box to fill in if you got Condition 1:" 
#>                                                    cond1_likeq 
#> "Do you like this question? If no, explain. - Selected Choice" 
#>                                             cond1_likeq_2_TEXT 
#>       "Do you like this question? If no, explain. - No - Text"

# Works on individual labelled variables too (still keeping names):
testsurvey |> 
  pull(cond1_textbox) |> 
  get_label()
#>                                           cond1_textbox 
#> "This is a text box to fill in if you got Condition 1:"

# If a 1-row dataframe is preferred, it's easy enough to generate:
testsurvey |> 
  get_label() |> 
  bind_rows()
#> # A tibble: 1 × 63
#>   StartDate  EndDate  Status        IPAddress Progress `Duration (in …` Finished
#>   <chr>      <chr>    <chr>         <chr>     <chr>    <chr>            <chr>   
#> 1 Start Date End Date Response Type IP Addre… Progress Duration (in se… Finished
#> # … with 56 more variables: RecordedDate <chr>, ResponseId <chr>,
#> #   RecipientLastName <chr>, RecipientFirstName <chr>, RecipientEmail <chr>,
#> #   ExternalReference <chr>, LocationLatitude <chr>, LocationLongitude <chr>,
#> #   DistributionChannel <chr>, UserLanguage <chr>, cond1_textbox <chr>,
#> #   cond1_likeq <chr>, cond1_likeq_2_TEXT <chr>, if_likeq_yes <chr>,
#> #   if_likeq_no <chr>, if_likeq_notyes <chr>,
#> #   `timingquestion_First Click` <chr>, `timingquestion_Last Click` <chr>, …

Created on 2022-11-11 by the reprex package (v2.0.1)

One nice thing about the get_label() approach, I think, is that label attributes attached to the variables themselves follow those variables into subsets, new data frames, etc.--generally helping support the wider range of things that you can do with R vs. some software suites.

Plus, this means that many times you won't even need an external object with your labels--just query your response data itself when you need something.
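
A small base-R sketch of labels travelling with their variables, again on invented toy data with hand-attached "label" attributes (an assumption made for illustration, not the output of fetch_survey()). It sticks to column-level operations, since row subsetting of plain vectors in a base data frame can drop such attributes.

```r
# Toy data with hand-attached labels (hypothetical names and wording).
scores <- data.frame(id = 1:3, BAQ_1 = c(3, 5, 2), BAQ_2 = c(1, 4, 4))
attr(scores$BAQ_1, "label") <- "Item one wording"
attr(scores$BAQ_2, "label") <- "Item two wording"

# Selecting columns keeps each column's attributes intact.
baq_only <- scores[c("BAQ_1", "BAQ_2")]
attr(baq_only$BAQ_2, "label")

# Copying a column into another data frame carries the label along too.
other <- data.frame(id = scores$id)
other$item <- scores$BAQ_1
attr(other$item, "label")
```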

One thing in your proposed function that isn't covered above is getting just named versions of the "main" or "sub" components of the column map. Even there, the basic dplyr::pull() function can handle this (again using bind_rows() if you want that 1-row dataframe for making $ do what it does in your examples):

require(qualtRics)
#> Loading required package: qualtRics
require(sjlabelled)
#> Loading required package: sjlabelled
require(tidyverse)
#> Loading required package: tidyverse

# Load Qualtrics API testing survey:
suppressMessages(
  testsurvey <-
    fetch_survey(surveyID = "SV_0pK7FIIGNNM0sNn", force_request = TRUE)
)
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%

# All standard labels in 1-row DF (equivalent to the last example above)
onerow_dataframe <- 
  testsurvey |> 
  extract_colmap() |> 
  pull(description, qname) |> 
  bind_rows()

# Same thing with main labels only:
onerow_dataframe_main <- 
  testsurvey |> 
  extract_colmap() |> 
  pull(main, qname) |> 
  bind_rows() 

# Sub labels only, plus (one way to get) a subset
# (Prob only useful for something like CFA diagram labels SPECIFICALLY when 
# a Qualtrics matrix was used to collect the data.)
testsurvey |> 
  extract_colmap() |> 
  pull(sub, qname) |> 
  bind_rows() |> 
  select(starts_with("SAMAT") & !contains("DO")) |> 
  unlist()
#> SAMAT_rcra_alice   SAMAT_rcra_bob SAMAT_rcra_other 
#>          "Alice"            "Bob"   "Someone Else"

Created on 2022-11-11 by the reprex package (v2.0.1)

At this depth, though, I think I agree with @juliasilge that we're in pretty niche territory, where adding (and subsequently maintaining) a convenience function around this may not pay off. Considering the whole userbase, I'm guessing it will be rare to need something in the vicinity of this that can't be served at least as well by get_label() or similar. Even when there is a need, it's likely to be something specific enough to that user's analyses that any general-purpose convenience function will still need further tweaking code--and users may as well make something bespoke with the better-known tidyverse tools.

As far as the column map itself goes, I think it's really there for purposes other than what you're imagining--for example, I used it when building a full-scale programmatic documentation system for a large longitudinal study containing 100+ unique surveys. Again, for more quotidian needs during interactive analysis-building, I think sjlabelled covers things better than anything we might add.

(oh, and I'm generally ignoring survey_questions() here as the function is outdated--it relies on an older API endpoint, so everything doesn't always match up. That will be addressed eventually but it's a pretty big project.)

@rempsyc, it's clear you're coming in with a fairly detailed perspective, so I wanted to give you a similarly detailed response regarding why things are built as they are currently. If you still have further thoughts/comments, though, please do let us know. We'll keep the issue open for a bit.

(@juliasilge, now that I've written some of these examples up, I suppose I could see dropping something like it into a vignette about examining your data. Maybe referencing it in the fetch_survey() help.)

jmobrien commented 1 year ago

@juliasilge your comment came in while I was writing mine--sorry for the redundancy! But yes, we both agree that's a good approach.

rempsyc commented 1 year ago

I did not realize that sjlabelled was compatible with qualtRics, I should have thought of it (sorry!). Given that get_labels() does what I want, that it works well, and that it is already a required dependency of qualtRics, this is a satisfactory solution. Thank you! And thank you as well for these detailed responses. I guess I was expecting that I should be able to do this basic operation without having to rely on an external package, though in retrospect I think that's fine (however, would it be worth reexporting this function in qualtRics? Maybe not, just a passing thought I had).

I agree it would be awesome to promote this approach more explicitly in the documentation or the vignettes so you can refer people like me there :)

juliasilge commented 1 year ago

Thank you so much for the detailed info @jmobrien! 🙌

I do think that adding more in the docs about this would be helpful; I'll open a separate issue. 👍