ropensci / qualtRics

Download ⬇️ Qualtrics survey data directly into R!
https://docs.ropensci.org/qualtRics
Other
215 stars 70 forks source link

Fetch_survey() error #194

Closed Rkol8 closed 3 years ago

Rkol8 commented 4 years ago

Warning: 32 parsing failures. row # A tibble: 5 x 5 col row col expected actual file
expected
actual 1 2 NA 11 columns 80 columns col 4 5 NA 11 columns 80 columns

Error in attr(x, "names") <- as.character(value) : 'names' attribute [80] must be the same length as the vector [11]

In rbind(names(probs), probs_f) : number of columns of result is not a multiple of vector length (arg 1)

For example below: Warning: 21 parsing failures. row # A tibble: 5 x 5 col row col expected actual file
expected
actual 1 2 NA 58 columns 109 columns file 2 3 NA 58 columns 109 columns

Rkol8 commented 4 years ago

Looks like it's happening as apart of read_survey (https://github.com/ropensci/qualtRics/blob/master/R/read_survey.R)

--import data including variable names (row 1) and variable labels (row 2) rawdata <- suppressMessages(readr::read_csv( file = file_name, col_names = FALSE, col_types = readr::cols(.default = readr::col_character()), skip = skipNr, na = c("") ))

When it reads the data, and skip = skipNr is = 3 (from skipNr <- ifelse(legacy, 2, 3)) it is stopping reading columns.

For my file, this only reads 58 columns: rawdata <- suppressWarnings(readr::read_csv( file = file.choose(), col_names = FALSE, col_types = readr::cols(.default = readr::col_character()), skip = 3, na = c("") ))

If I comment out skip I get 109 columns: rawdata <- suppressWarnings(readr::read_csv( file = file.choose(), col_names = FALSE, col_types = readr::cols(.default = readr::colcharacter()), **# skip = 3,_** na = c("") ))

The headers always reads in 109 columns: header <- suppressWarnings(suppressMessages(readr::read_csv( file = file.choose(), col_names = TRUE, col_types = readr::cols(.default = readr::col_character()), n_max = 1 )))

juliasilge commented 4 years ago

That sounds frustrating! To dig into this problem, we need you to use reprex so we can make sure the environment is clean, etc.

After you have that set up, try running this code:

library(qualtRics)
all_surveys()

Your output (what you paste here on GitHub) should look like this, but with your surveys (feel free to censor any PII that shows up):

library(qualtRics)
all_surveys()
#> # A tibble: 4 x 6
#>   id         name           ownerId     lastModified    creationDate    isActive
#>   <chr>      <chr>          <chr>       <chr>           <chr>           <lgl>   
#> 1 SV_3gbwq8… Beskar Armor   UR_8wfG5qs… 2020-02-20T01:… 2020-02-20T00:… TRUE    
#> 2 SV_56icaa… Cats Can Have… UR_8wfG5qs… 2020-06-10T23:… 2020-02-11T03:… TRUE    
#> 3 SV_5BJRo2… Sourdough Bre… UR_8wfG5qs… 2020-03-29T20:… 2020-03-29T20:… TRUE    
#> 4 SV_ezYI0F… Priorities fo… UR_8wfG5qs… 2020-09-22T19:… 2020-04-23T18:… TRUE

Created on 2020-10-26 by the reprex package (v0.3.0.9001)

If that works (i.e. you can see the surveys you should have access to), what happens if you try this with the survey ID, again using reprex?

library(qualtRics)
fetch_survey(surveyID = "SV_XXX") ## put your ID here
Rkol8 commented 4 years ago
library(qualtRics)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'ggplot2' was built under R version 3.5.3
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'dplyr' was built under R version 3.5.3

############################API OG##################################
qualtrics_api_credentials(api_key = "xxxxxx",
                          base_url = "xxxxxx",
                          overwrite = T)
#> To install your credentials for use in future sessions, run this function with `install = TRUE`.

#Post Implementation Surveys
emmi_pi <- fetch_survey(surveyID = "SV_xxxxx5MF", force_request = TRUE) %>% as_tibble()
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> Parsed with column specification:
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   Progress = col_integer(),
#>   `Duration (in seconds)` = col_integer(),
#>   RecordedDate = col_datetime(format = ""),
#>   LocationLatitude = col_double(),
#>   LocationLongitude = col_double(),
#>   NPS = col_integer()
#> )
#> See spec(...) for full column specifications.

#MS End User Survey
ms_eu <- fetch_survey(surveyID = "SV_xxxxZ0V", force_request = TRUE) %>% as_tibble() #Error
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 1)
#> Warning: 32 parsing failures.
#> row # A tibble: 5 x 5 col     row col   expected   actual   file                                      expected   <int> <chr> <chr>      <chr>    <chr>                                     actual 1     2 <NA>  11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ file 2     3 <NA>  11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ row 3     4 <NA>  11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ col 4     5 <NA>  11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ expected 5     6 <NA>  11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~
#> ... ................. ... ........................................................................... ........ ........................................................................... ...... ........................................................................... .... ........................................................................... ... ........................................................................... ... ........................................................................... ........ ...........................................................................
#> See problems(...) for more details.
#> Error in attr(x, "names") <- as.character(value): 'names' attribute [80] must be the same length as the vector [11]

Created on 2020-10-27 by the reprex package (v0.3.0)

juliasilge commented 4 years ago

Seems like there is something unusual with the data downloaded from the API for your survey! Can you run the following code to get the path of where the raw, unprocessed CSV gets placed?

fetch_url <- qualtRics:::create_fetch_url(Sys.getenv("QUALTRICS_BASE_URL"), "SV_xxxxAqx")
raw_payload <- qualtRics:::create_raw_payload(
  label = TRUE,
  start_date = NULL,
  end_date = NULL,
  unanswer_recode = NULL,
  include_display_order = NULL,
  limit = NULL,
  time_zone = NULL,
  include_questions = NULL,
  breakout_sets = TRUE
)

res <- qualtRics:::qualtrics_api_request("POST", url = fetch_url, body = raw_payload)
requestID <- res$result$progressId
survey.fpath <- qualtRics:::download_qualtrics_export(fetch_url, requestID, verbose = TRUE)
#>   |                                                                              |                                                                      |   0%  |                                                                              |======================================================================| 100%
survey.fpath
#> [1] "/var/folders/0w/prb4hnss2gn1p7y34qb2stw00000gp/T//RtmpQPVPqI/Cats Can Have a Little Salami.csv"

Created on 2020-10-30 by the reprex package (v0.3.0.9001)

And then you can read in the CSV directly with read_survey():

library(qualtRics)
path <- "/var/folders/0w/prb4hnss2gn1p7y34qb2stw00000gp/T//RtmpqvsBqM/Cats Can Have a Little Salami.csv"
read_survey(path)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   .default = col_character(),
#>   StartDate = col_datetime(format = ""),
#>   EndDate = col_datetime(format = ""),
#>   Progress = col_double(),
#>   `Duration (in seconds)` = col_double(),
#>   Finished = col_logical(),
#>   RecordedDate = col_datetime(format = ""),
#>   Q1.2_10_TEXT = col_logical(),
#>   Q3.13 = col_double(),
#>   SolutionRevision = col_double(),
#>   `Q3.8 - Parent Topics` = col_logical(),
#>   `Q3.8 - Sentiment Polarity` = col_double(),
#>   `Q3.8 - Sentiment Score` = col_double(),
#>   `Q3.8 - Topic Sentiment Label` = col_logical(),
#>   `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
#> # A tibble: 17 x 34
#>    StartDate           EndDate             Status Progress `Duration (in s…
#>    <dttm>              <dttm>              <chr>     <dbl>            <dbl>
#>  1 2020-02-11 03:44:39 2020-02-11 03:45:47 Surve…      100               67
#>  2 2020-02-11 04:08:16 2020-02-11 04:10:29 IP Ad…      100              133
#>  3 2020-02-11 04:08:21 2020-02-11 04:11:09 IP Ad…      100              168
#>  4 2020-02-11 04:25:59 2020-02-11 04:28:17 IP Ad…      100              137
#>  5 2020-02-11 05:18:29 2020-02-11 05:20:17 IP Ad…      100              107
#>  6 2020-02-11 05:57:47 2020-02-11 06:00:17 IP Ad…      100              149
#>  7 2020-02-11 06:39:05 2020-02-11 06:41:48 IP Ad…      100              163
#>  8 2020-02-11 06:46:53 2020-02-11 06:48:56 IP Ad…      100              122
#>  9 2020-02-11 10:30:40 2020-02-11 10:33:49 IP Ad…      100              188
#> 10 2020-02-11 13:15:40 2020-02-11 13:17:36 IP Ad…      100              115
#> 11 2020-02-12 00:59:38 2020-02-12 01:01:13 IP Ad…      100               94
#> 12 2020-02-13 16:41:14 2020-02-13 16:46:31 IP Ad…      100              316
#> 13 2020-02-13 23:24:36 2020-02-13 23:27:25 IP Ad…      100              168
#> 14 2020-02-11 04:05:27 2020-02-11 04:07:16 IP Ad…       15              108
#> 15 2020-02-13 16:24:45 2020-02-13 16:25:09 IP Ad…       15               24
#> 16 2020-02-24 21:08:13 2020-02-24 21:09:22 IP Ad…      100               68
#> 17 2020-03-04 22:35:48 2020-03-04 22:36:12 IP Ad…       15               23
#> # … with 29 more variables: Finished <lgl>, RecordedDate <dttm>,
#> #   ResponseId <chr>, DistributionChannel <chr>, UserLanguage <chr>,
#> #   Q1.2 <chr>, Q1.2_10_TEXT <lgl>, Q2.1 <chr>, Q3.13_NPS_GROUP <chr>,
#> #   Q3.13 <dbl>, Q3.2 <chr>, Q3.3 <chr>, Q3.4 <chr>, Q3.7 <chr>, Q3.8 <chr>,
#> #   Q37 <chr>, Q35_1 <chr>, Q35_2 <chr>, Q35_7 <chr>, Q35_7_TEXT <chr>,
#> #   Q4.1 <chr>, SolutionRevision <dbl>, `Q3.8 - Parent Topics` <lgl>, `Q3.8 -
#> #   Sentiment Polarity` <dbl>, `Q3.8 - Sentiment Score` <dbl>, `Q3.8 -
#> #   Sentiment` <chr>, `Q3.8 - Topic Sentiment Label` <lgl>, `Q3.8 - Topic
#> #   Sentiment Score` <lgl>, `Q3.8 - Topics` <chr>

Created on 2020-10-30 by the reprex package (v0.3.0.9001)

Can you also show me what happens when you use just plain read_csv()?

library(readr)
path <- "/var/folders/0w/prb4hnss2gn1p7y34qb2stw00000gp/T//RtmpqvsBqM/Cats Can Have a Little Salami.csv"
read_csv(path)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   .default = col_character()
#> )
#> ℹ Use `spec()` for the full column specifications.
#> # A tibble: 19 x 34
#>    StartDate EndDate Status Progress `Duration (in s… Finished RecordedDate
#>    <chr>     <chr>   <chr>  <chr>    <chr>            <chr>    <chr>       
#>  1 "Start D… "End D… "Resp… "Progre… "Duration (in s… "Finish… "Recorded D…
#>  2 "{\"Impo… "{\"Im… "{\"I… "{\"Imp… "{\"ImportId\":… "{\"Imp… "{\"ImportI…
#>  3 "2020-02… "2020-… "Surv… "100"    "67"             "True"   "2020-02-11…
#>  4 "2020-02… "2020-… "IP A… "100"    "133"            "True"   "2020-02-11…
#>  5 "2020-02… "2020-… "IP A… "100"    "168"            "True"   "2020-02-11…
#>  6 "2020-02… "2020-… "IP A… "100"    "137"            "True"   "2020-02-11…
#>  7 "2020-02… "2020-… "IP A… "100"    "107"            "True"   "2020-02-11…
#>  8 "2020-02… "2020-… "IP A… "100"    "149"            "True"   "2020-02-11…
#>  9 "2020-02… "2020-… "IP A… "100"    "163"            "True"   "2020-02-11…
#> 10 "2020-02… "2020-… "IP A… "100"    "122"            "True"   "2020-02-11…
#> 11 "2020-02… "2020-… "IP A… "100"    "188"            "True"   "2020-02-11…
#> 12 "2020-02… "2020-… "IP A… "100"    "115"            "True"   "2020-02-11…
#> 13 "2020-02… "2020-… "IP A… "100"    "94"             "True"   "2020-02-12…
#> 14 "2020-02… "2020-… "IP A… "100"    "316"            "True"   "2020-02-13…
#> 15 "2020-02… "2020-… "IP A… "100"    "168"            "True"   "2020-02-13…
#> 16 "2020-02… "2020-… "IP A… "15"     "108"            "False"  "2020-02-18…
#> 17 "2020-02… "2020-… "IP A… "15"     "24"             "False"  "2020-02-20…
#> 18 "2020-02… "2020-… "IP A… "100"    "68"             "True"   "2020-02-24…
#> 19 "2020-03… "2020-… "IP A… "15"     "23"             "False"  "2020-03-11…
#> # … with 27 more variables: ResponseId <chr>, DistributionChannel <chr>,
#> #   UserLanguage <chr>, Q1.2 <chr>, Q1.2_10_TEXT <chr>, Q2.1 <chr>,
#> #   Q3.13_NPS_GROUP <chr>, Q3.13 <chr>, Q3.2 <chr>, Q3.3 <chr>, Q3.4 <chr>,
#> #   Q3.7 <chr>, Q3.8 <chr>, Q37 <chr>, Q35_1 <chr>, Q35_2 <chr>, Q35_7 <chr>,
#> #   Q35_7_TEXT <chr>, Q4.1 <chr>, SolutionRevision <chr>, `Q3.8 - Parent
#> #   Topics` <chr>, `Q3.8 - Sentiment Polarity` <chr>, `Q3.8 - Sentiment
#> #   Score` <chr>, `Q3.8 - Sentiment` <chr>, `Q3.8 - Topic Sentiment
#> #   Label` <chr>, `Q3.8 - Topic Sentiment Score` <chr>, `Q3.8 - Topics` <chr>

Created on 2020-10-30 by the reprex package (v0.3.0.9001)

I know this is tough to debug since I can't see your data or directly understand what is going on. 😔 If the data is not too sensitive, I am willing to talk a look at the CSV; you can email it to me at the address available in the DESCRIPTION file of this repo.

Rkol8 commented 4 years ago

Thanks for helping with this. Here is the first piece of that using read_survey:

library(qualtRics)
library(readr)

fetch_url <- qualtRics:::create_fetch_url(Sys.getenv("QUALTRICS_BASE_URL"), "SV_3gC4CNI6qdx5CTP")

raw_payload <- qualtRics:::create_raw_payload(
    label = TRUE,
    start_date = NULL,
    end_date = NULL,
    unanswer_recode = NULL,
    include_display_order = NULL,
    limit = NULL,
    time_zone = NULL,
    include_questions = NULL,
    breakout_sets = TRUE
)

res <- qualtRics:::qualtrics_api_request("POST", url = fetch_url, body = raw_payload)
#> Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: https

requestID <- res$result$progressId
#> Error in eval(expr, envir, enclos): object 'res' not found

survey.fpath <- qualtRics:::download_qualtrics_export(fetch_url, requestID, verbose = TRUE)
#> Error in paste0(fetch_url, requestID): object 'requestID' not found

path <- "C:/Users/xxxx~1.xxxx/AppData/Local/Temp/RtmpCI8d13/2020 xxx Enterprise Key Contact.csv"

read_survey(path)
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 1)
#> Warning: 37 parsing failures.
#> row # A tibble: 5 x 5 col     row col   expected   actual   file                                      expected   <int> <chr> <chr>      <chr>    <chr>                                     actual 1     2 <NA>  91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ file 2     3 <NA>  91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ row 3     4 <NA>  91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ col 4     5 <NA>  91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ expected 5     6 <NA>  91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~
#> ... ................. ... ........................................................................... ........ ........................................................................... ...... ........................................................................... .... ........................................................................... ... ........................................................................... ... ........................................................................... ........ ...........................................................................
#> See problems(...) for more details.
#> Error in attr(x, "names") <- as.character(value): 'names' attribute [113] must be the same length as the vector [91]

Created on 2020-11-02 by the reprex package (v0.3.0)

Rkol8 commented 4 years ago

And here it works with read_csv:

library(qualtRics)
library(readr)

fetch_url <- qualtRics:::create_fetch_url(Sys.getenv("QUALTRICS_BASE_URL"), "SV_3gC4CNI6qdx5CTP")

raw_payload <- qualtRics:::create_raw_payload(
    label = TRUE,
    start_date = NULL,
    end_date = NULL,
    unanswer_recode = NULL,
    include_display_order = NULL,
    limit = NULL,
    time_zone = NULL,
    include_questions = NULL,
    breakout_sets = TRUE
)

res <- qualtRics:::qualtrics_api_request("POST", url = fetch_url, body = raw_payload)
#> Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: https

requestID <- res$result$progressId
#> Error in eval(expr, envir, enclos): object 'res' not found

survey.fpath <- qualtRics:::download_qualtrics_export(fetch_url, requestID, verbose = TRUE)
#> Error in paste0(fetch_url, requestID): object 'requestID' not found

path <- "C:/Users/xxx~1.xxx/AppData/Local/Temp/RtmpCI8d13/2020 xxx Enterprise Key Contact.csv"

read_csv(path)
#> Parsed with column specification:
#> cols(
#>   .default = col_character()
#> )
#> See spec(...) for full column specifications.
#> # A tibble: 38 x 113
#>    StartDate EndDate Status IPAddress Progress `Duration (in s~ Finished
#>    <chr>     <chr>   <chr>  <chr>     <chr>    <chr>            <chr>   
#>  1 "Start D~ "End D~ "Resp~ "IP Addr~ "Progre~ "Duration (in s~ "Finish~
#>  2 "{\"Impo~ "{\"Im~ "{\"I~ "{\"Impo~ "{\"Imp~ "{\"ImportId\":~ "{\"Imp~
#>  3 "2020-10~ "2020-~ "IP A~ "xxx.~ "100"    "308"            "True"  
#>  4 "2020-10~ "2020-~ "IP A~ "xxx.~ "100"    "706"            "True"  
#>  5 "2020-10~ "2020-~ "IP A~ "xxx.~ "100"    "407"            "True"  
#>  6 "2020-10~ "2020-~ "IP A~ "xxx~ "100"    "353"            "True"  
#>  7 "2020-10~ "2020-~ "IP A~ "xxx~ "100"    "245"            "True"  
#>  8 "2020-10~ "2020-~ "IP A~ "xxx~ "100"    "357"            "True"  
#>  9 "2020-10~ "2020-~ "IP A~ "xxx~ "100"    "635"            "True"  
#> 10 "2020-10~ "2020-~ "IP A~ "xxx.~ "100"    "542"            "True"  
#> # ... with 28 more rows, and 106 more variables: RecordedDate <chr>,
#> #   ResponseId <chr>, RecipientLastName <chr>, RecipientFirstName <chr>,
#> #   RecipientEmail <chr>, ExternalReference <chr>, LocationLatitude <chr>,
#> #   LocationLongitude <chr>, DistributionChannel <chr>,
#> #   UserLanguage <chr>, NPS_NPS_GROUP <chr>, NPS <chr>, Q21 <chr>,
#> #   Q17 <chr>, Q17_12_TEXT <chr>, Q44_1 <chr>, Q44_2 <chr>, Q44_3 <chr>,
#> #   Q44_4 <chr>, Q44_5 <chr>, Q44_6 <chr>, Q44_7 <chr>, Q44_8 <chr>,
#> #   Q44_8_TEXT <chr>, Q45_1 <chr>, Q45_2 <chr>, Q45_3 <chr>, Q45_4 <chr>,
#> #   Q45_5 <chr>, Q45_6 <chr>, Q45_7 <chr>, Q45_7_TEXT <chr>,
#> #   QID138441044 <chr>, Q22 <chr>, Q46 <chr>, Q36_1 <chr>, Q36_2 <chr>,
#> #   Q36_3 <chr>, Q36_4 <chr>, Q36_5 <chr>, Q36_6 <chr>, Q36_7 <chr>,
#> #   Q36_8 <chr>, Q26_1 <chr>, Q26_2 <chr>, Q26_3 <chr>, Q26_4 <chr>,
#> #   Q26_9 <chr>, Q26_10 <chr>, Q26_8 <chr>, Q26_8_TEXT <chr>, Q48_1 <chr>,
#> #   Q48_2 <chr>, Q48_3 <chr>, Q48_4 <chr>, Q48_5 <chr>, Q48_6 <chr>,
#> #   Q51_1 <chr>, Q51_2 <chr>, Q51_3 <chr>, Q51_4 <chr>, Q51_5 <chr>,
#> #   Q53_1 <chr>, Q53_2 <chr>, Q53_3 <chr>, Q53_4 <chr>, Q53_5 <chr>,
#> #   Q55_1 <chr>, Q55_2 <chr>, Q55_3 <chr>, Q55_4 <chr>, Q57 <chr>,
#> #   Q59 <chr>, Q62 <chr>, Q62_4_TEXT <chr>, Q61 <chr>, Q23 <chr>,
#> #   QID138441036 <chr>, Q25 <chr>, Q13 <chr>, Q29 <chr>, Q32 <chr>,
#> #   Q56 <chr>, Q30_1 <chr>, Q30_2 <chr>, Q30_3 <chr>, Q30_4 <chr>,
#> #   Q30_5 <chr>, Q14_1 <chr>, Q14_2 <chr>, Q14_3 <chr>, Q15 <chr>,
#> #   Q34 <chr>, Q35 <chr>, `timing_First Click` <chr>, `timing_Last
#> #   Click` <chr>, `timing_Page Submit` <chr>, `timing_Click Count` <chr>,
#> #   metaInfo_Browser <chr>, metaInfo_Version <chr>, ...

Created on 2020-11-02 by the reprex package (v0.3.0)

juliasilge commented 4 years ago

If I have counted right, it looks like the 58th column is Q26_8_TEXT, with the 59th column Q48_1. Can you show me a bit (or describe) what is different about the columns to the right/left of Q26_8_TEXT that is causing them not to be read in when we take off the top rows?

This only reads in 58, you are saying:

readr::read_csv(
    file = <YOUR PATH>, col_names = FALSE, 
    col_types = readr::cols(.default = readr::col_character()),
    skip = 3, na = c("")
)

As a workaround for now, have you tried using using the include_questions argument of fetch_survey(), to only include questions you need and avoid any that are causing this problem?

juliasilge commented 3 years ago

Feel free to open a new issue if you run into further questions or problems! 🙌