Closed Rkol8 closed 3 years ago
Looks like it's happening as apart of read_survey (https://github.com/ropensci/qualtRics/blob/master/R/read_survey.R)
--import data including variable names (row 1) and variable labels (row 2) rawdata <- suppressMessages(readr::read_csv( file = file_name, col_names = FALSE, col_types = readr::cols(.default = readr::col_character()), skip = skipNr, na = c("") ))
When it reads the data, and skip = skipNr is = 3 (from skipNr <- ifelse(legacy, 2, 3)) it is stopping reading columns.
For my file, this only reads 58 columns: rawdata <- suppressWarnings(readr::read_csv( file = file.choose(), col_names = FALSE, col_types = readr::cols(.default = readr::col_character()), skip = 3, na = c("") ))
If I comment out skip I get 109 columns: rawdata <- suppressWarnings(readr::read_csv( file = file.choose(), col_names = FALSE, col_types = readr::cols(.default = readr::colcharacter()), **# skip = 3,_** na = c("") ))
The headers always reads in 109 columns: header <- suppressWarnings(suppressMessages(readr::read_csv( file = file.choose(), col_names = TRUE, col_types = readr::cols(.default = readr::col_character()), n_max = 1 )))
That sounds frustrating! To dig into this problem, we need you to use reprex so we can make sure the environment is clean, etc.
qualtrics_api_credentials()
with install = TRUE
to install your credentials to your .Renviron
file?After you have that set up, try running this code:
library(qualtRics)
all_surveys()
Your output (what you paste here on GitHub) should look like this, but with your surveys (feel free to censor any PII that shows up):
library(qualtRics)
all_surveys()
#> # A tibble: 4 x 6
#> id name ownerId lastModified creationDate isActive
#> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 SV_3gbwq8… Beskar Armor UR_8wfG5qs… 2020-02-20T01:… 2020-02-20T00:… TRUE
#> 2 SV_56icaa… Cats Can Have… UR_8wfG5qs… 2020-06-10T23:… 2020-02-11T03:… TRUE
#> 3 SV_5BJRo2… Sourdough Bre… UR_8wfG5qs… 2020-03-29T20:… 2020-03-29T20:… TRUE
#> 4 SV_ezYI0F… Priorities fo… UR_8wfG5qs… 2020-09-22T19:… 2020-04-23T18:… TRUE
Created on 2020-10-26 by the reprex package (v0.3.0.9001)
If that works (i.e. you can see the surveys you should have access to), what happens if you try this with the survey ID, again using reprex?
library(qualtRics)
fetch_survey(surveyID = "SV_XXX") ## put your ID here
library(qualtRics)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'ggplot2' was built under R version 3.5.3
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'dplyr' was built under R version 3.5.3
############################API OG##################################
qualtrics_api_credentials(api_key = "xxxxxx",
base_url = "xxxxxx",
overwrite = T)
#> To install your credentials for use in future sessions, run this function with `install = TRUE`.
#Post Implementation Surveys
emmi_pi <- fetch_survey(surveyID = "SV_xxxxx5MF", force_request = TRUE) %>% as_tibble()
#>
|
| | 0%
|
|=================================================================| 100%
#> Parsed with column specification:
#> cols(
#> .default = col_character(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Progress = col_integer(),
#> `Duration (in seconds)` = col_integer(),
#> RecordedDate = col_datetime(format = ""),
#> LocationLatitude = col_double(),
#> LocationLongitude = col_double(),
#> NPS = col_integer()
#> )
#> See spec(...) for full column specifications.
#MS End User Survey
ms_eu <- fetch_survey(surveyID = "SV_xxxxZ0V", force_request = TRUE) %>% as_tibble() #Error
#>
|
| | 0%
|
|=================================================================| 100%
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 1)
#> Warning: 32 parsing failures.
#> row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 2 <NA> 11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ file 2 3 <NA> 11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ row 3 4 <NA> 11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ col 4 5 <NA> 11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ expected 5 6 <NA> 11 columns 80 colu~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~
#> ... ................. ... ........................................................................... ........ ........................................................................... ...... ........................................................................... .... ........................................................................... ... ........................................................................... ... ........................................................................... ........ ...........................................................................
#> See problems(...) for more details.
#> Error in attr(x, "names") <- as.character(value): 'names' attribute [80] must be the same length as the vector [11]
Created on 2020-10-27 by the reprex package (v0.3.0)
Seems like there is something unusual with the data downloaded from the API for your survey! Can you run the following code to get the path of where the raw, unprocessed CSV gets placed?
fetch_url <- qualtRics:::create_fetch_url(Sys.getenv("QUALTRICS_BASE_URL"), "SV_xxxxAqx")
raw_payload <- qualtRics:::create_raw_payload(
label = TRUE,
start_date = NULL,
end_date = NULL,
unanswer_recode = NULL,
include_display_order = NULL,
limit = NULL,
time_zone = NULL,
include_questions = NULL,
breakout_sets = TRUE
)
res <- qualtRics:::qualtrics_api_request("POST", url = fetch_url, body = raw_payload)
requestID <- res$result$progressId
survey.fpath <- qualtRics:::download_qualtrics_export(fetch_url, requestID, verbose = TRUE)
#> | | | 0% | |======================================================================| 100%
survey.fpath
#> [1] "/var/folders/0w/prb4hnss2gn1p7y34qb2stw00000gp/T//RtmpQPVPqI/Cats Can Have a Little Salami.csv"
Created on 2020-10-30 by the reprex package (v0.3.0.9001)
And then you can read in the CSV directly with read_survey()
:
library(qualtRics)
path <- "/var/folders/0w/prb4hnss2gn1p7y34qb2stw00000gp/T//RtmpqvsBqM/Cats Can Have a Little Salami.csv"
read_survey(path)
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_character(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Progress = col_double(),
#> `Duration (in seconds)` = col_double(),
#> Finished = col_logical(),
#> RecordedDate = col_datetime(format = ""),
#> Q1.2_10_TEXT = col_logical(),
#> Q3.13 = col_double(),
#> SolutionRevision = col_double(),
#> `Q3.8 - Parent Topics` = col_logical(),
#> `Q3.8 - Sentiment Polarity` = col_double(),
#> `Q3.8 - Sentiment Score` = col_double(),
#> `Q3.8 - Topic Sentiment Label` = col_logical(),
#> `Q3.8 - Topic Sentiment Score` = col_logical()
#> )
#> ℹ Use `spec()` for the full column specifications.
#> # A tibble: 17 x 34
#> StartDate EndDate Status Progress `Duration (in s…
#> <dttm> <dttm> <chr> <dbl> <dbl>
#> 1 2020-02-11 03:44:39 2020-02-11 03:45:47 Surve… 100 67
#> 2 2020-02-11 04:08:16 2020-02-11 04:10:29 IP Ad… 100 133
#> 3 2020-02-11 04:08:21 2020-02-11 04:11:09 IP Ad… 100 168
#> 4 2020-02-11 04:25:59 2020-02-11 04:28:17 IP Ad… 100 137
#> 5 2020-02-11 05:18:29 2020-02-11 05:20:17 IP Ad… 100 107
#> 6 2020-02-11 05:57:47 2020-02-11 06:00:17 IP Ad… 100 149
#> 7 2020-02-11 06:39:05 2020-02-11 06:41:48 IP Ad… 100 163
#> 8 2020-02-11 06:46:53 2020-02-11 06:48:56 IP Ad… 100 122
#> 9 2020-02-11 10:30:40 2020-02-11 10:33:49 IP Ad… 100 188
#> 10 2020-02-11 13:15:40 2020-02-11 13:17:36 IP Ad… 100 115
#> 11 2020-02-12 00:59:38 2020-02-12 01:01:13 IP Ad… 100 94
#> 12 2020-02-13 16:41:14 2020-02-13 16:46:31 IP Ad… 100 316
#> 13 2020-02-13 23:24:36 2020-02-13 23:27:25 IP Ad… 100 168
#> 14 2020-02-11 04:05:27 2020-02-11 04:07:16 IP Ad… 15 108
#> 15 2020-02-13 16:24:45 2020-02-13 16:25:09 IP Ad… 15 24
#> 16 2020-02-24 21:08:13 2020-02-24 21:09:22 IP Ad… 100 68
#> 17 2020-03-04 22:35:48 2020-03-04 22:36:12 IP Ad… 15 23
#> # … with 29 more variables: Finished <lgl>, RecordedDate <dttm>,
#> # ResponseId <chr>, DistributionChannel <chr>, UserLanguage <chr>,
#> # Q1.2 <chr>, Q1.2_10_TEXT <lgl>, Q2.1 <chr>, Q3.13_NPS_GROUP <chr>,
#> # Q3.13 <dbl>, Q3.2 <chr>, Q3.3 <chr>, Q3.4 <chr>, Q3.7 <chr>, Q3.8 <chr>,
#> # Q37 <chr>, Q35_1 <chr>, Q35_2 <chr>, Q35_7 <chr>, Q35_7_TEXT <chr>,
#> # Q4.1 <chr>, SolutionRevision <dbl>, `Q3.8 - Parent Topics` <lgl>, `Q3.8 -
#> # Sentiment Polarity` <dbl>, `Q3.8 - Sentiment Score` <dbl>, `Q3.8 -
#> # Sentiment` <chr>, `Q3.8 - Topic Sentiment Label` <lgl>, `Q3.8 - Topic
#> # Sentiment Score` <lgl>, `Q3.8 - Topics` <chr>
Created on 2020-10-30 by the reprex package (v0.3.0.9001)
Can you also show me what happens when you use just plain read_csv()
?
library(readr)
path <- "/var/folders/0w/prb4hnss2gn1p7y34qb2stw00000gp/T//RtmpqvsBqM/Cats Can Have a Little Salami.csv"
read_csv(path)
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_character()
#> )
#> ℹ Use `spec()` for the full column specifications.
#> # A tibble: 19 x 34
#> StartDate EndDate Status Progress `Duration (in s… Finished RecordedDate
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "Start D… "End D… "Resp… "Progre… "Duration (in s… "Finish… "Recorded D…
#> 2 "{\"Impo… "{\"Im… "{\"I… "{\"Imp… "{\"ImportId\":… "{\"Imp… "{\"ImportI…
#> 3 "2020-02… "2020-… "Surv… "100" "67" "True" "2020-02-11…
#> 4 "2020-02… "2020-… "IP A… "100" "133" "True" "2020-02-11…
#> 5 "2020-02… "2020-… "IP A… "100" "168" "True" "2020-02-11…
#> 6 "2020-02… "2020-… "IP A… "100" "137" "True" "2020-02-11…
#> 7 "2020-02… "2020-… "IP A… "100" "107" "True" "2020-02-11…
#> 8 "2020-02… "2020-… "IP A… "100" "149" "True" "2020-02-11…
#> 9 "2020-02… "2020-… "IP A… "100" "163" "True" "2020-02-11…
#> 10 "2020-02… "2020-… "IP A… "100" "122" "True" "2020-02-11…
#> 11 "2020-02… "2020-… "IP A… "100" "188" "True" "2020-02-11…
#> 12 "2020-02… "2020-… "IP A… "100" "115" "True" "2020-02-11…
#> 13 "2020-02… "2020-… "IP A… "100" "94" "True" "2020-02-12…
#> 14 "2020-02… "2020-… "IP A… "100" "316" "True" "2020-02-13…
#> 15 "2020-02… "2020-… "IP A… "100" "168" "True" "2020-02-13…
#> 16 "2020-02… "2020-… "IP A… "15" "108" "False" "2020-02-18…
#> 17 "2020-02… "2020-… "IP A… "15" "24" "False" "2020-02-20…
#> 18 "2020-02… "2020-… "IP A… "100" "68" "True" "2020-02-24…
#> 19 "2020-03… "2020-… "IP A… "15" "23" "False" "2020-03-11…
#> # … with 27 more variables: ResponseId <chr>, DistributionChannel <chr>,
#> # UserLanguage <chr>, Q1.2 <chr>, Q1.2_10_TEXT <chr>, Q2.1 <chr>,
#> # Q3.13_NPS_GROUP <chr>, Q3.13 <chr>, Q3.2 <chr>, Q3.3 <chr>, Q3.4 <chr>,
#> # Q3.7 <chr>, Q3.8 <chr>, Q37 <chr>, Q35_1 <chr>, Q35_2 <chr>, Q35_7 <chr>,
#> # Q35_7_TEXT <chr>, Q4.1 <chr>, SolutionRevision <chr>, `Q3.8 - Parent
#> # Topics` <chr>, `Q3.8 - Sentiment Polarity` <chr>, `Q3.8 - Sentiment
#> # Score` <chr>, `Q3.8 - Sentiment` <chr>, `Q3.8 - Topic Sentiment
#> # Label` <chr>, `Q3.8 - Topic Sentiment Score` <chr>, `Q3.8 - Topics` <chr>
Created on 2020-10-30 by the reprex package (v0.3.0.9001)
I know this is tough to debug since I can't see your data or directly understand what is going on. 😔 If the data is not too sensitive, I am willing to talk a look at the CSV; you can email it to me at the address available in the DESCRIPTION
file of this repo.
Thanks for helping with this. Here is the first piece of that using read_survey:
library(qualtRics)
library(readr)
fetch_url <- qualtRics:::create_fetch_url(Sys.getenv("QUALTRICS_BASE_URL"), "SV_3gC4CNI6qdx5CTP")
raw_payload <- qualtRics:::create_raw_payload(
label = TRUE,
start_date = NULL,
end_date = NULL,
unanswer_recode = NULL,
include_display_order = NULL,
limit = NULL,
time_zone = NULL,
include_questions = NULL,
breakout_sets = TRUE
)
res <- qualtRics:::qualtrics_api_request("POST", url = fetch_url, body = raw_payload)
#> Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: https
requestID <- res$result$progressId
#> Error in eval(expr, envir, enclos): object 'res' not found
survey.fpath <- qualtRics:::download_qualtrics_export(fetch_url, requestID, verbose = TRUE)
#> Error in paste0(fetch_url, requestID): object 'requestID' not found
path <- "C:/Users/xxxx~1.xxxx/AppData/Local/Temp/RtmpCI8d13/2020 xxx Enterprise Key Contact.csv"
read_survey(path)
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 1)
#> Warning: 37 parsing failures.
#> row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 2 <NA> 91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ file 2 3 <NA> 91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ row 3 4 <NA> 91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ col 4 5 <NA> 91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~ expected 5 6 <NA> 91 columns 113 col~ 'C:/Users/xxx~1.xxx/AppData/Local/Tem~
#> ... ................. ... ........................................................................... ........ ........................................................................... ...... ........................................................................... .... ........................................................................... ... ........................................................................... ... ........................................................................... ........ ...........................................................................
#> See problems(...) for more details.
#> Error in attr(x, "names") <- as.character(value): 'names' attribute [113] must be the same length as the vector [91]
Created on 2020-11-02 by the reprex package (v0.3.0)
And here it works with read_csv:
library(qualtRics)
library(readr)
fetch_url <- qualtRics:::create_fetch_url(Sys.getenv("QUALTRICS_BASE_URL"), "SV_3gC4CNI6qdx5CTP")
raw_payload <- qualtRics:::create_raw_payload(
label = TRUE,
start_date = NULL,
end_date = NULL,
unanswer_recode = NULL,
include_display_order = NULL,
limit = NULL,
time_zone = NULL,
include_questions = NULL,
breakout_sets = TRUE
)
res <- qualtRics:::qualtrics_api_request("POST", url = fetch_url, body = raw_payload)
#> Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: https
requestID <- res$result$progressId
#> Error in eval(expr, envir, enclos): object 'res' not found
survey.fpath <- qualtRics:::download_qualtrics_export(fetch_url, requestID, verbose = TRUE)
#> Error in paste0(fetch_url, requestID): object 'requestID' not found
path <- "C:/Users/xxx~1.xxx/AppData/Local/Temp/RtmpCI8d13/2020 xxx Enterprise Key Contact.csv"
read_csv(path)
#> Parsed with column specification:
#> cols(
#> .default = col_character()
#> )
#> See spec(...) for full column specifications.
#> # A tibble: 38 x 113
#> StartDate EndDate Status IPAddress Progress `Duration (in s~ Finished
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "Start D~ "End D~ "Resp~ "IP Addr~ "Progre~ "Duration (in s~ "Finish~
#> 2 "{\"Impo~ "{\"Im~ "{\"I~ "{\"Impo~ "{\"Imp~ "{\"ImportId\":~ "{\"Imp~
#> 3 "2020-10~ "2020-~ "IP A~ "xxx.~ "100" "308" "True"
#> 4 "2020-10~ "2020-~ "IP A~ "xxx.~ "100" "706" "True"
#> 5 "2020-10~ "2020-~ "IP A~ "xxx.~ "100" "407" "True"
#> 6 "2020-10~ "2020-~ "IP A~ "xxx~ "100" "353" "True"
#> 7 "2020-10~ "2020-~ "IP A~ "xxx~ "100" "245" "True"
#> 8 "2020-10~ "2020-~ "IP A~ "xxx~ "100" "357" "True"
#> 9 "2020-10~ "2020-~ "IP A~ "xxx~ "100" "635" "True"
#> 10 "2020-10~ "2020-~ "IP A~ "xxx.~ "100" "542" "True"
#> # ... with 28 more rows, and 106 more variables: RecordedDate <chr>,
#> # ResponseId <chr>, RecipientLastName <chr>, RecipientFirstName <chr>,
#> # RecipientEmail <chr>, ExternalReference <chr>, LocationLatitude <chr>,
#> # LocationLongitude <chr>, DistributionChannel <chr>,
#> # UserLanguage <chr>, NPS_NPS_GROUP <chr>, NPS <chr>, Q21 <chr>,
#> # Q17 <chr>, Q17_12_TEXT <chr>, Q44_1 <chr>, Q44_2 <chr>, Q44_3 <chr>,
#> # Q44_4 <chr>, Q44_5 <chr>, Q44_6 <chr>, Q44_7 <chr>, Q44_8 <chr>,
#> # Q44_8_TEXT <chr>, Q45_1 <chr>, Q45_2 <chr>, Q45_3 <chr>, Q45_4 <chr>,
#> # Q45_5 <chr>, Q45_6 <chr>, Q45_7 <chr>, Q45_7_TEXT <chr>,
#> # QID138441044 <chr>, Q22 <chr>, Q46 <chr>, Q36_1 <chr>, Q36_2 <chr>,
#> # Q36_3 <chr>, Q36_4 <chr>, Q36_5 <chr>, Q36_6 <chr>, Q36_7 <chr>,
#> # Q36_8 <chr>, Q26_1 <chr>, Q26_2 <chr>, Q26_3 <chr>, Q26_4 <chr>,
#> # Q26_9 <chr>, Q26_10 <chr>, Q26_8 <chr>, Q26_8_TEXT <chr>, Q48_1 <chr>,
#> # Q48_2 <chr>, Q48_3 <chr>, Q48_4 <chr>, Q48_5 <chr>, Q48_6 <chr>,
#> # Q51_1 <chr>, Q51_2 <chr>, Q51_3 <chr>, Q51_4 <chr>, Q51_5 <chr>,
#> # Q53_1 <chr>, Q53_2 <chr>, Q53_3 <chr>, Q53_4 <chr>, Q53_5 <chr>,
#> # Q55_1 <chr>, Q55_2 <chr>, Q55_3 <chr>, Q55_4 <chr>, Q57 <chr>,
#> # Q59 <chr>, Q62 <chr>, Q62_4_TEXT <chr>, Q61 <chr>, Q23 <chr>,
#> # QID138441036 <chr>, Q25 <chr>, Q13 <chr>, Q29 <chr>, Q32 <chr>,
#> # Q56 <chr>, Q30_1 <chr>, Q30_2 <chr>, Q30_3 <chr>, Q30_4 <chr>,
#> # Q30_5 <chr>, Q14_1 <chr>, Q14_2 <chr>, Q14_3 <chr>, Q15 <chr>,
#> # Q34 <chr>, Q35 <chr>, `timing_First Click` <chr>, `timing_Last
#> # Click` <chr>, `timing_Page Submit` <chr>, `timing_Click Count` <chr>,
#> # metaInfo_Browser <chr>, metaInfo_Version <chr>, ...
Created on 2020-11-02 by the reprex package (v0.3.0)
If I have counted right, it looks like the 58th column is Q26_8_TEXT
, with the 59th column Q48_1
. Can you show me a bit (or describe) what is different about the columns to the right/left of Q26_8_TEXT
that is causing them not to be read in when we take off the top rows?
This only reads in 58, you are saying:
readr::read_csv(
file = <YOUR PATH>, col_names = FALSE,
col_types = readr::cols(.default = readr::col_character()),
skip = 3, na = c("")
)
As a workaround for now, have you tried using using the include_questions
argument of fetch_survey()
, to only include questions you need and avoid any that are causing this problem?
Feel free to open a new issue if you run into further questions or problems! 🙌
Warning: 32 parsing failures. row # A tibble: 5 x 5 col row col expected actual file
expected
actual 1 2 NA 11 columns 80 columns col 4 5 NA 11 columns 80 columns
Error in attr(x, "names") <- as.character(value) : 'names' attribute [80] must be the same length as the vector [11]
In rbind(names(probs), probs_f) : number of columns of result is not a multiple of vector length (arg 1)
For example below: Warning: 21 parsing failures. row # A tibble: 5 x 5 col row col expected actual file
expected
actual 1 2 NA 58 columns 109 columns file 2 3 NA 58 columns 109 columns