ropensci / qualtRics

Download ⬇️ Qualtrics survey data directly into R!
https://docs.ropensci.org/qualtRics

Parsing issues when downloading from Qualtrics #119

Closed flavioazevedo closed 5 years ago

flavioazevedo commented 5 years ago

I am having a parsing issue while using your package qualtRics. Basically, it shows anywhere from 500 to 6000 parsing failures (the latter on other datasets I tried to download from the Qualtrics website via your R package). I would love it if this were just me being stupid, but before contacting you I spent two days trying to solve it. Below you will find the error I am given. I am using R 3.6 and the latest RStudio. I replicated the results in base R, in case it was something related to RStudio (I have had that before). I am installing qualtRics directly from your GitHub repository.

Hopefully, you have a solution. Thank you so much for your work!

mysurvey <- qualtRics::fetch_survey(
  surveyID = "xxxxxxxxxxxxxx",
  label = FALSE,
  force_request = TRUE,
  verbose = TRUE)

|====================================================================================================================================================| 100%
500 parsing failures.
row  col    expected            actual  file
1135 X962   1/0/T/F/TRUE/FALSE  I know they identify as alt-right They know I identify as alt-right  'C:/Users/falaf/AppData/Local/Temp/Rtmp8sSTIN/[CINT] Midterms Pre-election.csv'
1135 X963   1/0/T/F/TRUE/FALSE  I know they identify as alt-right They don't know I identify as alt-right  'C:/Users/falaf/AppData/Local/Temp/Rtmp8sSTIN/[CINT] Midterms Pre-election.csv'
1135 X964   1/0/T/F/TRUE/FALSE  I don't know if they identify as alt-right They know I identify as alt-right  'C:/Users/falaf/AppData/Local/Temp/Rtmp8sSTIN/[CINT] Midterms Pre-election.csv'
1135 X1243  1/0/T/F/TRUE/FALSE  I know they are a member of Antifa They know I am a member of Antifa  'C:/Users/falaf/AppData/Local/Temp/Rtmp8sSTIN/[CINT] Midterms Pre-election.csv'
1135 X1244  1/0/T/F/TRUE/FALSE  I don't know if they are a member of Antifa They know I am a member of Antifa  'C:/Users/falaf/AppData/Local/Temp/Rtmp8sSTIN/[CINT] Midterms Pre-election.csv'
.... .....  ..................  ......  ....
See problems(...) for more details.

Duplicated column names deduplicated: 'Q_text' => 'Q_text_1' [883]

5157 parsing failures.
row col expected            actual
 16 --  value in level set  25-34
 18 --  value in level set  25-34
 20 --  value in level set  25-34
 21 --  value in level set  25-34
 25 --  value in level set  45-54
... ... ..................  ......
See problems(...) for more details.

6783 parsing failures.
row col expected            actual
 16 --  value in level set  Bachelor
 18 --  value in level set  Bachelor
 20 --  value in level set  Some college
 21 --  value in level set  High school
 25 --  value in level set  High school
... ... ..................  ............
See problems(...) for more details.

710 parsing failures.
row col expected            actual
 37 --  value in level set  Less $15,000
 42 --  value in level set  Less $15,000
 61 --  value in level set  Less $15,000
 67 --  value in level set  Less $15,000
 85 --  value in level set  Less $15,000
... ... ..................  ............
See problems(...) for more details.

4542 parsing failures.
row col expected            actual
 18 --  value in level set  The Bible is the actual word of God
 21 --  value in level set  The Bible is the actual word of God
 25 --  value in level set  May have some errors
 26 --  value in level set  The Bible is the actual word of God
 28 --  value in level set  May have some errors
... ... ..................  ...................................
See problems(...) for more details.

1713 parsing failures.
row col expected            actual
 21 --  value in level set  Independent
 25 --  value in level set  Independent
 29 --  value in level set  Independent
 31 --  value in level set  Independent
 37 --  value in level set  Independent
... ... ..................  ...........
See problems(...) for more details.

1738 parsing failures.
row col expected            actual
 28 --  value in level set  Not very strong Rep
 30 --  value in level set  Not very strong Rep
 32 --  value in level set  Strong Rep
 35 --  value in level set  Strong Rep
 36 --  value in level set  Strong Rep
... ... ..................  ...................
See problems(...) for more details.

2048 parsing failures.
row col expected            actual
 16 --  value in level set  Strong Demo
 18 --  value in level set  Strong Demo
 20 --  value in level set  Not very strong Demo
 26 --  value in level set  Strong Demo
 27 --  value in level set  Strong Demo
... ... ..................  ....................
See problems(...) for more details.

1438 parsing failures.
row col expected            actual
 21 --  value in level set  Republican
 25 --  value in level set  Democrat
 29 --  value in level set  Neither
 31 --  value in level set  Neither
 37 --  value in level set  Democrat
... ... ..................  ..........
See problems(...) for more details.

259 parsing failures.
row col expected            actual
 42 --  value in level set  Do not know
 53 --  value in level set  Republican
123 --  value in level set  Democrat
234 --  value in level set  Do not know
246 --  value in level set  Do not know
... ... ..................  ...........
See problems(...) for more details.

5290 parsing failures.
row col expected            actual
 16 --  value in level set  Never consider
 18 --  value in level set  Never consider
 20 --  value in level set  Often consider
 21 --  value in level set  Never consider
 25 --  value in level set  Often vote another party
... ... ..................  ........................
See problems(...) for more details.

3646 parsing failures.
row col expected            actual
 16 --  value in level set  Barak Obama
 18 --  value in level set  Barak Obama
 20 --  value in level set  Barak Obama
 26 --  value in level set  Barak Obama
 28 --  value in level set  Do not know
... ... ..................  ...........
See problems(...) for more details.

3636 parsing failures.
row col expected            actual
 16 --  value in level set  Barak Obama
 18 --  value in level set  Barak Obama
 20 --  value in level set  Do not know
 21 --  value in level set  Barak Obama
 26 --  value in level set  Barak Obama
... ... ..................  ...........
See problems(...) for more details.

5185 parsing failures.
row col expected            actual
 16 --  value in level set  D. Trump
 18 --  value in level set  D. Trump
 20 --  value in level set  B. Sanders
 21 --  value in level set  D. Trump
 25 --  value in level set  B. Sanders
... ... ..................  ..........
See problems(...) for more details.

54 parsing failures.
row col expected            actual
 20 --  value in level set  This is correct
 47 --  value in level set  This is correct
107 --  value in level set  This is correct
943 --  value in level set  This is correct
976 --  value in level set  This is correct
... ... ..................  ...............
See problems(...) for more details.

4493 parsing failures.
row col expected            actual
 16 --  value in level set  Permit other reasons
 18 --  value in level set  Permit other reasons
 20 --  value in level set  Always abortion
 21 --  value in level set  Permit some cases
 25 --  value in level set  Always abortion
... ... ..................  ....................
See problems(...) for more details.

juliasilge commented 5 years ago

I am so sorry for the frustration you have experienced @flavioazevedo! Let's see if we can get to the bottom of it.

If you are open to this, can you run the following code and then email me (my address is in the DESCRIPTION file) the CSV file at the path you find? It will be a fairly raw, unprocessed CSV file, but that is the one that is causing problems with readr::read_csv(), apparently!

# Build the URL of the response-export endpoint from the configured base URL
root_url <- qualtRics:::append_root_url(Sys.getenv("QUALTRICS_BASE_URL"), "responseexports")
# Assemble the request body for the export (replace the placeholder with your survey ID)
raw_payload <- qualtRics:::create_raw_payload(
    surveyID = "<YOUR SURVEY ID HERE>",
    label = TRUE,
    last_response = NULL,
    start_date = NULL,
    end_date = NULL,
    unanswer_recode = NULL,
    limit = NULL,
    local_time = FALSE,
    include_questions = NULL
  )
# Request the export, download it, and print the path of the raw CSV file
res <- qualtRics:::qualtrics_api_request("POST", url = root_url, body = raw_payload)
ID <- res$result$id
survey.fpath <- qualtRics:::download_qualtrics_export(paste0(root_url, ID), verbose = TRUE)
survey.fpath

What happens if you use convert = FALSE in fetch_survey()?

The other thing I would love to know is whether the output you get after the parsing failures is actually garbled. What happens here is that R sends a request to the Qualtrics API and the API sends back a zipped CSV file. Then R unzips the CSV file and reads it using readr::read_csv(). There are in fact sometimes parsing failures, because the CSVs are sometimes malformed or have something weird in them. The good thing about readr::read_csv() is that it does the best it can and usually returns something pretty usable, and then there is code in the qualtRics package to clean up the results. You have experienced these parsing failures, but is the output in fact not sensible?
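
To make the "parsing failures but usable output" point concrete, here is a minimal sketch that uses only readr, not qualtRics. The file contents and the column names `id` and `knows` are made up for illustration, and the column types are forced by hand so that a failure is guaranteed. It shows that a cell that cannot be parsed is recorded in problems() and stored as NA, while the rest of the data is read normally.

```r
library(readr)

# Hypothetical two-column CSV; the second data row holds free text
# where a logical value is expected
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,knows", "1,TRUE", "2,They know I identify"), csv_path)

# Force the column types so the mismatch is reported as a parsing failure
dat <- read_csv(
  csv_path,
  col_types = cols(id = col_integer(), knows = col_logical())
)

# One row per cell that could not be parsed
probs <- problems(dat)
print(probs)

# The unparseable cell is kept as NA; the other values are intact
print(dat)
```

Running problems() on a data frame returned by a failing import should show which rows and columns were affected, which makes it easier to judge whether the result is still sensible.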

pinusm commented 5 years ago

I'm also seeing this. I can report that, at least in my case, the affected columns are all NA, even though I can see on the Qualtrics website that only ~50% of the values are missing in the original data.

I ran your code, and I can see the data just fine. Unfortunately, I'm unable to send you any data, for ethical reasons.

Any help would be greatly appreciated. I haven't used this code in a long time and I don't recall ever having this issue in previous versions (of my code, or the package). Thanks, Michael

pinusm commented 5 years ago

Also, when manually downloading the data as CSV and using read_survey(), all seems OK.

Lingtax commented 5 years ago

I've also got this issue today. It appears to uniquely affect items where one of the options is "Not listed (please specify)" with an associated text field as the next variable. I'll have to check my ethical approval, but I may be able to share the csv file (or at least a portion of it). Cheers, Mathew

juliasilge commented 5 years ago

I made some changes this evening that should make data import more robust. If you get a chance, reinstall this package from GitHub and see if this solves some of the parsing issues you have been seeing:

install.packages("remotes")
remotes::install_github("ropensci/qualtRics")

Thanks so much! 🙌

Lingtax commented 5 years ago

Unfortunately it's still breaking in the same way, Julia. If it will help, I can send you extracts of the data; I will just have to excise some variables to minimise identifiability risks. Just let me know what you need and in what formats.

juliasilge commented 5 years ago

😭 😭 😭

Well, let's try some other things. If you are able, can you run the following code and then email me (my address is in the DESCRIPTION file) the CSV file at the path you find? It will be a fairly raw, unprocessed CSV file, but that is the one that is causing problems with data import.

# Build the URL of the response-export endpoint from the configured base URL
root_url <- qualtRics:::append_root_url(Sys.getenv("QUALTRICS_BASE_URL"), "responseexports")
# Assemble the request body for the export (replace the placeholder with your survey ID)
raw_payload <- qualtRics:::create_raw_payload(
    surveyID = "<YOUR SURVEY ID HERE>",
    label = TRUE,
    last_response = NULL,
    start_date = NULL,
    end_date = NULL,
    unanswer_recode = NULL,
    limit = NULL,
    local_time = FALSE,
    include_questions = NULL
  )
# Request the export, download it, and print the path of the raw CSV file
res <- qualtRics:::qualtrics_api_request("POST", url = root_url, body = raw_payload)
ID <- res$result$id
survey.fpath <- qualtRics:::download_qualtrics_export(paste0(root_url, ID), verbose = TRUE)
survey.fpath

juliasilge commented 5 years ago

Looks like this issue is now resolved. To do the conversion to factors, we need the labels from Qualtrics. This means that label = FALSE needs to be used with convert = FALSE.

fetch_survey(my_id, label = FALSE, convert = FALSE)
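
For anyone wondering why the mismatch shows up as "value in level set" failures and all-NA columns, here is a rough base R sketch. It is not the package's actual conversion code, and the variable names, codes, and labels are hypothetical. It only assumes that the conversion amounts to matching the downloaded values against a set of levels: any value that is absent from the level set becomes NA.

```r
# Hypothetical column as downloaded in one representation (here: numeric codes)
age_codes <- c("1", "2", "2", "3")

# Hypothetical level set in the other representation (here: label text)
age_labels <- c("18-24", "25-34", "35-44")

# None of the values is in the level set, so every entry becomes NA
mismatched <- factor(age_codes, levels = age_labels)

# When values and levels use the same representation, the conversion works
age_text <- c("18-24", "25-34", "25-34", "35-44")
matched <- factor(age_text, levels = age_labels)

print(mismatched)
print(matched)
```

Which side holds codes and which holds labels in a given download is an assumption here; the point is only that the two have to agree, which is why the factor conversion is skipped when label = FALSE.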

I've added some new condition checking/error messaging.

If you continue to have problems, open up a new issue with the details!