ropensci / qualtRics

Download ⬇️ Qualtrics survey data directly into R!
https://docs.ropensci.org/qualtRics

re-write readSurvey() #82

Closed JasperHG90 closed 5 years ago

JasperHG90 commented 6 years ago

readSurvey() is the source of the majority of issues filed on the qualtRics GitHub page.

I'm not entirely sure what the best way is to solve these issues and guarantee a smooth import process.

Issues occur when:

I don't have access to large-scale survey results, so my ability to test these cases is very limited. Any help in improving the readSurvey() function is very welcome. Please respond to this thread.

ryantsullivan commented 6 years ago

I can provide you with some survey results that I think are pretty large. We have a survey with over 70K responses and it does present some problems at times for the survey reader. Let me know what the best way to go about sharing this information would be.

cpsyctc commented 6 years ago

I am having a failure with readSurvey() on quite a small dataset, and I'm pretty sure the issue is embedded carriage returns in line 2 of the file. I can bodge my way around it with read.csv(). My work on the project is subject to an NDA, so I am exploring with the contractor whether I can share anything, but I would like to help make qualtRics robust if I can.

I'm just checking here that the project is active, as it has dropped off CRAN (and I see the message about CRAN emails going into a spam folder). More seriously, as I read the file dates, nothing has been updated here for some months other than messages back and forth.

Questions:

1) Are you still actively fixing bugs here on GitHub, so that I could at least do a back and forth with you in some way even if I can't share the data?
2) Do you have an expected date to get back on CRAN?

I hate to seem unhelpful when, for once, it looks like I, as a non-programmer, might be able to help the Rverse, but time is precious. I think getting this sorted might take quite a lot of hours, and I don't want to spend them if the results won't get through at least to the GitHub version of the package.

TIA and thanks for what, by the look of it, could be an invaluable package!

Chris

cpsyctc commented 6 years ago

I can see that wasn't a very friendly or helpful contribution from me earlier (bad start to the morning!). Here, without breaching my NDA, is the session (from within RStudio, but I don't think that's relevant).

OK, no it's not: why is the paste killing carriage returns from Emacs? OK, got round that. Now, here is the log of the session with the error. I hadn't seen the missing fansi package request before, which is puzzling.

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(devtools)
> devtools::install_github("ropensci/qualtRics")
Downloading GitHub repo ropensci/qualtRics@master
from URL https://api.github.com/repos/ropensci/qualtRics/zipball/master
Installing qualtRics
Installing 1 package: sjlabelled
Installing package into ‘C:/Users/Chris.Evans.RUS/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/sjlabelled_1.0.13.zip'
Content type 'application/zip' length 263344 bytes (257 KB)
downloaded 257 KB

package ‘sjlabelled’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Chris.Evans.RUS\AppData\Local\Temp\RtmpEhy17v\downloaded_packages
"C:/Users/CHRISE~1.RUS/DOCUME~1/R/R-35~1.1/bin/x64/R" --no-site-file --no-environ  \
  --no-save --no-restore --quiet CMD INSTALL  \
  "C:/Users/Chris.Evans.RUS/AppData/Local/Temp/RtmpEhy17v/devtools1f184bff613a/ropensci-qualtRics-a7ddaaa"  \
  --library="C:/Users/Chris.Evans.RUS/Documents/R/win-library/3.5" --install-tests 

* installing *source* package 'qualtRics' ...
** R
** inst
** tests
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package 'qualtRics'
    finding HTML links ... done
    getSurvey                               html  
    getSurveyQuestions                      html  
    getSurveys                              html  
    metadata                                html  
    qualtRicsConfigFile                     html  
    readSurvey                              html  
    registerOptions                         html  
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (qualtRics)
In R CMD INSTALL
> pathName1 <- "C:\\Users\\Chris.Evans.RUS\\Documents\\D\\Data\\87percent.me\\random_test_data"
> filesNumericList <- list.files(path = pathName1,pattern="*Numeric.csv")
> #install.packages("devtools")
> # devtools::install_github("ropensci/qualtRics")
> library(qualtRics)
> tmpFile1 <- paste0(pathName1,"\\","1. Quality of Life_Numeric-minus_consent.csv")
> tmpDat <- readSurvey(tmpFile1)
Error in loadNamespace(name) : there is no package called ‘fansi’
> install.packages("fansi")
Installing package into ‘C:/Users/Chris.Evans.RUS/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/fansi_0.3.0.zip'
Content type 'application/zip' length 192509 bytes (187 KB)
downloaded 187 KB

package ‘fansi’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\Chris.Evans.RUS\AppData\Local\Temp\RtmpEhy17v\downloaded_packages
> tmpDat <- readSurvey(tmpFile1)
Warning: 26 parsing failures.
row # A tibble: 5 x 5 col     row col   expected   actual   file                                                     expected   <int> <chr> <chr>      <chr>    <chr>                                                    actual 1     2 NA    34 columns 70 colu~ "'C:\\Users\\Chris.Evans.RUS\\Documents\\D\\Data\\87per~ file 2     3 NA    34 columns 70 colu~ "'C:\\Users\\Chris.Evans.RUS\\Documents\\D\\Data\\87per~ row 3     4 NA    34 columns 70 colu~ "'C:\\Users\\Chris.Evans.RUS\\Documents\\D\\Data\\87per~ col 4     5 NA    34 columns 70 colu~ "'C:\\Users\\Chris.Evans.RUS\\Documents\\D\\Data\\87per~ expected 5     6 NA    34 columns 70 colu~  [... truncated]
Error in names(rawdata) <- names(header) : 
  'names' attribute [70] must be the same length as the vector [34]
In addition: Warning message:
In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)
>

Does this suggest we have something that it would be useful to debug and solve if I can get you a csv file without breaking my NDA?
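One way to check whether embedded line breaks are behind a column-count mismatch like the one in the log is to compare a quote-aware field count with a naive per-line count. The following is a minimal base R sketch under stated assumptions: the file contents are made up (the real file in this thread was never shared), and it does not reproduce the exact 70-versus-34 failure. It only shows how a quoted line break makes physical lines and logical records disagree.

```r
# Build a small CSV that mimics the suspected problem: the second row holds
# a quoted field with an embedded line break (all contents are invented).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Q1,Q2,Q3",
  "\"How are\nyou?\",\"Age\",\"Country\"",
  "1,34,FR",
  "2,51,UK"
), path)

# count.fields() honours quotes, so a quoted line break does not split a
# record. The physical line that opens such a field is reported as NA and
# its fields are counted on the line that closes the field.
n_fields <- utils::count.fields(path, sep = ",", quote = "\"")
print(n_fields)

# A naive split on commas treats every physical line as its own record,
# so the counts stop being uniform as soon as a field spans two lines.
naive <- lengths(strsplit(readLines(path), ",", fixed = TRUE))
print(naive)
```

If the quote-aware counts are uniform while the naive counts are not, the file itself is probably well-formed and the reader's handling of the header rows is the more likely culprit.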

adrian-gadient commented 6 years ago

Part of the problem is that Qualtrics keeps changing their output format (particularly the csv files). We already updated the import function a couple of times according to the changes that Qualtrics made. I'm not sure whether it makes sense to follow this practice.

Instead of adapting the function to Qualtrics' developments, it might be more practical to stick to the legacy formats. I think these should remain pretty consistent. Some time ago I wrote a blog entry that describes how to download and import the legacy format (http://adrianbruegger.com/import-qualtrics-csv-files/). Maybe this approach will also solve some of the problems described here.

Another suggestion is to use the output files that Qualtrics generates for SPSS (at the moment, the function is based on the csv output). Changing the import function to read the .sav files may be less error-prone.
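Whichever csv flavour is targeted, one format-tolerant approach is to read every row as text, take the column names from the first row, and treat the number of descriptive header rows as a parameter. The sketch below is only an illustration in base R: `read_multi_header_csv()` and its `n_header` argument are hypothetical names, not part of qualtRics, and the assumption that an export carries its column names in row 1 followed by descriptive rows is mine. It relies on read.csv() to handle line breaks inside quoted fields, along the lines of the read.csv() workaround mentioned earlier in this thread.

```r
# Hypothetical helper, not part of qualtRics: read a CSV whose first row
# holds the column names and whose next (n_header - 1) rows hold
# descriptive text such as question wording.
read_multi_header_csv <- function(path, n_header = 2) {
  # Read every row as text so the descriptive rows cannot distort types.
  raw <- utils::read.csv(path, header = FALSE, colClasses = "character",
                         stringsAsFactors = FALSE)
  col_names <- unlist(raw[1, ], use.names = FALSE)
  # Keep the descriptive rows as labels instead of discarding them.
  labels <- lapply(seq_len(n_header - 1), function(i) {
    unlist(raw[i + 1, ], use.names = FALSE)
  })
  data <- raw[-seq_len(n_header), , drop = FALSE]
  names(data) <- col_names
  rownames(data) <- NULL
  # Re-guess column types now that only response rows remain.
  data[] <- lapply(data, utils::type.convert, as.is = TRUE)
  list(data = data, labels = labels)
}

# Usage on an invented file with one descriptive row that contains a
# quoted line break.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Q1,Q2",
  "\"How old\nare you?\",\"Country\"",
  "34,FR",
  "51,UK"
), path)
survey <- read_multi_header_csv(path, n_header = 2)
str(survey$data)
```

Making `n_header` explicit means a change in the number of header rows only requires a different argument value, not a rewrite of the reader.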

Scott123456 commented 6 years ago

I see that readSurvey() is also being used by getSurvey(). If the majority of problems come from using readSurvey() with downloaded CSV files, I would propose that getSurvey() use its own code base for parsing data, specific to what is returned by the API. This way getSurvey() can remain simple, and readSurvey() can be rewritten and become more complicated.

juliasilge commented 5 years ago

Hello there, all! 👋 I am the new maintainer of this package, and I'd love to get to the bottom of some of these issues with read_survey(). If you are able, can you re-install from GitHub here, run the following code, and then email me (my address is in the DESCRIPTION file) the CSV file at the path you find? It will be a fairly raw, unprocessed CSV file, but that is the one that is causing problems, apparently!

# Build the export endpoint URL from the base URL stored in the environment
root_url <- qualtRics:::append_root_url(Sys.getenv("QUALTRICS_BASE_URL"), "responseexports")
# Assemble the request body for the export
raw_payload <- qualtRics:::create_raw_payload(
  surveyID = "YOUR_SURVEY_ID",  # replace with your own survey ID
  label = TRUE,
  last_response = NULL,
  start_date = NULL,
  end_date = NULL,
  unanswer_recode = NULL,
  limit = NULL,
  local_time = FALSE,
  include_questions = NULL
)
# Request the export, then download the resulting file
res <- qualtRics:::qualtrics_api_request("POST", url = root_url, body = raw_payload)
ID <- res$result$id
survey.fpath <- qualtRics:::download_qualtrics_export(paste0(root_url, ID), verbose = TRUE)
# Path of the raw CSV file to send
survey.fpath

cpsyctc commented 5 years ago

It's great that you've taken up maintaining the package. The particular piece of work I was doing that caused the problems I was having is long gone and was one of my very few paid things and the only one under a confidentiality clause so I don't think I can do this. I'm trying to use my own installation of LimeSurvey to replace any occasion on which I might have used Qualtrics and I don't have any active work using it at the moment. I do still have access to it but I am absurdly overworked at the moment so I'm hoping that others have given you what you needed around this bit of debugging. If that's not the case, come back to me and, if I can find the time, I'll see what I might be able to do.

Good luck with your work though: as I say, it's great that you're doing it. I suspect it's a package that, if tidied up a bit and maintained, would get a lot of use. I hope you get collaboration from Qualtrics too.

Best wishes,

Chris

juliasilge commented 5 years ago

Thanks @cpsyctc! That all absolutely makes sense. Probably the most sensible course at this point is to re-release to CRAN and see what problems people are having now. 😁

cpsyctc commented 5 years ago

Sounds good to me. As I say, if you find yourself desperate for testers, put me on a list to call and if I can at the time, I will.

Good luck with it: appreciated.

Chris

juliasilge commented 5 years ago

Thanks so much to all for your thoughts here! I'm going to close this older issue. If you use qualtRics again and have problems, please do open a new issue.