ropensci / qualtRics

Download ⬇️ Qualtrics survey data directly into R!
https://docs.ropensci.org/qualtRics

Error in `export_responses_filedownload()`: caused by "/" in Qualtrics Project Name #307

Open: asadow opened this issue 1 year ago

asadow commented 1 year ago

I discovered through trial and error that the following error goes away when the "/" is removed from the project name.

In my case, the survey was called "Grounds Shovel Route Afternoon Shift / Weekend Priorities Checklist". Renaming it to "Grounds Shovel Route Afternoon Shift - Weekend Priorities Checklist" fixed the error.

Could an error message be added that explains to the user that a slash in the project name is not supported?

Error in export_responses_filedownload():
! Error extracting CSV from zip file
• The download may have been corrupted; try re-running your query
• Current download file location:
• C:\Users\asadowsk\AppData\Local\Temp\RtmpOSLRWg/temp.zip
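A minimal sketch of the kind of check being requested, assuming the project name is known before the export starts. `check_survey_name()` is a hypothetical helper, not part of qualtRics, and the set of rejected characters is an assumption based on the characters Windows disallows in file names.

```r
# Hypothetical helper (not part of qualtRics): fail early with a clear message
# when a project name contains characters that cannot appear in a file name.
# The character set is an assumption based on Windows' reserved characters.
check_survey_name <- function(name) {
  bad_chars <- c("/", "\\", ":", "*", "?", "\"", "<", ">", "|")
  # grepl() is called once per character, matching literally (fixed = TRUE)
  hits <- vapply(bad_chars, grepl, logical(1), x = name, fixed = TRUE)
  found <- bad_chars[hits]
  if (length(found) > 0) {
    stop(
      "The Qualtrics project name contains character(s) that cannot be used ",
      "in a file name: ", paste0("\"", found, "\"", collapse = ", "),
      ". Rename the project in Qualtrics (for example, replace \"/\" with \"-\") ",
      "and try again.",
      call. = FALSE
    )
  }
  invisible(name)
}
```

With this in place, `check_survey_name("Grounds Shovel Route Afternoon Shift / Weekend Priorities Checklist")` would stop with a message naming the slash instead of the generic "download may have been corrupted" hint.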

juliasilge commented 1 year ago

It makes sense that this errors, because the Qualtrics project name is used as the file name when you download via the API. This is related to #195, and the fix suggested there (using something like fs::path_sanitize()) would probably solve both sets of problems and be safer overall.
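For reference, `fs::path_sanitize()` replaces characters that are illegal in file names, "/" among them, with a `replacement` string (empty by default). Using `"-"` as the replacement reproduces the manual rename from the original report. This sketch assumes the file name is available as a string before anything is written to disk.

```r
project_name <- "Grounds Shovel Route Afternoon Shift / Weekend Priorities Checklist"

# Replace illegal file-name characters ("/" here) with "-"
safe_name <- fs::path_sanitize(paste0(project_name, ".csv"), replacement = "-")
safe_name
# "Grounds Shovel Route Afternoon Shift - Weekend Priorities Checklist.csv"
```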

If someone is interested in submitting a PR with a fix in the near term, I will be happy to review it! I believe the changes need to be in this function:
https://github.com/ropensci/qualtRics/blob/2d0fb411eb9f0e725983b8b55f1ab5c89055d1b3/R/utils.R#L536
The idea would be to sanitize the path either before or after unzipping. Is it possible to do this without making a copy of the file? Probably.

jmobrien commented 1 year ago

Hmm, this might be harder than we think. As currently configured, the zipped file always has the unproblematic name temp.zip; the problem seems to be with the file extracted from it. From the API docs:

The compressed file inside the retrieved ZIP file has this naming convention: {Survey Project Name}.{Export Format}. Note that the file does not return time and date of export. For date and time information, you need to record this information manually during the export process.
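To illustrate the consequence of this convention: ZIP archives use "/" as their directory separator, so a project name containing a slash produces an entry that reads as a file inside a folder. Base R's `dirname()` and `basename()` split the constructed name the same way. This is an illustration of the naming problem, assuming the entry name follows the convention quoted above; the exact failure inside `utils::unzip()` on Windows is not pinned down in this thread.

```r
project_name  <- "Grounds Shovel Route Afternoon Shift / Weekend Priorities Checklist"
export_format <- "csv"

# Name of the file inside the ZIP, per {Survey Project Name}.{Export Format}
internal_name <- paste0(project_name, ".", export_format)

# Read as a path, the "/" splits it into a directory part and a file part
dirname(internal_name)   # "Grounds Shovel Route Afternoon Shift "
basename(internal_name)  # " Weekend Priorities Checklist.csv"
```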

There appears to be no way to change that via the API, and there also don't appear to be any tools for renaming on the fly during extraction with the utils::unzip() function we're currently relying on. So I'm not sure where we could actually apply a correcting tool like fs::path_sanitize().

I suppose it might be possible to bypass this? If we use the list = TRUE argument of unzip(), we could get the file name of the internal .csv without attempting to write it to disk. Then we could perhaps open a connection directly to that entry with base::unz(), pass it to read_csv(), and load the data straight into memory.
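The bypass described above can be sketched in base R. This is a minimal sketch, not the code in the draft PR: `find_csv_entry()` and `read_zipped_csv()` are hypothetical names, and `utils::read.csv()` stands in for `readr::read_csv()`, which also accepts a connection.

```r
# Hypothetical helpers sketching the unz() approach; not the qualtRics code.

# Pick the single CSV entry from the names listed inside a ZIP archive
find_csv_entry <- function(entries) {
  entries <- as.character(entries)
  csv <- entries[grepl("\\.csv$", entries, ignore.case = TRUE)]
  if (length(csv) != 1L) {
    stop("Expected exactly one CSV file in the archive, found ", length(csv),
         call. = FALSE)
  }
  csv
}

# Read that CSV straight out of the archive without extracting it to disk
read_zipped_csv <- function(zip_path) {
  # list = TRUE returns the archive's table of contents; nothing is extracted
  entries <- utils::unzip(zip_path, list = TRUE)$Name
  entry <- find_csv_entry(entries)
  # unz() creates a read connection to one entry; read.csv() opens the
  # connection itself and closes it when it is done
  utils::read.csv(unz(zip_path, entry), check.names = FALSE)
}
```

Assuming the archive is saved as temp.zip in the session's temporary directory (as the download location in the original error suggests), usage would be `read_zipped_csv(file.path(tempdir(), "temp.zip"))`. The entry name, slash included, is only ever used to look up the entry inside the archive, never as a path on disk.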

I have no idea, though, whether an attempt to link to an internal file like that would still trigger any filesystem naming issues. I also don't have a Windows box on hand to test this case easily.

@asadow, I suppose I could make something for you to test if you're up for it?

jmobrien commented 1 year ago

@asadow, I've got something that might work. If you want, you can temporarily install from the recent commit in the draft PR using remotes::install_github("ropensci/qualtRics", ref = "463ec76") and see whether it works in the scenarios that previously errored out.

asadow commented 1 year ago

@jmobrien

It worked.

|==========================================================================================| 100%

Internal filename is: Grounds Shovel Route Afternoon Shift / Weekend Priorities Checklist.csv

jmobrien commented 1 year ago

Okay. Thoughts @juliasilge? Any other relevant scenarios we might need to test?

The implementation in the draft PR was just hacked in for speed. If we actually move away from pre-extracting the file from the .zip, that will require a redesign of the current export_responses_* set of functions.