Opened by @context-dependent · 2 years ago
I'm assuming you were using `convert = TRUE` (as is the default?). And when you say this:

> though the data are present in the untreated csv export.
Are you referring to the manual download from the webpage, or something else?
Also, did the changes to the question `satis_overall` occur before or after the survey was initially published (whether or not it had responses yet)?
I'm thinking this could relate to several things, but one main underlying problem is that `convert = TRUE` is still looking at the older survey description endpoint, which may be the reason you're experiencing issues.
@juliasilge, I think this is pointing again to the need for the work around #267.
@jmobrien this all adds up to me.
The changes to `satis_overall` happened after publication, and the factor levels seem to point in the direction of the description endpoint.

I was referring to the csv exported from the GUI. I just tried `fetch_survey()` with `convert = FALSE` and the result is the same as reading in the csv.
@juliasilge what would you think about making `convert = FALSE` the default, at least until we can get things off the v2 endpoint? I know conversion has been the default behavior for a long time, but I'm concerned we might be steering users, maybe esp. new ones, towards potentially problematic behavior.

As @context-dependent noted, that change would just mean that users by default get the same data they get from a web download with default settings (still with some bonuses, like cleaning up the metadata row & a column map).
@jmobrien Changing the default behaviour to `convert = FALSE` is an elegant and straightforward solution. It could break extant data pipelines, but I'm not sure if that's a dealbreaker.

The short-term options, imo, are:

1. Make `convert = FALSE` the default. As you suggest, this would give new users data that meets their expectations by default, but may break code that currently relies on the present default behaviour.
2. Detect parsing failures. This would be a bigger lift, but I think still manageable in the short term. The nugget is getting `wrapper_mc()` to detect parsing failures; once that is done, it could then raise a warning, skip conversion, change its behaviour, or some combination of the above.
Interested in thoughts, happy to help however I can.
I'm hesitant to change the default behavior for something that has worked the same way for such a long time (predating my own involvement in this package); that is a type of change that you can't really inform users about, apart from a message every single time they use the function (folks hate that). I think it likely would be better to change over to the new endpoint instead, for example supporting the `includeLabelColumns` option. We can open an issue to track input on changing to `convert = FALSE`, but I think it would be better to update the underlying API call.
@context-dependent Thanks for your offer of help. It feels like the real fixes here are intertwined with a lot of other factors, so I'd be reluctant to have you commit much work to something that's likely to get changed.
But, if you're up for it, adding some kind of check + warning for parsing failures might be a great stopgap measure. Basically, if `wrapper_mc()` detects elements in some variable that aren't in the list of expected levels obtained from the `survey_questions()` call, it would throw a warning that reports the problem variable(s) and any unmatched elements.
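A minimal sketch of what that check could look like; `check_unmatched_levels()` and its arguments are hypothetical stand-ins, not `wrapper_mc()` internals:

```r
# Hypothetical helper: warn when observed values fall outside the
# expected levels pulled from survey_questions()
check_unmatched_levels <- function(values, expected_levels, var_name) {
  unmatched <- setdiff(unique(values[!is.na(values)]), expected_levels)
  if (length(unmatched) > 0) {
    rlang::warn(
      paste0(
        "Values in '", var_name, "' not found among its expected levels: ",
        paste(unmatched, collapse = ", "),
        ". These will be converted to NA."
      )
    )
  }
  invisible(unmatched)
}
```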
(@juliasilge thoughts? Along w/ the warning, should we also tack on any unmatched elements to the vector of factor levels? A simple warning is an easy way to preserve existing behavior for now, but basic users would have a hard time fixing anything after a warning (b/c all non-matched values are now `NA`s). Appending the levels means that users would need to rearrange, but that's at least possible w/o going back to the web or redownloading.)
That sounds like a reasonable stopgap measure to me, and I would probably lean toward doing a simple warning to start with (with a recommendation to redownload with `convert = FALSE`?). We could extend that to some fixing/handling next.
That's probably a good balance, yes.
@context-dependent is this something you'd like to tackle? If so, great, and let us know if you have any questions. (If useful, you can see several recent use examples of the `rlang::warn()` warning tool in `/R/Checks.R`.)
Thanks! I would be happy to give it a shot next week if that works.
Problem
Recently found several instances of this behaviour. I'm not sure exactly what's going on, but I have some guesses. Below, I compare `example_responses_Rics <- fetch_survey()` to `example_responses_csv <- read_csv()` to illustrate.

In the survey, the question labeled `satis_overall` began with a set of response options that were later very slightly altered. The qualtRics export fails to parse these response options, though the data are present in the untreated csv export. Starting with the following:
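A minimal sketch of the two reads, with a hypothetical survey ID and file name standing in for the real ones:

```r
library(qualtRics)
library(readr)

# Survey ID and file name are hypothetical stand-ins
example_responses_Rics <- fetch_survey(surveyID = "SV_xxxxxxxxxxxxxxx")
example_responses_csv  <- read_csv("example_responses.csv")
```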
Counting the column in the qualtRics export produces the outdated response options, while doing the same for the csv export shows the updated response options, but as unordered character values. In the qualtRics export, the column is a factor, but its levels are outdated (they have a space that was deleted).
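A sketch of the two counts (output omitted), using the hypothetical objects from above:

```r
library(dplyr)

# qualtRics export: a factor whose levels come from (stale) survey metadata
example_responses_Rics %>% count(satis_overall)

# GUI csv export: plain character values with the updated wording
example_responses_csv %>% count(satis_overall)
```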
Thoughts on solution
`fetch_survey()` uses Qualtrics's metadata to parse factors. The response-options metadata appears not to update with changes to the survey, so I don't see a clear path to solving this problem while maintaining both the factor-parsing functionality and the fast metadata + csv approach.

It feels like it probably shouldn't fail silently, as in many use cases the qualtRics user and the survey maintainer are not the same person.
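To illustrate the failure mode with made-up values: re-coding updated responses against stale levels silently yields `NA`s:

```r
# Stale metadata levels retain a trailing space that was deleted in the survey
outdated_levels <- c("Very satisfied ", "Somewhat satisfied ")

# Responses recorded after the edit use the updated wording
responses <- c("Very satisfied", "Somewhat satisfied")

factor(responses, levels = outdated_levels)
#> [1] <NA> <NA>
#> Levels: Very satisfied  Somewhat satisfied
```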
The .sav export is bigger than the csv, but it bakes in updated labels that can be converted into factor levels. To me, this adds support for parameterizing the export format in `fetch_survey()`, though I'm not sure where the zeitgeist is at on that concept.
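As a sketch of the .sav route with a manually downloaded file (file name hypothetical), haven can recover those labels:

```r
library(haven)

# .sav exports carry value labels that reflect the survey's current wording
example_responses_sav <- read_sav("example_responses.sav")

# Labelled columns convert to factors using those up-to-date labels
example_responses_sav$satis_overall <- as_factor(example_responses_sav$satis_overall)
```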