Qualtrics sends multiple 'Status' columns via api/export; chaos ensues

chrisumphlett commented 3 years ago

This is NOT an issue with the package; but perhaps the package can help provide a solution. (I am opening a ticket with Qualtrics support about this as well).

When I fetch surveys I separate columns that I consider to be metadata (survey fields, embedded data, contact fields) from those that are responses to questions. For better or worse, I assumed that after doing that, responseid would be the first column remaining in the response set. Well... that was until multiple Status columns started appearing. fetch_survey() gives them the names Status...X, where X is the column position. Since I was specifying Status as the metadata column name to drop, neither one is dropped and Status...3 ends up in the first position.

This only happens on a handful of surveys, out of 100s.

Anyone seen this before?

Workaround suggestions? I can probably assume that Status...3 should be renamed to Status, since it will always be in that position. The position of the 2nd Status column will depend on how many columns there are. But since assuming things about columns got me into this mess in the first place, I'd rather have a better solution than making those assumptions.

Anything that could be done in the package to mitigate this?

(I get 2 Status columns when I do a manual export to Excel from Qualtrics, which is why I do not believe this is an issue with the package or the API specifically.)

jmobrien commented 3 years ago

Multiple Status columns is new to me--can you explain in more detail? Also, can you copy the actual setup you're using to do the separation you describe?

chrisumphlett commented 3 years ago

I'll try to explain/show. This snag shows the export directly from Qualtrics with two status columns. The 1st one has a name of "response type" (2nd row) and has the value; the 2nd one is empty. https://www.screencast.com/t/3mSLmTD3b

The separation

Response data
Metadata

jmobrien commented 3 years ago

Ah, I actually think I have seen something like this before. Did someone manually create an embedded data variable called "Status" in that survey, or similar?

I think the fix for this is as easy as adding name_repair = "minimal" to the readr::read_csv() call inside read_survey(). That would prevent the ...X suffixes from being added.

It might make sense to do that, so we don't generate non-qualtrics-defined variable names. It would just relocate your problem, though--you'd just end up with two "Status" variables in the resulting data frame.
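To make the trade-off concrete, here is a minimal sketch of the two name_repair settings in readr (assuming readr 2.0 or later). The CSV text is a made-up stand-in with a duplicated Status header, not a real Qualtrics export, and it skips the extra header rows that read_survey() normally handles.

```r
library(readr)

# Hypothetical export: "Status" appears twice in the header row.
csv_text <- "StartDate,EndDate,Status,Progress,Status\n2021-10-01,2021-10-01,0,100,\n"

# Default repair ("unique"): duplicated names get a positional "...N" suffix.
repaired <- read_csv(I(csv_text), show_col_types = FALSE)
names(repaired)

# "minimal" repair: names are left exactly as they appear in the file,
# so the data frame ends up with two columns both called "Status".
minimal <- read_csv(I(csv_text), name_repair = "minimal", show_col_types = FALSE)
names(minimal)
```

With "minimal", any later step that selects Status by name has to cope with the duplicate, which is the relocated problem described above.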

chrisumphlett commented 3 years ago

Not an extra variable, at least not in this example I'm looking at right now. I did figure out what it is -- response type, which is a standard Qualtrics survey metadata field, comes out with Status as the column name and Response type as the column label (1st and 2nd row, respectively, when I export directly to Excel). Response type is a standard field, with values like IP Address. Status is not a standard field, I guess; I looked in a couple of other surveys and it wasn't there.

As far as the solution... I agree, Status twice might be a worse problem. I'd prefer "Status" and "StatusX", where one of them has the original name and any subsequent columns do not. If I did have two Status columns I could probably do some post-processing to do that renaming before the keep/select process.
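A rough base R sketch of that post-processing, under a couple of assumptions: dedupe_names() is a hypothetical helper (not part of qualtRics), and it assumes no genuine column name ends in "...N", since any such suffix would be stripped too.

```r
# Hypothetical helper: strip readr's positional "...N" suffixes, then
# re-number duplicates so the first copy keeps the original name.
dedupe_names <- function(nms) {
  # Only strip when there is a non-empty prefix, so names that are
  # nothing but "...N" (originally blank headers) are left alone.
  base <- sub("^(.+)\\.\\.\\.[0-9]+$", "\\1", nms)
  make.unique(base, sep = "")
}

new_names <- dedupe_names(c("StartDate", "Status...3", "Progress", "Status...5"))
new_names
```

This could be applied as names(df) <- dedupe_names(names(df)) before the keep/select step, so that dropping Status by name works again.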

jmobrien commented 3 years ago

So, I'm mostly tracking you, but I'm not sure. Here's what I'm used to seeing:

The variable (column) name (user-specified names for actual Q's. Here, standard survey metadata names)
A description of that variable (often the item text. Here, a standard description of the survey metadata)
Variable/column metadata (often the Qualtrics-internal variable, i.e. QID. Here, that + some other stuff)

Column/variable Status has description Response Type, and column metadata {importid:"status"}. That all feels pretty normal. But I don't see how we get from there to your issue where read_csv() saw multiple columns called Status that it then tried to differentiate.

https://www.screencast.com/t/3mSLmTD3b

Column AJ here has the same variable/column name (row 1) and description (row 2). Generally, that's what you get with an embedded data variable. The other columns nearby AJ look the same, and embedded data generally shows up as some of the last columns (displayorder vars come after if present).

Are you sure your survey has no embedded data called Status? (That exact thing has happened to me before, with a similar situation, so I know it's at least possible).

chrisumphlett commented 3 years ago

Figured out the problem. We didn't have embedded data called status in the place that I'm used to seeing it (it wasn't available in the "Data and Analysis" tab), but it was a field name being used in the survey flow. Our research person looked in and found it in the survey flow after Qualtrics support suggested this could be the issue.

For closed surveys, they are just deleting the field.

For surveys that are active, we're not sure what we're going to do. I may still need the workaround. I'm hoping we can just take care of all of it at the source, and then in the future will try to avoid using that as a survey flow field. I suggested to Qualtrics that they create "reserved field names" so that something like status (or responseid, or startdate, etc) wouldn't be allowed for embedded data/survey flow/question IDs. Maybe that's a PITA for other customers, it would help me :)

jmobrien commented 3 years ago

Gotcha. For now, I can't think of a reason that the actual metadata column wouldn't always come first. So, maybe you could just add Status...3 to your list of possible metadata variables, and then rename it back to Status if present?
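Something along these lines might work as a sketch of that idea. It matches any Status...N name rather than hard-coding the position, and it assumes the first match is the real metadata column. split_metadata() and the default column list are hypothetical, not part of qualtRics.

```r
# Hypothetical helper: rename the first repaired "Status...N" column back to
# "Status", then split the data frame into metadata and response columns by name.
split_metadata <- function(df, metadata_cols = c("StartDate", "EndDate", "Status")) {
  nms <- names(df)
  status_copies <- grep("^Status\\.\\.\\.[0-9]+$", nms)
  if (length(status_copies) > 0) {
    # Assumption: the genuine metadata column is the first copy.
    nms[status_copies[1]] <- "Status"
    names(df) <- nms
  }
  is_meta <- names(df) %in% metadata_cols
  list(
    metadata  = df[, is_meta, drop = FALSE],
    responses = df[, !is_meta, drop = FALSE]
  )
}

# Made-up example data; check.names = FALSE keeps the "...N" names intact.
survey <- data.frame(
  StartDate = "2021-10-01", EndDate = "2021-10-01", `Status...3` = 0,
  ResponseId = "R_123", Q1 = 5, `Status...6` = NA,
  check.names = FALSE
)
parts <- split_metadata(survey)
names(parts$responses)
```

Any later Status...N copies stay in the response set, so they would still need to be dropped or renamed separately.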


jmobrien commented 3 years ago

@chrisumphlett As far as the Data and Analysis tab--I think variables (sometimes?) don't show up there if they are completely empty, which can happen for a rogue field in one's survey flow. I've encountered this problem before as well. If you want to set up a more direct solution for all your surveys, the flow specification for each survey comes with fetch_description(), though for now it comes in only a minimally-processed form.

chrisumphlett commented 3 years ago

Thanks. Several suggestions I can consider in the future.

ropensci / qualtRics

Qualtrics sends multiple 'Status' columns via api/export; chaos ensues #233