ideas: setting up more flexible handling of start and end date params

jmobrien commented 4 years ago

Right now, the start_date and end_date time params for fetch_survey just take a simple format of dates per the documentation, but not more precise times. A more flexible approach to accepting dates/times would help, because with the new V3 fetch endpoint we no longer have the ability to pass a response ID to last_response to continue fetching where we left off. If we could instead pass the last recordedDate entry from a previous fetch, though, we could probably allow something that mimics this functionality.

However, we'd have to think about flexibility:

- What if they just give a string date?
- String date + time?
- What about the format given as output from fetching currently, set by readr tools?
- How do we format time zones, and how do they interact with the API parameter useLocal?

Right now, the date markers get implemented in the building of the JSON payload sent with the fetch request. It currently accepts only a simply formatted date, not a time, but under the hood adds a blank 00:00:00 time with a "T" to fit the required format: https://api.qualtrics.com/docs/dates-and-times
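For reference, a minimal sketch of the behavior described above, not the package's actual code. The helper name add_midnight() is hypothetical, and since this thread doesn't say whether a zone designator gets appended, none is added here.

```r
# Hypothetical helper: expand a bare "YYYY-MM-DD" date into a date-time
# string by appending a "T" separator and a blank midnight time.
add_midnight <- function(date) {
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (anyNA(parsed)) {
    stop("`date` must look like YYYY-MM-DD", call. = FALSE)
  }
  # Output is a character string, not a date object.
  paste0(format(parsed, "%Y-%m-%d"), "T00:00:00")
}

add_midnight("2021-02-20")
```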

Played around with some of the base tools when testing #155 to add flexibility, but didn't see what was best. So far just kept the prior way it was done.

Options:

- strptime and similar were the obvious initial choice. They produce the correct ISO 8601 format, but without the "T" separator. This can be pretty inflexible, though, as you have to know the format explicitly going in.
  - Also, there are issues with strptime and similar around time zone handling:
    - It wasn't clear to me how to get base tools to implement the time zone adjustments in a way that was retained in output.
    - The string output for time zones from readr (e.g., "CDT" for some data I have in Central Daylight Time) doesn't seem to be directly readable by the base tools, as they want the "official" version ("America/Chicago" here).
  - Still, if we can get this mostly working to get ISO 8601, we could probably just do a character replace to add the "T".
- lubridate is probably the most obvious next choice, as it has some flexible date parsers and is part of the tidyverse. But I haven't tested it, especially whether it does the "T" formatting. It adds a dependency, of course. But if it gets close, it might work well.
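A rough base-R sketch of the first option, assuming the caller supplies an "official" zone name and that we send UTC with a trailing "Z". Both are assumptions; what Qualtrics actually wants for time zones is still an open question here. The helper name to_iso8601_utc() is made up for illustration.

```r
# Hypothetical helper: accept a date or date-time string and return an
# ISO 8601 string in UTC, with the "T" separator written by the format.
to_iso8601_utc <- function(x, tz = "UTC") {
  # Try the more specific formats first; "%Y-%m-%d" alone would also
  # match a date-time string and silently drop the time.
  parsed <- as.POSIXct(
    x,
    tz = tz,
    tryFormats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")
  )
  # Convert to UTC on output; the result is a character string.
  format(parsed, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

to_iso8601_utc("2021-02-22")
to_iso8601_utc("2021-02-22 14:30:00", tz = "America/Chicago")
```

This still needs the zone as a name like "America/Chicago"; an abbreviation like "CDT" would have to be mapped to one first.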

jmobrien commented 4 years ago

Update: strftime rather than strptime is more versatile and gets closer to what we want, but not quite:

Sys.time() # Same as style from readr
strftime(Sys.time(), format = "%FT%H:%M:%S%z") # works, except there should be a ":" two spots before the end.
strftime(as.character(Sys.time()), format = "%FT%H:%M:%S%z") # time zone is not included

Still not sure what we actually SHOULD send as far as time zones, but this at least is closer to control over what we DO send.
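One possible way to fix the missing ":" is a character replace on the offset after formatting. This is just a sketch; iso8601_with_offset() is a hypothetical name, and it assumes the input is already a POSIXct that carries the time zone we want.

```r
# Hypothetical helper: format a POSIXct as ISO 8601 with a numeric UTC
# offset, then insert the ":" that %z leaves out ("-0500" -> "-05:00").
iso8601_with_offset <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  # %z uses the time zone stored on the object.
  raw <- format(x, "%Y-%m-%dT%H:%M:%S%z")
  sub("([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", raw)
}

x <- as.POSIXct("2021-07-01 12:00:00", tz = "America/Chicago")
iso8601_with_offset(x)
```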

also note: output must be character. toJSON() also parses dates, but does it incorrectly for Qualtrics.

chrisumphlett commented 3 years ago

What is the intent for the start and end date? According to the docs, "Filter to only exports responses recorded after (before) the specified date." That would make me think it was based on the recorded date in Qualtrics.

I ask because I just ran into an issue where some partial survey responses from 2/14-2/19 were closed on 2/22. I pull weekly data, Saturday to Friday. So the ones that were fully completed by 2/19 I retrieved on 2/21. But on 2/28, the ones that had a recorded date of 2/22 were not retrieved. See example below:

This seems to imply that start/end date parameters are looking at the Start/End date fields in Qualtrics, not the recorded date field.

jmobrien commented 3 years ago

Thanks for the info. So, am I understanding correctly that you're saying you:

requested responses on 2/21 with start/end parameters spanning Saturday 2/14 (2/13?) to Friday 2/19
again on 2/28 with start/end of 2/20-2/26.
You closed some earlier responses on 2/22, but the 2/28 download didn't include them?

If that's correct, then yeah, something seems off. But I'm not sure where. The documentation here matches how the associated parameters startDate and endDate in the API endpoint are described:

https://api.qualtrics.com/guides/reference/responseImportsExports.json/paths/~1surveys~1%7BsurveyId%7D~1export-responses/post

Is there anything else you could figure out from trying downloads with a few different values in start/end?

chrisumphlett commented 3 years ago

TLDR I can't reproduce the issue now. I can look and verify that what I described did happen - I have thousands of response records pulled on 2/28, but none from that survey; and I then had to get them manually the next day. (I loop through a list of surveys each week).

Unrelated: I had figured this out before and adjusted for it, but the docs for this package should note what I was just reminded of in the Qualtrics docs: that start date is inclusive, but end date is exclusive. The way it's worded makes it sound like both are the same.

On 2/21 I asked for responses from 2/13 - 2/19 -- good catch on the date. And then yes, on 2/28 for 2/20 - 2/26.

This was the first time our org used distributions in Qualtrics. The person setting it up had them expire after 5 days (emailed on Monday), but didn't realize that the partially completed surveys would still sit out there for 7 days or whatever to get closed. So she closed them manually on 2/22, and therefore the recorded date was 2/22. (Now, she has it set up so that they close within 24 hours).

Agreed on the documentation. Poorly named by Qualtrics IMHO-- if a parameter is called end date, it should mean the same thing as the column called end date. Not, "Only export responses recorded before the specified date."

If it's really based on recorded date, they should have shown up the 2nd week. If it's based on actual start/end date, then they should have come in the first week. What if the start date definition is wrong? Instead of "Only export responses recorded after the specified date." --> "Only export responses started after the specified date."?

So I went back, and this time I used 2/21 as the start and 2/23 as the end. Now I get those responses. What happened the first time? Idk. Maybe some other kind of delay in Qualtrics having those responses be available? That would still be an issue, but not one that I'm going to be able to define.

jmobrien commented 3 years ago

Good to know, thanks. I agree it looks like some kind of server side quirk, but I don't personally have any ideas either.

Regarding the inclusive/exclusive thing, my guess is that's because the API parameters aren't actually "dates" either, but in fact ISO 8601 date/times. Given that we're not yet taking a time here, it's defaulting to 00:00, which produces the inclusive/exclusive behavior you see. More flexible date handling would be a useful feature.
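To illustrate that guess, a small sketch that models the filter locally rather than calling the API. It assumes both bounds expand to midnight and that the start is inclusive and the end exclusive, as described above; in_window() is a hypothetical helper, and UTC is used only to keep the example deterministic.

```r
# Hypothetical model of the filter: a response is kept when its recorded
# time falls in [start midnight, end midnight).
in_window <- function(recorded, start_date, end_date, tz = "UTC") {
  fmt   <- "%Y-%m-%d %H:%M:%S"
  rec   <- as.POSIXct(recorded, tz = tz, format = fmt)
  start <- as.POSIXct(paste(start_date, "00:00:00"), tz = tz, format = fmt)
  end   <- as.POSIXct(paste(end_date, "00:00:00"), tz = tz, format = fmt)
  rec >= start & rec < end
}

# Recorded on the start day: kept.
in_window("2021-02-20 09:00:00", "2021-02-20", "2021-02-26")
# Recorded during the end day: dropped, because the end bound is midnight.
in_window("2021-02-26 09:00:00", "2021-02-20", "2021-02-26")
# Passing the following day as the end date keeps it.
in_window("2021-02-26 09:00:00", "2021-02-20", "2021-02-27")
```

Under this model, anyone who wants the whole end day has to pass the day after it.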

juliasilge commented 2 years ago

Closed in #263 thanks to @jmobrien! 🙌

ropensci / qualtRics

ideas: setting up more flexible handling of start and end date params #158