Feature request - detailed list of who bounced/opted out/etc

chrisumphlett commented 3 years ago

I just used fetch_distributions and fetch_mailinglist for the first time. I'm interested in getting invitee-level info on the things that are returned from fetch_distributions like stats_bounced and stats_blocked. I do not find Qualtrics documentation very easy to read, since mailinglist returns an element for each invitee I'm thinking that's probably where to look (even though distribution-level stats are very useful for me right now).

This link is for "List Bounced Contacts in a Mailing List." I'd think there would be something where you could get a boolean flag for each of the stats from fetch_distributions but I didn't see that.

There's a distribution history endpoint with contact level data but this returns sent/open/completed but not bounced/blocked.

dsen6644 commented 3 years ago

all_mailinglists, fetch_mailinglist, and fetch_distributions were developed first because those endpoints are supported for basic level licenses (i.e research core) and would work for all users.

It would be great to expand their functionality to include distribution and mailing list history, distribution links and expiration dates, etc. but would require developers to have access to a license with XM Directory.

This is something I have wanted to do for a while, but unfortunately, my account does not have XM Directory. As a result, when I try to call those APIs endpoints linked above I get 404 errors.

@juliasilge, you mentioned we have a test account that we use for development, does it have XM Directory? If not, do you think we should contact Qualtrics and see if they have any interest in upgrading our dev account?

chrisumphlett commented 3 years ago

I am told that we have XM Directory at my org so I could help with that testing/development.

chrisumphlett commented 3 years ago

@dsen6644 If I help with the testing (and willing to help with the dev too, I would just want a bit of a nudge to make it a bit easier), is that something we could work on sooner rather than later?

dsen6644 commented 3 years ago

According to the documentation list distribution history should return bounces, blocks, opt outs for individuals under the status field (all values can be seen here).

@chrisumphlett have you run this API call with a distribution where you have bounces/blocks? What are the values you see in the status field?

chrisumphlett commented 3 years ago

I have only used the API with this package, so it's not easy for me to test this quickly. I tried using the "try it out" feature at the qualtrics docs. I'm getting errors. I'm not confident that I know my distribution ID, that may be one problem. I used fetch_distribution for this survey and tried a variety of things from there (id, recipients_mailinglistid).

When I use id, I get an error message of "No directories for brand and target: Directory not found for (XXX)", where XXX is the recipients_mailingListId. This seems promising as it's associating those two. But it's still an error nonetheless. For Headers I only have the token-- should I have the URL as well? It should probably be somewhere. I don't see it as part of the headers in this package's utils.R.

I'm going to need a bit more help to try this out I think (or, more time to figure it out on my own and go through the authentication info at the qualtrics docs). Could you provide a sample API request in R for Qualtrics and then I can update the parameters?

dsen6644 commented 3 years ago

Reference the README for your API_TOKEN and your BASE_URL parameters.

You were correct, distribution ID should be a value in the id field for the return object for fetch_distributions.

What is the resulting structure you get for x?

API_TOKEN = ""
BASE_URL = ""
distributionID = ""

###

headers <- c(
  "X-API-TOKEN" = API_TOKEN,
  "Content-Type" = "application/json",
  "Accept" = "*/*",
  "accept-encoding" = "gzip, deflate"
)

fetch_url <- paste0("https://", BASE_URL, "/API/v3/distributions/", distributionID, "/history")

res <- httr::VERB("GET",
                  url = fetch_url,
                  httr::add_headers(headers),
                  body = NULL)

x <- jsonlite::fromJSON(rawToChar(res$content))

x <- as.list(x$result$elements)

chrisumphlett commented 3 years ago

Thanks having the headers and url strings was helpful. Unfortunately I'm getting the same error message. Let me verify that we have XM Directory, and then contact Qualtrics support.

chrisumphlett commented 3 years ago

According to Qualtrics support I do not have XM Directory. The person who handles our account is OOO today so I'll follow up with them again next week :(

I have them placing the bounced emails, which they can download, in a spreadsheet. I'd also like to track the "opens" which I assume means the recipient opened the email.

If I have to I'll use the distributions endpoint that gives these stats aggregated. I was planning on doing it all on the individual level however.

chrisumphlett commented 3 years ago

Turns out we do have XM directory but have never used it and Qualtrics needs to upgrade something later this week to get it working. @dsen6644 hopefully I can test this early next week, successfully.

chrisumphlett commented 3 years ago

@dsen6644 I am able to successfully use that API endpoint now. I end up with a dataframe with just one row:

dsen6644 commented 3 years ago

Try cloning my fork
Checkout the distribution-history branch
Load all functions devtools::load_all()
use function fetch_distribution_history with distributionID set as your desired distribution id.

Thanks for your help testing this!

dsen6644 commented 3 years ago

Adding one more point...

re-read .Renviron (etc. readRenviron("~/.Renviron")) file after devtools::load_all() as base_url is being overwritten (#208).

chrisumphlett commented 3 years ago

It successfully gets results! But not the full results (see below).

I did have to make two changes on utils.R, to match the case of distributionID from fetch_distribution_history (changed 'Id' to 'ID').

I tested it on a distribution with 47k invitations. This took awhile to run, which seemed like a good thing. But then the data frame returned only had 52 rows. Those rows look good, I don't understand why there's so few.

I'm not familiar with all the hidden functions in this package, it looks like you have methods for handling pagination (fetch_url <- res$result$nextPage.) The Qualtrics documentation shows this parameter:

52 is an odd # to return from a single page though... I tried with our other large distribution which has 21k. The first time I ran it I got an error from Qualtrics:

This didn't happen the second time I tried it. I got 19 rows.

chrisumphlett commented 3 years ago

forgot to mention that I had to install the vcr package

dsen6644 commented 3 years ago

Thanks for catching that typo.
Do you have a small distribution (<50) you can re-run the function with?

chrisumphlett commented 3 years ago

Yes, sort of. We only have two distributions but I was surprised to find that fetch_distributions returns many more, all very small. We don't understand what they are. All of them have 5 or fewer sent surveys.

I picked one with 4. It returned 5 results, because one of the contacts was blocked. So that seems right.

dsen6644 commented 3 years ago

This feels like a Qualtrics API issue, especially if you are getting different errors when you call this function on lists greater than 100.

Can you try this one more time? And let me know if you are still getting less than 100 elements in the return object?

API_TOKEN = ""
BASE_URL = ""
distributionID = ""

###

headers <- c(
  "X-API-TOKEN" = API_TOKEN,
  "Content-Type" = "application/json",
  "Accept" = "*/*",
  "accept-encoding" = "gzip, deflate"
)

fetch_url <- paste0("https://", BASE_URL, "/API/v3/distributions/", distributionID, "/history")

res <- httr::VERB("GET",
                  url = fetch_url,
                  httr::add_headers(headers),
                  body = NULL)

x <- jsonlite::fromJSON(rawToChar(res$content))

str(x$result$elements)

chrisumphlett commented 3 years ago

Got 79 rows from 47k distribution. But I noticed the last column is called "nextPage" and has a string like this: https://sjc1.qualtrics.com/API/v3/distributions/MY-DISTRIBUTION-ID/history?skipToken=REALLY-LONG-ALPHANUMERIC-STRING

I fed that into the fetch URL: fetch_url <- paste0("https://", BASE_URL, "/API/v3/distributions/", distributionID, "/history?skipToken=REALLYLONGSTRING

This returned another 77 rows. And sorting on elements.contactId this appears to be coming down in a sorted manner... eg the first df start with CID_001X... and ends CID_08K... second df starts with CID_08n

Why it gets that strange # of rows I don't know but it looks like we need to do pagination.

dsen6644 commented 3 years ago

Correct, pagination is required to download the full data set. But an individual call should return 100 records. The fact that you're not only getting less than 100 but that it's varying is odd.

Do you think you could mock a reproducible example and send it to Qualtrics support and see if they have an answer?

chrisumphlett commented 3 years ago

I'll do that. I don't mind sharing actual ID's with them. Idk how well they can reproduce it w/o using R but they can hopefully at least understand what's going on and adapt it on their end.

chrisumphlett commented 3 years ago

Opened ticket last week, got this initial response: "That is a great question - I was able to reproduce this on my end, and frankly, I've got no idea why it's an inconsistent number of values returned. Thus, I have escalated this to our technical team for answers, and I will provide you with updates as soon as I receive them."

Am waiting for addtl response from them.

chrisumphlett commented 3 years ago

"I sent it up to our Engineering team who are in the process of looking at it. However, they have determined that it is a minor issue and therefore it may take quite a while before I hear back from them with a resolution."

Regardless... it seems like although the pagination size is odd, the call does paginate correctly. Why not go ahead and build and test this function, and help me get this data :)

dsen6644 commented 3 years ago

Lets retry testing the fetch_distribution_history function.

Try cloning my fork
Checkout the distribution-history branch
Load all functions devtools::load_all()
re-read .Renviron (etc. readRenviron("~/.Renviron")) file after devtools::load_all() as base_url is being overwritten (#208).
use function fetch_distribution_history with distributionID set as your desired distribution id.

Let me know how many records it returns.

chrisumphlett commented 3 years ago

20 records returned, from a distribution with 21,086 invites

dsen6644 commented 3 years ago

fetch_distribution_history does have pagination... so I'm not sure why the return object record count is so low.

Try stepping through the function and see if you get the same results.

chrisumphlett commented 3 years ago

This smells of the last iteration overwriting everything else... The pages are still odd #'s... 87, then 79, then 87, then 79, then 77. I think

  while(!is.null(fetch_url)){

    res <- qualtrics_api_request("GET", url = fetch_url)
    elements <- append(elements, res$result$elements)
    fetch_url <- res$result$nextPage

  }

is doing what it's supposed to do.

Then elements <- res$result$elements happens. This takes elements, a list of 402, and makes it a list of 77 (the size of the last iteration of res).

And it feeds that into the tibble::tibble, which works.

So removing this line fixes it: elements <- res$result$elements.. I think? I get a data frame with 402 rows.

chrisumphlett commented 3 years ago

question about how to use this data: the email address isn't in the results... how do I connect it back to my original list of invitations? I saw this in the Qualtrics docs before but it didn't sink in until I had the data in hand.

There's a contact_id, contact_lookup_id. If they responded I can connect it to my original distribution list via the response_id. But I would like to be able to connect non-responses back to the original list. Is there some other table in Qualtrics that would get me from contact_id back to the original distribution info?

I think we also need to use "List Distribution Links" which would join on contact_id and provides the email address

dsen6644 commented 3 years ago

Correct elements <- res$result$elements was a bug that (I thought?) had been resolved. Trying re-pulling my fork and that should fix at least that issue.

That said a data frame of 402 rows on a ~21k distribution sounds low. This goes back to the API, where I think something is fundamentally wrong, and I don't believe it is, as they say, a small issue.

And yes, if we wanted to include emails we would need a separate or embedded function to handle that. I think List Distribution Links is the best api call, as all the contact api calls require the pool id (which doesn't feel right to me).

chrisumphlett commented 3 years ago

Sorry I gave a bad explanation above. 402 rows was because I stopped stepping through. After I made the change to elements I do get 21k rows in the data frame. I already started using the function to pull down my data :). It was at that point that I realized w/o the email address I couldn't use it the way I had hoped.

dsen6644 commented 3 years ago

Ok that's great news! We can create a second function around distribution links (which can be used by XM and core directory users) and embedded that it into that function.

chrisumphlett commented 3 years ago

Yea that would be great. I'd think that anyone who wants to use distribution history would likely want this with it; and that the distribution links API, by itself, isn't very meaningful.

dsen6644 commented 3 years ago

Updates made, try repulling and seeing if it runs as expected. As stated before, I can't do comprehensive testing due to access.

chrisumphlett commented 3 years ago

Had to make this change in fetch_distribution_history - URL is not a parameter for that function:

When using it on one of our small test distributions (6 emails), it works:

I am getting Qualtrics 500 errors every time, however, when using it for the larger distributions. This was happening occasionally before. Now it's happening every time. It isn't consistent-- I tracked the times it makes it through this loop, and it varied from 19 to 153 to 23. (I added the highlighted lines) I assume this has nothing to do with this code. Hopefully it's a transient problem with Qualtrics right now. If it keeps happening I'll open a ticket with them.

That said - in my experience the Qualtrics API does tend to crap out. I have often wrapped the qualtRics functions in my own try-except. The last time I tried using fetch_distribution_history it got all the way through the distribution (329 iterations), but then failed after that. It might be worth adding error-handling since the page sizes are so small and it has to work w/o error so many times in a row. (Have others had the problems with API? I have experienced this at 2 different employers so it's not just related to my current network).

juliasilge commented 3 years ago

I haven't personally experienced those kinds of problems with the Qualtrics API but some better error handling would likely be a good idea and a welcome contribution.

chrisumphlett commented 3 years ago

I am still having intermittent errors, so I'm pulling it down in pieces. I'll create a ticket with Qualtrics for myself.

For our regular weekly process where we go out and collect responses from a bunch of surveys, I wrap the call to fetch_survey like this:

  for (i in 1:length(active_survey_ids)) {
    # for (i in 22:22) {
    print(paste0("Getting response data for ", active_survey_names[i]))
    tryCatch({
      # fetch responses for each survey
      # This will often fail the first time. Try up to three times
      # https://stackoverflow.com/questions/28969070/assigning-a-value-in-exception-handling-in-r/28969896#28969896

      attempt_fetch <- 1
      while(attempt_fetch != 4){
        responses <- try(responses <- fetch_survey(surveyID = active_survey_ids[i], verbose = TRUE, force_request = TRUE,
                                                   start_date = as.character(as.Date(as_of_date) - 6), end_date = as.character(as.Date(as_of_date) + 1)) %>%
                  rename_all(.funs = tolower))
        if (class(responses)[1] == "try-error") {
          Sys.sleep(5) ## Wait 5 seconds so that transient connection issues can go away
          attempt_fetch <- attempt_fetch + 1
        } else {
          break
        } 
      }
### MORE POST PROCESSING CODE
}

chrisumphlett commented 3 years ago

I opened #217 for the error handling. I think this one could be closed and @dsen6644's changes (with the correction I added above) should be made.

chrisumphlett commented 3 years ago

One other issue I forgot to mention before, in fetch_distribution_history. This needs to be "distributionId" not ID. This column has all NA values otherwise.

chrisumphlett commented 2 years ago

@juliasilge I'm testing both new functions (fetch_distribution_history() and list_distribution_links()) and they are working mostly without errors/interruptions. Maybe Qualtrics has done something to improve on the connection issues I've had in the past, I'd say we hold off any the error handling I mentioned previously.

I ran both functions on distributions with 21k, 35k, and then 47k invitations. list_distribution_links() takes about 50% longer to run than fetch_distribution_history(). The total run time, running the functions consecutively, was 11.5 minutes, 22 minutes, and 32 minutes (estimated), respectively. The last one did have a failure in the links function after 8 minutes, whereas it probably should've taken about 20 minutes. Overall, this is good enough for now for me I feel like it generally works.

I'm going to close this issue and #217

ropensci / qualtRics

Feature request - detailed list of who bounced/opted out/etc #206