skardhamar / rga

R Google Analytics

Auto-pagination when fetching management resources #73

Open mattpolicastro opened 9 years ago

mattpolicastro commented 9 years ago

I'm working with a pretty complex account hierarchy with a ton of profiles, and am butting up against the API limit for 1000 results per call when requesting management resources. At the moment, I can work around it with the following:

profiles1 <- ga$getProfiles()
profiles2 <- ga$getProfiles(start = 1001)
profiles <- rbind.pages(list(profiles1, profiles2))
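The manual workaround above generalizes to a loop that keeps advancing start-index by the page size until a page comes back short or empty. A minimal sketch, assuming a hypothetical `fetch_page(start, max)` function that stands in for a call like `ga$getProfiles(start = start)` (the `max` argument is an assumption) and returns a data frame. A fake fetcher is used so the example runs offline.

```r
# Collect every page from a paginated source by advancing the 1-based
# start index in steps of `max` until a short (or empty) page is returned.
# `fetch_page` is a hypothetical stand-in for a call such as
# ga$getProfiles(start = start); its signature here is an assumption.
fetch_all_by_index <- function(fetch_page, max = 1000) {
  pages <- list()
  start <- 1
  repeat {
    page <- fetch_page(start, max)
    if (is.null(page) || nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    if (nrow(page) < max) break  # a short page means this was the last one
    start <- start + max
  }
  do.call(rbind, pages)
}

# Fake fetcher simulating 2500 profiles served at most `max` per call
all_profiles <- data.frame(id = seq_len(2500),
                           name = paste0("profile", seq_len(2500)))
fake_fetch <- function(start, max) {
  if (start > nrow(all_profiles)) return(all_profiles[0, ])
  end <- min(start + max - 1, nrow(all_profiles))
  all_profiles[start:end, ]
}

profiles <- fetch_all_by_index(fake_fetch, max = 1000)
nrow(profiles)  # 2500, fetched in three calls (start = 1, 1001, 2001)
```

The short-page check saves one request in the common case; the empty-page check covers a total that is an exact multiple of `max`.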

Obviously, this isn't very clean and is prone to user error. How do folks feel about using something similar to the batch = TRUE parameter when fetching management entities?

The management API's response resources include nextLink and previousLink properties, which I've successfully used for cycling through results in Node. A rough sketch of what I'm recommending (based on rga/R/mgmt.R):

getMGMTData = function(url, keep, start, max, previous = NULL) {

    token <- paste("access_token", .self$getToken()$access_token, sep = "=")
    if (is.null(previous)) {
        # First page: add the paging parameters to the base URL
        query <- paste(token,
                       paste("start-index", start, sep = "="),
                       paste("max-results", max, sep = "="), sep = "&")
        url <- paste(url, query, sep = "?")
    } else {
        # nextLink already carries start-index and max-results; only add the token
        url <- paste(url, token, sep = "&")
    }
    request <- GET(url)
    ga.json <- jsonlite::fromJSON(content(request, "text"))
    if (is.null(ga.json)) {
        stop("data fetching did not output correct format")
    }
    df <- ga.json$items[keep]
    # New: append this page's results after the results collected so far
    if (!is.null(previous)) {
        df <- rbind(previous, df)
    }

    # New: if the GA API response contains a nextLink, follow it and carry the accumulated results
    if (!is.null(ga.json$nextLink)) {
        return(.self$getMGMTData(ga.json$nextLink, keep, start, max, df))
    } else {
        return(df)
    }
}
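Because nextLink already encodes the next start-index and max-results, following it until it is absent is enough to collect everything, and a loop does the same job as the recursion without growing the call stack. A minimal offline sketch of that pattern: `fetch_json` is a hypothetical stand-in for the GET plus `jsonlite::fromJSON` step (in rga it would also need to attach the access token), and the fake responses only mimic the `items` and `nextLink` fields of a management API response.

```r
# Follow nextLink until a response no longer has one, binding each page's
# items. `fetch_json` is a hypothetical function that takes a URL and returns
# a parsed response: a list with `items` and, except on the last page, `nextLink`.
fetch_all_by_next_link <- function(url, fetch_json, keep = NULL) {
  pages <- list()
  while (!is.null(url)) {
    resp <- fetch_json(url)
    items <- resp$items
    if (!is.null(items) && nrow(items) > 0) {
      if (!is.null(keep)) items <- items[keep]
      pages[[length(pages) + 1]] <- items
    }
    url <- resp$nextLink  # NULL on the last page, which ends the loop
  }
  do.call(rbind, pages)
}

# Fake responses keyed by URL, mimicking three pages of results
fake_responses <- list(
  page1 = list(items = data.frame(id = 1:2, name = c("a", "b")), nextLink = "page2"),
  page2 = list(items = data.frame(id = 3:4, name = c("c", "d")), nextLink = "page3"),
  page3 = list(items = data.frame(id = 5L, name = "e"))
)
fake_fetch_json <- function(url) fake_responses[[url]]

result <- fetch_all_by_next_link("page1", fake_fetch_json, keep = c("id", "name"))
result$id  # 1 2 3 4 5
```

Collecting pages in a list and binding once at the end also avoids re-copying the accumulated data frame on every page.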

Was going to test/develop sometime soon, but figured I'd see if anyone had any thoughts/feedback. Thanks!

mattpolicastro commented 9 years ago

Started working on this on my own fork. I have a rough implementation done, but I'm still trying to figure out what the default behaviours should be (and whether I'm interpreting the start/max/batch params correctly, per @skardhamar's intent).