skardhamar / rga

R Google Analytics

Include RGA's date in each chunk of a walk #57

Open kingo55 opened 9 years ago

kingo55 commented 9 years ago

Hi guys,

Rather than extracting ga:date as a dimension in a query, @jdeboer had an excellent idea in his plugin: include the date in the output of a walk query. Each chunk of a walk covers a single day set by the walk's own parameters, so the date is already known, and the ga:date dimension slot can be spent on another dimension instead.

Would it be as simple as the following change?

    getDataInWalks = function(total, max, batch, ids, start.date, end.date, date.format,
                              metrics, dimensions, sort, filters, segment, fields, envir) {
        # this function will extract data day-by-day (to avoid sampling)
        walks.max <- ceiling(as.numeric(difftime(end.date, start.date, units = "days")))
        chunk.list <- vector("list", walks.max + 1)

        for (i in 0:(walks.max)) {
            date <- format(as.POSIXct(start.date) + days(i), "%Y-%m-%d")

            message(paste("Run (", i + 1, "/", walks.max + 1, "): for date ", date, sep = ""))
            chunk <- .self$getData(ids = ids, start.date = date, end.date = date, date.format = date.format,
                                   metrics = metrics, dimensions = dimensions, sort = sort, filters = filters,
                                   segment = segment, fields = fields, envir = envir, max = max,
                                   rbr = TRUE, messages = FALSE, return.url = FALSE, batch = batch)
            message(paste("Received:", nrow(chunk), "observations"))
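            # proposed change: tag each row with the day this chunk was fetched for
            # (skipped for empty chunks, where adding a length-1 column would error)
            if (NROW(chunk) > 0) chunk$walk_date <- date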
            chunk.list[[i + 1]] <- chunk
        }

        return(do.call(rbind, chunk.list, envir = envir))
    }
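The proposed change can also be tried outside of rga. Below is a minimal, hypothetical sketch in plain R: `walk_with_dates()` and its `fetch` argument are illustrative names, not rga API, and `fake_fetch()` stands in for a real `getData()` call. It steps through the dates with base R (`seq(..., by = "day")`) rather than lubridate's `days()`, and leaves empty days out instead of tagging them.

```r
# Hypothetical helper (not part of rga): walk a date range one day at a time,
# call fetch(date) for each day, and tag every returned row with that day.
walk_with_dates <- function(start.date, end.date, fetch) {
  dates <- seq(as.Date(start.date), as.Date(end.date), by = "day")
  chunk.list <- vector("list", length(dates))
  for (i in seq_along(dates)) {
    date <- format(dates[i], "%Y-%m-%d")
    chunk <- fetch(date)
    # Only tag non-empty chunks: assigning a length-1 column
    # to a 0-row data frame raises an error.
    if (NROW(chunk) > 0) {
      chunk$walk_date <- date
      chunk.list[[i]] <- chunk
    }
  }
  # rbind() drops the NULL entries left by empty days
  do.call(rbind, chunk.list)
}

# Stand-in for a real API call: two rows per day, nothing on 2015-01-02
fake_fetch <- function(date) {
  if (date == "2015-01-02") {
    return(data.frame(source = character(0), sessions = numeric(0)))
  }
  data.frame(source = c("google", "direct"), sessions = c(10, 5))
}

result <- walk_with_dates("2015-01-01", "2015-01-03", fake_fetch)
print(result)
```

With the stand-in data, `result` has four rows (two for 2015-01-01 and two for 2015-01-03), each carrying its `walk_date`.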
BrianWeinstein commented 9 years ago

It's a great idea. Back when all of the batch problems existed, I built a function that does something similar:

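    library(data.table)  # for rbindlist()

    # Walks the date range one day at a time. With dateTF = TRUE, a `date`
    # column is prepended to each day's rows. Reads the globals ga, profile,
    # startDate, endDate, dimensionsInput, metricsInput, segmentIDInput,
    # filtersInput and sortInput.
    getdata <- function(dateTF){
      dateSequence <- seq(from = as.Date(startDate, "%Y-%m-%d"), to = as.Date(endDate, "%Y-%m-%d"), by = "day")
      dateSequenceLength <- length(dateSequence)
      # local list; try() keeps the loop running, so no need for <<-
      outputList <- vector("list", dateSequenceLength)
      for (i in seq_along(dateSequence)){
        message(paste0("Pulling observations for date ", dateSequence[i], " (Day ", i, "/", dateSequenceLength, ")"))
        outputTemp <- try(ga$getData(
          ids = profile,
          start.date = dateSequence[i],
          end.date = dateSequence[i],
          batch = TRUE,
          walk = FALSE,
          dimensions = dimensionsInput,
          metrics = metricsInput,
          segment = segmentIDInput,
          sort = sortInput,
          filters = filtersInput,
          start = 1))
        # nrow() is NULL when try() caught an error; skip failed and empty days
        numObs <- nrow(outputTemp)
        if (!is.null(numObs) && numObs > 0){
          if (dateTF) outputTemp <- cbind(date = dateSequence[i], outputTemp)
          outputList[[i]] <- outputTemp
          message(paste0("Received: ", numObs, " observation(s) for date ", dateSequence[i], " (Day ", i, "/", dateSequenceLength, ")"))
        }
      }
      # rbindlist() skips the NULL entries left by failed or empty days
      return(rbindlist(outputList))
    }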

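With these inputs defined:

    # ga (an rga instance) and profile (the view ID) must also exist
    startDate <- ""        # "YYYY-MM-DD"
    endDate <- ""          # "YYYY-MM-DD"
    dimensionsInput <- ""
    metricsInput <- ""
    segmentIDInput <- ""
    filtersInput <- ""
    sortInput <- ""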

Then call getdata(TRUE) to prepend the observation date to each row, or getdata(FALSE) to do the standard day-by-day walk without a date column.

Now that the batch error is fixed, this is really only useful when you've used all 7 dimension slots but still need ga:date in the output.
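To illustrate why the extra column matters: once every row carries its walk date, per-day totals can still be computed in R even though ga:date was never requested as a dimension. A minimal sketch, using a made-up data frame in place of real walk output (the column names `walk_date`, `deviceCategory` and `sessions` are assumptions for the example):

```r
# Made-up walk output: ga:date was not a query dimension,
# but each row was tagged with the day it was fetched for.
walk_output <- data.frame(
  walk_date      = c("2015-01-01", "2015-01-01", "2015-01-02", "2015-01-02"),
  deviceCategory = c("desktop", "mobile", "desktop", "mobile"),
  sessions       = c(120, 80, 100, 95),
  stringsAsFactors = FALSE
)

# Per-day totals recovered from the tagged column
daily <- aggregate(sessions ~ walk_date, data = walk_output, FUN = sum)
print(daily)
```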

BrianWeinstein commented 9 years ago

Or you could do it the right way and just edit the getDataInWalks function as you described, haha.