jdeboer / ganalytics

Interact with Google Analytics using R

Interactively querying Google Analytics reports

Johann de Boer 2018-06-07

ganalytics


Classes and methods for interactive use of the Google Analytics core reporting, real-time reporting, multi-channel funnel reporting, metadata, configuration management and Google Tag Manager APIs.

The aim of this package is to support R users in defining reporting queries using natural R expressions instead of being concerned about API technical intricacies like query syntax, character code escaping and API limitations.

This package provides functions for querying the Google Analytics core reporting, real-time reporting, multi-channel funnel reporting and management APIs, as well as the Google Tag Manager API. Write methods are also provided for the Google Analytics Management and Google Tag Manager APIs so that you can, for example, change tag, property or view settings.

Updates

Support for googleAnalyticsR integration is now available for segment and table filter objects. You can supply these objects to the google_analytics function in googleAnalyticsR by coercing them with as() to the appropriate googleAnalyticsR class, which is "segment_ga4" for segments and ".filter_clauses_ga4" for table filters. Soon googleAnalyticsR will coerce ganalytics segments and table filters implicitly, so that you will not need to call as() yourself.

Many new functions have been provided for writing segmentation expressions:

Multi-channel funnel (MCF) and real-time (RT) queries can now be constructed, but work is still needed to process the response from these queries - stay tuned for updates on this.

Instead of using Or, And, and Not, it is now possible to use the familiar R Boolean operators | (Or), & (And), and ! (Not) (thanks to @hadley for suggestion #2). Keep in mind, however, that Google Analytics requires Or to take precedence over And, which is the opposite of the natural precedence R gives to the | and & operators. Therefore, remember to use parentheses ( ) to enforce the correct order of operations in your Boolean expressions. For example, my_filter <- !bounced & (completed_goal | transacted) is a valid structure for a Google Analytics reporting API filter expression.
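To see why the parentheses matter, here is a small base R illustration. It uses plain logical values as stand-ins rather than ganalytics expression objects; the variable names only mirror the example above. R evaluates & before |, so dropping the parentheses changes the grouping:

```r
# Plain logical stand-ins for the three conditions (illustrative values,
# not ganalytics expression objects).
bounced <- TRUE
completed_goal <- FALSE
transacted <- TRUE

# With parentheses: NOT bounced AND (goal OR transaction)
with_parens <- !bounced & (completed_goal | transacted)

# Without parentheses R groups this as (!bounced & completed_goal) | transacted
without_parens <- !bounced & completed_goal | transacted

print(c(with_parens = with_parens, without_parens = without_parens))
```

The two results differ for these values, which is the kind of silent change in meaning the parentheses guard against.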

You can now query the Google Analytics Management API to obtain details in R about the configuration of your accounts, properties and views, such as the goals you have defined. Write methods are available too, but these have not been fully tested, so use them with extreme care. If you wish to use these functions, it is recommended that you try them with a test login first; otherwise avoid using the “INSERT”, “UPDATE” and “DELETE” methods.

There is also some basic support for the Google Tag Manager API, but again, this is a work in progress so take care with the write methods above.

Installation

1. Install the necessary packages into R from CRAN or from the GitHub repository

Prerequisites

Current stable release from CRAN

You can install the released version of ganalytics from CRAN with:

install.packages("ganalytics")

Current development release from GitHub

Alternatively, you can execute the following statements in R to install the current development version of ganalytics from GitHub:

# Install the latest version of remotes via CRAN
install.packages("remotes")
# Install ganalytics via the GitHub repository.
remotes::install_github("jdeboer/ganalytics")
# End

2. Prepare your Google API application (you only need to do this once)

Note: For further information about Google APIs, please refer to the References section at the end of this document.

3. Set your system environment variables (this is optional but recommended)

GOOGLE_APIS_CONSUMER_ID = <Your client ID>
GOOGLE_APIS_CONSUMER_SECRET = <Your client secret>

Alternatively you can temporarily set your environment variables straight from R using this command:

Sys.setenv(
  GOOGLE_APIS_CONSUMER_ID = "<Your client ID>",
  GOOGLE_APIS_CONSUMER_SECRET = "<Your client secret>"
)

Note: For other operating systems, please refer to the Useful references section at the end of this document.
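Before authenticating, it can help to confirm that R actually sees both variables. The following is a minimal base R sketch; it only inspects the two variable names given in step 3 and does not call ganalytics:

```r
# Names of the two environment variables described in step 3.
required_vars <- c("GOOGLE_APIS_CONSUMER_ID", "GOOGLE_APIS_CONSUMER_SECRET")

# Sys.getenv() returns "" (the 'unset' value) for variables that are not set.
values <- Sys.getenv(required_vars, unset = "")
is_set <- nzchar(values)
names(is_set) <- required_vars

if (all(is_set)) {
  message("Both Google API environment variables are set.")
} else {
  message("Not set: ", paste(required_vars[!is_set], collapse = ", "))
}
```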

4. Authenticate and attempt your first query with ganalytics

If you have successfully executed all of the above R commands you should see the output of the default ganalytics query: sessions by day for the past 7 days. For example:

        date sessions
1 2015-03-27     2988
2 2015-03-28     1594
3 2015-03-29     1912
4 2015-03-30     3061
5 2015-03-31     2609
6 2015-04-01     2762
7 2015-04-02     2179
8 2015-04-03     1552

Note: A small file will be saved to your home directory (‘My Documents’ in Windows) to cache your new reusable authentication token.

Examples

As demonstrated in the installation steps above, before executing any of the following examples:

  1. Load the ganalytics package
  2. Generate a gaQuery object using the GaQuery() function and assign it to a variable such as myQuery.

Assumptions

The following examples assume you have successfully completed the above steps and have named your Google Analytics query object: myQuery.

Example 1 - Setting the date range

# Set the date range from 1 January 2013 to 31 May 2013: (Dates are specified in the format "YYYY-MM-DD".)
DateRange(myQuery) <- c("2013-01-01", "2013-05-31")

myData <- GetGaData(myQuery)
summary(myData)

# Adjust the start date to 1 March 2013:
StartDate(myQuery) <- "2013-03-01"
# Adjust the end date to 31 March 2013:
EndDate(myQuery) <- "2013-03-31"

myData <- GetGaData(myQuery)
summary(myData)
# End
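If you would rather compute a rolling date range than type the dates, the strings can be built in base R. This is a sketch that assumes only that dates are supplied as "YYYY-MM-DD" strings, as stated above; the variable names are illustrative:

```r
# Build a "YYYY-MM-DD" date range covering the 7 days up to yesterday.
end_date <- Sys.Date() - 1
start_date <- end_date - 6
date_range <- format(c(start_date, end_date), "%Y-%m-%d")

# as.Date() with an explicit format returns NA for strings that do not
# parse as valid dates, which gives a cheap check before running a query.
parsed <- as.Date(date_range, format = "%Y-%m-%d")
stopifnot(!anyNA(parsed))

# Number of days covered, counting both endpoints.
n_days <- as.integer(diff(parsed)) + 1L
print(date_range)
```

The resulting character vector has the same shape as the c("2013-01-01", "2013-05-31") value assigned to DateRange(myQuery) in the example above.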

Example 2 - Choosing what metrics to report

# Report number of page views instead
Metrics(myQuery) <- "pageviews"

myData <- GetGaData(myQuery)
summary(myData)

# Report both pageviews and sessions
Metrics(myQuery) <- c("pageviews", "sessions")
# These variations are also acceptable
Metrics(myQuery) <- c("ga:pageviews", "ga.sessions")

myData <- GetGaData(myQuery)
summary(myData)
# End
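The Core Reporting API itself expects names carrying the "ga:" prefix. As a rough illustration of how the accepted spellings above could map onto a single form, here is a base R sketch. The helper normalise_ga_name is hypothetical and is not a ganalytics function, nor a description of how the package does this internally:

```r
# Hypothetical helper: strip an optional "ga:" or "ga." prefix, then add
# the "ga:" prefix back so every name ends up in the same form.
normalise_ga_name <- function(x) {
  paste0("ga:", sub("^ga[:.]", "", x))
}

normalised <- normalise_ga_name(c("pageviews", "ga:pageviews", "ga.sessions"))
print(normalised)
```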

Example 3 - Selecting what dimensions to split your metrics by

# Similar to metrics, but for dimensions
Dimensions(myQuery) <- c("year", "week", "dayOfWeekName", "hour")

# Let's set a wider date range
DateRange(myQuery) <- c("2012-10-01", "2013-03-31")

myData <- GetGaData(myQuery)
head(myData)
tail(myData)
# End

Example 4 - Sort by

# Sort by descending number of pageviews
SortBy(myQuery) <- "-pageviews"

myData <- GetGaData(myQuery)
head(myData)
tail(myData)
# End

Example 5 - Row filters

# Filter for Sunday sessions only
sundayExpr <- Expr(~dayOfWeekName == "Sunday")
TableFilter(myQuery) <- sundayExpr

myData <- GetGaData(myQuery)
head(myData)

# Remove the filter
TableFilter(myQuery) <- NULL

myData <- GetGaData(myQuery)
head(myData)
# End

Example 6 - Combining filters with AND

# Expression to define Sunday sessions
sundayExpr <- Expr(~dayOfWeekName == "Sunday")
# Expression to define organic search sessions
organicExpr <- Expr(~medium == "organic")
# Expression to define organic search sessions made on a Sunday
sundayOrganic <- sundayExpr & organicExpr
TableFilter(myQuery) <- sundayOrganic

myData <- GetGaData(myQuery)
head(myData)

# Let's concatenate medium to the dimensions for our query
Dimensions(myQuery) <- c(Dimensions(myQuery), "medium")

myData <- GetGaData(myQuery)
head(myData)
# End

Example 7 - Combining filters with OR

# In a similar way to AND
loyalExpr <- !Expr(~sessionCount %matches% "^[0-3]$") # Made more than 3 sessions
recentExpr <- Expr(~daysSinceLastSession %matches% "^[0-6]$") # Visited sometime within the past 7 days.
loyalOrRecent <- loyalExpr | recentExpr
TableFilter(myQuery) <- loyalOrRecent

myData <- GetGaData(myQuery)
summary(myData)
# End
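The regular expressions in this example can be checked locally with base R before they are used in a query. This sketch uses grepl() on made-up counts; it only tests the patterns and does not involve ganalytics:

```r
# Candidate session counts, as the strings a dimension value would contain.
session_counts <- as.character(0:12)

# "^[0-3]$" matches exactly one digit from 0 to 3, so its negation keeps
# every count above 3. The anchors matter: "10" must not match on its "1".
is_loyal <- !grepl("^[0-3]$", session_counts)
loyal_counts <- session_counts[is_loyal]

# "^[0-6]$" keeps values from 0 to 6 days since the last session.
days_since <- as.character(0:10)
recent_days <- days_since[grepl("^[0-6]$", days_since)]

print(loyal_counts)
print(recent_days)
```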

Example 8 - Filters that combine ORs with ANDs

loyalExpr <- !Expr(~sessionCount %matches% "^[0-3]$") # Made more than 3 sessions
recentExpr <- Expr(~daysSinceLastSession %matches% "^[0-6]$") # Visited sometime within the past 7 days.
loyalOrRecent <- loyalExpr | recentExpr
sundayExpr <- Expr(~dayOfWeekName == "Sunday")
loyalOrRecent_Sunday <- loyalOrRecent & sundayExpr
TableFilter(myQuery) <- loyalOrRecent_Sunday

myData <- GetGaData(myQuery)
summary(myData)

# Perform the same query but change which dimensions to view
Dimensions(myQuery) <- c("sessionCount", "daysSinceLastSession", "dayOfWeek")

myData <- GetGaData(myQuery)
summary(myData)
# End
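For context, the Core Reporting API (version 3) writes filters as a single string in which a comma means OR, a semicolon means AND, and commas or semicolons inside a value must be backslash-escaped. The base R sketch below builds such a string for the filter in this example. All helper names (escape_operand, condition, or_group, and_groups) are hypothetical; this shows the query syntax the package saves you from writing, not how ganalytics generates it, and it omits the URL encoding a real request would need:

```r
# Backslash-escape the characters reserved in filter values (, and ;).
escape_operand <- function(x) gsub("([,;])", "\\\\\\1", x)

# One condition: "ga:" + dimension + operator + escaped value.
condition <- function(dimension, operator, operand) {
  paste0("ga:", dimension, operator, escape_operand(operand))
}

# OR is a comma and binds more tightly than AND, which is a semicolon.
or_group <- function(...) paste(c(...), collapse = ",")
and_groups <- function(...) paste(c(...), collapse = ";")

loyal <- condition("sessionCount", "!~", "^[0-3]$")          # regex non-match
recent <- condition("daysSinceLastSession", "=~", "^[0-6]$") # regex match
sunday <- condition("dayOfWeekName", "==", "Sunday")         # exact match

# (loyal OR recent) AND sunday
filter_string <- and_groups(or_group(loyal, recent), sunday)
print(filter_string)
```

Because OR binds first in this syntax, the string needs no parentheses, which is why the R expression must be parenthesised to describe the same structure.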

Example 9 - Sorting ‘numeric’ dimensions (continuing from example 8)

# Continuing from example 8...

# Change filter to loyal sessions AND recent sessions AND visited on Sunday
loyalAndRecent_Sunday <- loyalExpr & recentExpr & sundayExpr
TableFilter(myQuery) <- loyalAndRecent_Sunday

# Sort by descending session count and ascending days since last session.
SortBy(myQuery) <- c("-sessionCount", "+daysSinceLastSession")
myData <- GetGaData(myQuery)
head(myData)

# Notice that the Google Analytics Core Reporting API doesn't recognise 'numerical' dimensions as
# ordered factors when sorting. We can use R to sort instead, such as using dplyr.
library(dplyr)
myData <- myData %>% arrange(desc(sessionCount), daysSinceLastSession)
head(myData)
tail(myData)
# End
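The difference between sorting digits as text and sorting them as numbers is easy to reproduce in base R. This sketch uses made-up values and assumes the dimension arrives as character strings; if your data frame already holds numeric or factor columns, the conversion step is unnecessary:

```r
# Made-up session counts held as text.
session_count <- c("9", "10", "2", "100")

# Sorted as text (character by character), descending.
# method = "radix" uses C-locale collation, so the result is reproducible.
as_text <- sort(session_count, decreasing = TRUE, method = "radix")

# Sorted as numbers, descending, after converting to integer.
as_number <- sort(as.integer(session_count), decreasing = TRUE)

print(as_text)
print(as_number)
```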

Example 10 - Session segmentation

# Session segmentation is expressed similarly to row filters and supports AND and OR combinations.
# Define a segment for sessions where a "thank-you", "thankyou" or "success" page was viewed.
thankyouExpr <- Expr(~pagePath %matches% "thank\\-?you|success")
Segments(myQuery) <- thankyouExpr

# Reset the filter
TableFilter(myQuery) <- NULL

# Split by traffic source and medium
Dimensions(myQuery) <- c("source", "medium")

# Sort by descending number of sessions
SortBy(myQuery) <- "-sessions"

myData <- GetGaData(myQuery)
head(myData)
# End
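The page-path pattern used for this segment can be tried against sample paths in base R. The paths below are invented for illustration, and perl = TRUE is used here only so the escaped hyphen is interpreted unambiguously by R's own regex engine:

```r
# Invented page paths to test the pattern against.
paths <- c("/thank-you", "/checkout/thankyou", "/order/success", "/home")

# The pattern is unanchored, so it matches anywhere within the path.
matches <- grepl("thank\\-?you|success", paths, perl = TRUE)

print(data.frame(path = paths, matches = matches))
```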

Example 11 - Using automatic pagination to get more than 10,000 rows of data per query

# Sessions by date and hour for the years 2016 and 2017:
# First let's clear any filters or segments defined previously
TableFilter(myQuery) <- NULL
Segments(myQuery) <- NULL
# Define our date range
DateRange(myQuery) <- c("2016-01-01", "2017-12-31")
# Define our metrics and dimensions
Metrics(myQuery) <- "sessions"
Dimensions(myQuery) <- c("date", "dayOfWeekName", "hour")
# Let's allow a maximum of 20000 rows (default is 10000)
MaxResults(myQuery) <- 20000

myData <- GetGaData(myQuery)
nrow(myData)

## Let's use dplyr to analyse the data
library(dplyr)

# Sessions by day of week
sessions_by_dayOfWeek <- myData %>%
  count(dayOfWeekName, wt = sessions) %>% 
  mutate(dayOfWeekName = factor(dayOfWeekName, levels = c(
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
  ), labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), ordered = TRUE)) %>% 
  arrange(dayOfWeekName)
with(
  sessions_by_dayOfWeek,
  barplot(n, names.arg = dayOfWeekName, xlab = "day of week", ylab = "sessions")
)

# Sessions by hour of day
sessions_by_hour <- myData %>%
  count(hour, wt = sessions)
with(
  sessions_by_hour,
  barplot(n, names.arg = hour, xlab = "hour", ylab = "sessions")
)
# End
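The choice of 20000 for MaxResults can be sanity-checked with a little arithmetic. This base R sketch assumes at most one row per date and hour (dayOfWeekName is determined by the date, so it adds no rows) and 10,000 rows per request, as the example heading implies:

```r
# Every calendar day in the query's date range.
days <- seq(as.Date("2016-01-01"), as.Date("2017-12-31"), by = "day")

# Upper bound on rows: one per date and hour of day.
max_rows <- length(days) * 24L

# Requests needed at 10,000 rows each.
pages_needed <- ceiling(max_rows / 10000)

print(c(days = length(days), max_rows = max_rows, pages_needed = pages_needed))
```

The bound stays under 20000, so the limit set above is enough for this query. Adding a dimension multiplies the bound by the number of values that dimension can take.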

Example 12 - Using ggplot2

To run this example first install ggplot2 if you haven’t already.

install.packages("ggplot2")

Once installed, then run the following example.

library(ggplot2)
library(dplyr)

# Sessions by date and hour for the years 2016 and 2017:
# First let's clear any filters or segments defined previously
TableFilter(myQuery) <- NULL
Segments(myQuery) <- NULL
# Define our date range
DateRange(myQuery) <- c("2016-01-01", "2017-12-31")
# Define our metrics and dimensions
Metrics(myQuery) <- "sessions"
Dimensions(myQuery) <- c("date", "dayOfWeek", "hour", "deviceCategory")
# Let's allow a maximum of 40000 rows (default is 10000)
MaxResults(myQuery) <- 40000

myData <- GetGaData(myQuery)

# Sessions by hour of day and day of week
avg_sessions_by_hour_wday_device <- myData %>% 
  group_by(hour, dayOfWeek, deviceCategory) %>% 
  summarise(sessions = mean(sessions)) %>% 
  ungroup()

# Relabel the days of week
levels(avg_sessions_by_hour_wday_device$dayOfWeek) <- c(
  "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
)

# Plot the summary data
qplot(
  x = hour,
  y = sessions,
  data = avg_sessions_by_hour_wday_device,
  facets = ~dayOfWeek,
  fill = deviceCategory,
  geom = "col"
)

# End
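The relabelling step above relies on dayOfWeek already being a factor whose levels run from "0" to "6". If the column is character in your data, an explicit mapping is a safer alternative. The sketch below uses made-up values and the Google Analytics convention that dayOfWeek 0 is Sunday and 6 is Saturday:

```r
# Made-up dayOfWeek values as character strings.
day_of_week <- c("0", "3", "6", "1")

# Map each code to its label explicitly, so the result does not depend on
# the column's existing type or level order.
day_label <- factor(
  day_of_week,
  levels = as.character(0:6),
  labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
)

print(day_label)
```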

Thanks to:

Useful references

  1. Google Analytics Core Reporting API reference guide
  2. Google Analytics Dimensions and Metrics reference
  3. Creating a Google API project
  4. Generating an OAuth 2.0 client ID for Google APIs
  5. Using OAuth 2.0 for Installed Applications
  6. EnvPane utility for setting environment variables in OSX
  7. Setting environment variables in Ubuntu Linux

Notes

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Google Analytics and Google Tag Manager are trademarks of Google.