skardhamar / rga

R Google Analytics
186 stars 89 forks source link

Save/capture message "Notice: Data set contains sampled data" #79

Open efh0888 opened 8 years ago

efh0888 commented 8 years ago

Is it possible to somehow capture this message, i.e. store it as a character string for later use? I've got a script that automates our data extracts from Google Analytics using the rga package, and a table that keeps logs for each query, so it'd be nice to include a column there indicating whether or not sampling was used.

mattpolicastro commented 8 years ago

The Reporting API response does include fields regarding sampling (containsSampledData, sampleSize, and sampleSpace) so it's not unfeasible. How would you see this data being returned, though? As another column on the results table?

MarkEdmondson1234 commented 8 years ago

I put this data as an attribute of the data frame results, so you can reach it via say attr("sampled", garesults)

efh0888 commented 8 years ago

I think an attribute makes sense

mattpolicastro commented 8 years ago

Had no idea attributes were a thing. Seems great. Prefix the attributes as something like ga:containsSampledData to prevent collisions?

mattpolicastro commented 8 years ago

Took a first stab at this with a simple conditional block in core.R. Thoughts?

efh0888 commented 8 years ago

Looks good to me, except the additional warning is redundant. Might also want to add something like

else {
  attr(ga.data.df, "ga:containsSampledData") <- FALSE
}
mattpolicastro commented 8 years ago

Yep, the warning was an oversight on my part. Latest commit.

I will say I'm having some funny feelings about prefixing the attributes with "ga:"—I like it for clarity and (hopefully) avoiding collisions upstream, but it's fairly indelicate.

MarkEdmondson1234 commented 8 years ago

I would avoid the ':' as it may do funky things to R names.

As a suggested enhancement, I got this from RGoogleAnalytics and it may be good here - it replicates the "This data is drawn from 2% of visits" messages you get in the GA interface:

samplePercent <-  100
  if(!is.null(x$containsSampledData)){
    if(x$containsSampledData) {
      samplePercent <- round(100 * (as.numeric(x$sampleSize) / as.numeric(x$sampleSpace)), 2)
      message("Data is sampled, based on ", samplePercent, "% of visits. Use WALK to mitigate it." )
    }
  }
mattpolicastro commented 8 years ago

I like it. I'll add something in.

EricGoldsmith commented 8 years ago

FWIW, I did something similar in my local fork (and, apologies, never submitted a PR): https://github.com/tedconf/rga/commit/74dd9a44c5d31d7c99657d3f7a4f15136ad7741e

mattpolicastro commented 8 years ago

@EricGoldsmith yeah, that's about what I had in mind. Mind if I pull it into my fork so I can PR it?

EricGoldsmith commented 8 years ago

@mattpolicastro, be my guest.

mattpolicastro commented 8 years ago

Super. I'll restrict myself to that particular commit as your other changes won't merge as cleanly, but I'll start in on it.

mattpolicastro commented 8 years ago

Ok, I think I've incorporated everything into one branch. Any more thoughts before I submit a PR?

MarkEdmondson1234 commented 8 years ago

:+1:

EricGoldsmith commented 8 years ago

@mattpolicastro, can you confirm that this is now fixed, and then close this issue?

mattpolicastro commented 8 years ago

Sorry for not closing the loop on this—it appears to be working correctly after force-updating the package install. However, it looks like @efh0888 may have to close it as I'm not an admin.