Open efh0888 opened 8 years ago
The Reporting API response does include fields regarding sampling (containsSampledData
, sampleSize
, and sampleSpace
) so it's not unfeasible. How would you see this data being returned, though? As another column on the results table?
I put this data as an attribute of the data frame results, so you can reach it via say attr("sampled", garesults)
I think an attribute makes sense
Had no idea attributes were a thing. Seems great. Prefix the attributes as something like ga:containsSampledData
to prevent collisions?
Took a first stab at this with a simple conditional block in core.R
. Thoughts?
Looks good to me, except the additional warning is redundant. Might also want to add something like
else {
attr(ga.data.df, "ga:containsSampledData") <- FALSE
}
Yep, the warning was an oversight on my part. Latest commit.
I will say I'm having some funny feelings about prefixing the attributes with "ga:"—I like it for clarity and (hopefully) avoiding collisions upstream, but it's fairly indelicate.
I would avoid the ':' as it may do funky things to R names.
As a suggested enhancement, I got this from RGoogleAnalytics and it may be good here - it replicates the "This data is drawn from 2% of visits" messages you get in the GA interface:
samplePercent <- 100
if(!is.null(x$containsSampledData)){
if(x$containsSampledData) {
samplePercent <- round(100 * (as.numeric(x$sampleSize) / as.numeric(x$sampleSpace)), 2)
message("Data is sampled, based on ", samplePercent, "% of visits. Use WALK to mitigate it." )
}
}
I like it. I'll add something in.
FWIW, I did something similar in my local fork (and, apologies, never submitted a PR): https://github.com/tedconf/rga/commit/74dd9a44c5d31d7c99657d3f7a4f15136ad7741e
@EricGoldsmith yeah, that's about what I had in mind. Mind if I pull it into my fork so I can PR it?
@mattpolicastro, be my guest.
Super. I'll restrict myself to that particular commit as your other changes won't merge as cleanly, but I'll start in on it.
Ok, I think I've incorporated everything into one branch. Any more thoughts before I submit a PR?
:+1:
@mattpolicastro, can you confirm that this is now fixed, and then close this issue?
Sorry for not closing the loop on this—it appears to be working correctly after force-updating the package install. However, it looks like @efh0888 may have to close it as I'm not an admin.
Is it possible to somehow capture this message, i.e. store it as a character string for later use? I've got a script that automates our data extracts from Google Analytics using the rga package, and a table that keeps logs for each query, so it'd be nice to include a column there indicating whether or not sampling was used.