mikehardy / google-analytics-java

Open Source license compatible Java API for Google Analytics

Implement sampling strategy for large-volume applications #12

Closed mikehardy closed 5 years ago

mikehardy commented 5 years ago

High-volume applications can exceed the allowed number of server hits. Implementing a sampling strategy within the library would be useful for large-volume library consumers.

mikehardy commented 5 years ago

As described in the docs for the official client, enabling the sampling strategy causes each device to be elected into or out of the sample on install (on generation of the gaClientId file, I presume), and that election persists for the duration of the installation. This bears testing but seems sensible.
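
A minimal sketch of that per-installation election, assuming the persisted client ID is available as a string. `ClientSampler` and `isInSample` are hypothetical names, not part of this library or the official client, and hashing the ID with CRC32 is my own choice for making the election repeatable without storing a separate flag:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper (assumption): not part of this library's API. */
final class ClientSampler {
    private ClientSampler() {}

    /**
     * Returns true if the installation identified by clientId is in the sample.
     * The result depends only on clientId and sampleRatePercent, so it stays
     * stable for as long as the client ID file survives.
     */
    static boolean isInSample(String clientId, double sampleRatePercent) {
        if (sampleRatePercent >= 100.0) {
            return true;   // default: nobody is sampled out
        }
        if (sampleRatePercent <= 0.0) {
            return false;  // everybody is sampled out
        }
        CRC32 crc = new CRC32();
        crc.update(clientId.getBytes(StandardCharsets.UTF_8));
        // Map the checksum onto 10,000 buckets so fractional rates such as 12.5 work.
        long bucket = crc.getValue() % 10_000L;
        return bucket < Math.round(sampleRatePercent * 100.0);
    }
}
```

A caller would check `ClientSampler.isInSample(clientId, 10.0)` once before sending hits. Because the bucket is fixed per client ID, raising the rate only adds clients to the sample, but the reported population still changes, so the segmentation caveat about changing the rate still applies.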

An alternative strategy would be to elect per session, with a "session timeout" in the client: if enough time passes, a new session begins and sample election happens again.
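
A rough sketch of the per-session alternative, under the assumption that a session is considered expired once the gap since the last hit reaches a configurable timeout. `SessionSampler` and `shouldSend` are hypothetical names, and the timestamp is passed in by the caller so the logic can be exercised without a real clock:

```java
import java.util.Random;

/** Hypothetical helper (assumption): not part of this library's API. */
final class SessionSampler {
    private final double sampleRatePercent;
    private final long sessionTimeoutMillis;
    private final Random random;

    private boolean hasSession = false;
    private long lastHitMillis;
    private boolean inSample;

    SessionSampler(double sampleRatePercent, long sessionTimeoutMillis, Random random) {
        this.sampleRatePercent = sampleRatePercent;
        this.sessionTimeoutMillis = sessionTimeoutMillis;
        this.random = random;
    }

    /**
     * Call once per hit. Returns true if the current session is in the sample.
     * A new election runs on the first hit and whenever the gap since the
     * previous hit is at least the session timeout.
     */
    synchronized boolean shouldSend(long nowMillis) {
        boolean expired = !hasSession || nowMillis - lastHitMillis >= sessionTimeoutMillis;
        if (expired) {
            // New session: re-run the election.
            inSample = random.nextDouble() * 100.0 < sampleRatePercent;
            hasSession = true;
        }
        // Activity extends the session even when the hit is sampled out.
        lastHitMillis = nowMillis;
        return inSample;
    }
}
```

Usage would look like `new SessionSampler(10.0, 30 * 60 * 1000L, new Random())` followed by `shouldSend(System.currentTimeMillis())` before each hit. The 30-minute value mirrors the default Google Analytics session timeout; whether the client should use the same value is a design choice left open here.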

I'm not sure which is better, or even whether one is better, but the docs indicate the official client implements the first option.

There is a caveat: if the sample rate is changed at any point, you have to segment the reporting, or the reports will contain inconsistent information.

Note that this is a requirement for AnkiDroid 2.9, which is imminent, so this will happen shortly.

mikehardy commented 5 years ago

Session start and end are documented under the sessionControl field:

Field Name      Protocol Parameter  Value Type  Default Value  Max Length  Supported Hit Types
sessionControl  sc                  text        None           None        all

Example value: start
Example usage:
// Starts a new session.
ga('send', 'pageview', {'sessionControl': 'start'});

mikehardy commented 5 years ago

sampleRate - Specifies what percentage of users should be tracked. This defaults to 100 (no users are sampled out), but large sites may need to use a lower sample rate to stay within Google Analytics processing limits.