mozilla / glam

Mozilla's primary interactive dashboard for examining the distribution of telemetry values.
https://glam.telemetry.mozilla.org
Mozilla Public License 2.0

Add aggregation by submission date #1073

Closed martinbalfanz closed 2 years ago

martinbalfanz commented 3 years ago

While aggregation by build_id is great for pre-release channels, I often find myself looking for telemetry usage by submission date on Release.

Some example scenarios include:

- telling usage trends from code changes apart from seasonality
- understand usage patterns, like "when is feature x used the most during a week/month"
- being able to react to unforeseen effects of code changes more rapidly, as opposed to waiting for a full release aggregate

The last point is particularly dear to my heart. For some platform work, we can only understand the full effect of code changes on the release channel, for example when we need exposure to a diverse hardware landscape (e.g. printers, CPUs, GPUs, ...) that is not fully covered on pre-release channels. Day-to-day data would help us react more promptly.

acmiyaguchi commented 3 years ago

Having worked closely with the queries underlying glam, I'm skeptical about the usefulness of metrics by submission date. I'm assuming submission date would be another dimension like build_date and app_version. In this case, we would be aggregating across the entire population for a day. Each submission date has a mix of different builds, so it seems that it would be difficult to interpret the resulting aggregates.

To address some specific examples:

telling usage trends from code changes apart from seasonality

I don't believe the current aggregates are dependent on seasonality, due to the way that aggregates are generated. Each client contributes equally to the shape of the aggregate for a particular dimension over its history in a build/version. It would be difficult to make inferences/decisions based on the data if we had to worry about seasonality.

understand usage patterns, like "when is feature x used the most during a week/month"

I think this is difficult to decouple from just general usage (e.g. DAU). Do you have a more specific example of what kinds of features would be time-dependent?

being able to react to unforeseen effects of code changes more rapidly, as opposed to waiting for a full release aggregate

We don't have to wait for a full release to see changes in a particular build -- it's probably sufficient to have 1-2 days to get a reasonable picture of how things are progressing. More data just increases our confidence in the shape of the aggregate.

As a matter of practical concern, adding submission date as a separate dimension is likely not feasible without rethinking the data model. Each new dimension increases the number of rows by a multiplicative factor (i.e. we're essentially computing every combination of the dimension columns to build an OLAP cube).
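
To make the multiplicative growth concrete, here is a rough back-of-the-envelope sketch; every cardinality below is a made-up placeholder rather than a real GLAM number:

# back-of-the-envelope OLAP-cube growth; all cardinalities are placeholders
dimensions = {
    "probe": 2000,
    "channel": 3,
    "app_version": 30,
    "build_id": 200,
    "os": 4,
}

rows = 1
for cardinality in dimensions.values():
    rows *= cardinality
print(f"rows without submission_date: {rows:,}")

# ~365 distinct submission dates would multiply the row count again
print(f"rows with submission_date:    {rows * 365:,}")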

Currently our scalar and histogram aggregate tables are 63 TB and 216 TB in size, and a full scan costs roughly $5 per TB.

# size of tables in TB
% bq show --format json moz-fx-data-shared-prod:telemetry_derived.clients_scalar_aggregates_v1 \
| jq -r '.numBytes' | awk '{ printf $1/1e12 }' 
63.8497
% bq show --format json moz-fx-data-shared-prod:telemetry_derived.clients_histogram_aggregates_v1 \
| jq -r '.numBytes' | awk '{ printf $1/1e12 }'
216.344
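
For a sense of scale, multiplying those sizes by the rough $5/TB scan price:

# rough cost of one full scan of both aggregate tables at ~$5/TB
scalar_tb, histogram_tb = 63.8497, 216.344
print(f"~${(scalar_tb + histogram_tb) * 5:,.0f} per full scan")  # about $1,400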

I would be curious to hear more, though; perhaps this is something that could be prototyped from the source clients_daily tables if the use case were more concrete.

martinbalfanz commented 3 years ago

Thank you! I'll try to add some details to each of your points.

I don't believe the current aggregates are dependent on seasonality, due to the way that aggregates are generated. [...]

I didn't mean to correlate code changes and seasonality. What I meant is that we sometimes monitor metrics over time even though there are no code changes between builds. Based on submission_date, we can then answer questions like the one given in the clients_daily docs: "What did clients with characteristics X, Y, and Z do during the period S to E?"
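
As a rough illustration (not an official recipe) of answering that kind of question from clients_daily by submission_date, something like the sketch below could work; the filter columns and the active_hours_sum field are assumptions about the schema, and the characteristics are placeholders:

# hypothetical sketch: "what did clients with characteristics X and Y do during
# the period S to E?", answered from clients_daily keyed on submission_date;
# column names other than submission_date and client_id are assumptions
from google.cloud import bigquery

client = bigquery.Client(project="moz-fx-data-shared-prod")  # assumes query access

sql = """
SELECT
  submission_date,
  COUNT(DISTINCT client_id) AS clients,
  SUM(active_hours_sum) AS active_hours          -- assumed column name
FROM `moz-fx-data-shared-prod.telemetry.clients_daily`
WHERE submission_date BETWEEN '2020-12-01' AND '2020-12-14'  -- period S to E
  AND os = 'Windows_NT'   -- characteristic X (placeholder)
  AND country = 'DE'      -- characteristic Y (placeholder)
GROUP BY submission_date
ORDER BY submission_date
"""

for row in client.query(sql).result():
    print(row.submission_date, row.clients, row.active_hours)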

An aggregate over a build_id/major version seems too long if the question is about a shorter period of time (e.g. over specific holidays). I may have chosen a poor example here, as the question is very similar to the next one.

I think this is difficult to decouple from just general usage (e.g. DAU). Do you have a more specific example of what kinds of features would be time-dependent?

I think not all features/products follow Firefox DAU. For example, I worked on DevTools for ~1.5 years, and DevTools DAU behaved very differently. I could imagine that other features may have interesting patterns as well (thinking about things like PiP).

We don't have to wait for a full release to see changes in a particular build -- it's probably sufficient to have 1-2 days to get a reasonable picture of how things are progressing. More data just increases our confidence in the shape of the aggregate.

I would like to understand this better. Maybe it's a mistake on my end, or there is a better way to answer the question(s) I may have.

An example: I currently work with some teams on improving printing. I want to observe whether our code changes affect the error rate that clients run into. So in glam, I pick the probe printing_error, set the aggregation level to major version, the aggregation to count (or sum), and the key to (in my current case) failure. The result looks like the following:

(screenshot: GLAM view of printing_error, 2020-12-11)

In my understanding, I would need to wait for Fx83 to collect more data before knowing whether there is really a decrease. If, on the other hand, I looked at smaller steps (e.g. a daily count of print failures), I would have a baseline from the previous week or two and might already observe a trend (acknowledging the caveats that this method has).
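
A toy sketch of that baseline idea, with made-up daily failure counts (not a statistically rigorous method, just the shape of the comparison):

# compare each new day against the mean of the previous 14 days to spot an
# early shift after a code change ships; all counts below are made up
from statistics import mean

daily_failures = [102, 98, 110, 105, 99, 101, 107, 103, 100, 96, 104, 108, 97, 102,  # baseline
                  81, 78, 83]                                                         # after the change

baseline = mean(daily_failures[:14])
for day, count in enumerate(daily_failures[14:], start=15):
    change = (count - baseline) / baseline
    print(f"day {day}: {count} failures ({change:+.0%} vs. 14-day baseline)")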

After reading your comments, I realized that my submission_date-based work is often just the start of further analysis.

It may not be as relevant for a generalized tool like glam. The cost also seems too high if time-based questions are only relevant for a few probes. You're making a good point about clients_daily. Maybe my question should have been how to get the handful of probes that I monitor in there 🤔

acmiyaguchi commented 3 years ago

For reference, this is what I see with those dimensions:

(screenshot: GLAM view with those dimensions)

The sum (and count) is not a sum over the entire population, but rather the sum of that probe for each individual client, with a histogram/distribution of those per-client values over the population. So if you look at the summary box, you'll notice that most of the quantiles are 1, while the 95th percentile is 3. If this were a sum over the population, then we might expect something more like 3-4 million (closer to the number of clients in the bucket).
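
A toy sketch of that aggregation order (sum per client first, then a distribution of the per-client values across the population), with made-up data:

# per-client sums, then population percentiles over those sums; data is made up
from collections import defaultdict
from statistics import quantiles

pings = [("a", 1), ("a", 1), ("b", 1), ("c", 1), ("c", 2), ("d", 1), ("e", 1)]

per_client = defaultdict(int)
for client_id, value in pings:
    per_client[client_id] += value     # sum of the probe per client, not over everyone

values = sorted(per_client.values())   # [1, 1, 1, 2, 3]
cuts = quantiles(values, n=20)         # 5th, 10th, ..., 95th percentile cut points
print("per-client sums:", values)
print("median:", cuts[9], "95th percentile:", cuts[18])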

For something like this, it may be useful to look at the source directly. It's efficient because it only requires scanning a few columns. Here's a small query for printing.errors over the last 28 days:

https://sql.telemetry.mozilla.org/queries/76833/source#191496
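
The linked query itself isn't reproduced here, but a dry run is a cheap way to confirm the "few columns" point before running anything; the table and columns in this sketch are placeholders rather than the probe-specific source used in the linked query:

# sketch of a dry run to estimate scan cost before executing; table and columns
# are placeholders, not the ones from the linked query
from google.cloud import bigquery

client = bigquery.Client(project="moz-fx-data-shared-prod")

sql = """
SELECT submission_date, COUNT(*) AS n
FROM `moz-fx-data-shared-prod.telemetry.clients_daily`
WHERE submission_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
GROUP BY submission_date
"""

job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"would scan ~{job.total_bytes_processed / 1e9:.1f} GB")  # narrow column scans stay cheap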

This kind of analysis doesn't scale to the set of all probes, but it might help clarify the types of derived data that would be useful to your work.

rafrombrc commented 3 years ago

Revisiting this old issue... we strongly suspect that doing this aggregation in GLAM would be extremely expensive, and possibly misleading. We'll dig in to validate these suspicions, however, and will report back. If GLAM itself can't do it, we can almost certainly provide a way to escalate out of GLAM to be able to see the aggregation by submission date in another tool like redash or looker.

rafrombrc commented 2 years ago

We've got another effort going on right now to build a Looker based roll-out operational monitoring (OpMon) dashboard, and we have plans to have that dashboard support both build id and submission date as the x-axis. We suspect that OpMon will be the right tool to use for intra-build tracking like this, as opposed to GLAM. Thoughts @emtwo @ecsmyth ?

emtwo commented 2 years ago

We've got another effort going on right now to build a Looker based roll-out operational monitoring (OpMon) dashboard, and we have plans to have that dashboard support both build id and submission date as the x-axis. We suspect that OpMon will be the right tool to use for intra-build tracking like this, as opposed to GLAM. Thoughts @emtwo @ecsmyth ?

I think opmon could certainly support this. However, it wouldn't be as easy as a dropdown menu with pre-aggregated/available data like Glam. A set of metrics + dimensions of interest would need to be defined in advance. For someone who knows there's a subset of metrics they always want to monitor, the setup would happen once and provide an ongoing helpful dashboard. For someone who wants to browse a variety of probes and is not yet sure what they are looking for, Glam can do that better.

alekhyamoz commented 2 years ago

@ecsmyth this ticket is in line with our ongoing discussions about GLAM's role in product management.

rafrombrc commented 2 years ago

We've confirmed that the cost (operational and $$) of doing this in GLAM is prohibitive at the moment. @ecsmyth Now that operational monitoring is more mature, we should look at Martin's use cases to see how many of them OpMon might be able to support.

Iinh commented 2 years ago

@scholtzan since you're now in charge of operational monitoring, do you think this is something we can tackle with OpMon in Looker instead? If it's feasible, then we can move this ticket to Jira.

scholtzan commented 2 years ago

Yes, it sounds more feasible in opmon; however, opmon would also be showing percentiles. I can create a ticket to investigate, though:

https://github.com/mozilla/opmon/issues/45

rafrombrc commented 2 years ago

Closing because this is now being tracked in the mozilla/opmon#45 ticket.