mozilla / mozilla-reports

Repository for public analyses.
https://mozilla.report

Add analysis of hang data on Nightly #71

Closed: squarewave closed this 7 years ago

squarewave commented 7 years ago

I'm trying to assess the relationship between BHR hangs and hangs we use for the "Input Lag" release criteria for Quantum. I'm posting this here because I'd like to get a bit of review on it and any ideas from an analysis perspective on what the data might mean, beyond what I've said in the markdown.

Anyway, let me know what needs to be cleared up, or if you have any criticisms!

squarewave commented 7 years ago

Quick update on this: while working on what you mentioned, I noticed that input lag for content seems to be double-counted in the input lag for chrome. I may need to adjust the analysis to compensate, so I'm not posting an updated version just yet.

Regarding the question at the end of my analysis: the data does seem to be inconsistent. For the content process, the sums by build date are fairly tightly correlated (correlation coefficient 0.918), yet at the level of individual pings the correlation is much weaker (0.413). So my question is this: I had hoped that BHR hangs were the dominant cause of Input Lag. My reading of the weak correlation between BHR hangs and Input Lag events within single pings is that BHR hangs may still plausibly cause some Input Lag events, but they are unlikely to be the dominant cause. The high correlation at the aggregate level would then point to a third factor that drives both BHR hangs and Input Lag. Does that sound reasonable?
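To make the "third factor" reasoning concrete, here is a minimal Python sketch using purely synthetic data (not Telemetry data). It assumes, for illustration only, that the shared factor is the number of pings received per build date: per-ping BHR and Input Lag counts are drawn independently of each other, so any correlation between the per-date sums comes from ping volume alone. The helpers `pearson` and `simulate` and all parameter values are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(num_dates=30, seed=0):
    """Simulate hang counts where BHR hangs and Input Lag are independent per ping.

    The only thing the two counts share is the (hypothetical) number of
    pings received for each build date.
    """
    rng = random.Random(seed)
    ping_bhr, ping_lag = [], []  # one entry per ping
    date_bhr, date_lag = [], []  # one summed entry per build date
    for _ in range(num_dates):
        n_pings = rng.randint(100, 2000)  # ping volume varies between build dates
        bhr = [rng.randint(0, 4) for _ in range(n_pings)]
        lag = [rng.randint(0, 4) for _ in range(n_pings)]  # drawn independently of bhr
        ping_bhr.extend(bhr)
        ping_lag.extend(lag)
        date_bhr.append(sum(bhr))
        date_lag.append(sum(lag))
    return pearson(ping_bhr, ping_lag), pearson(date_bhr, date_lag)


per_ping_r, per_date_r = simulate()
print(f"per ping: {per_ping_r:.3f}, per build date: {per_date_r:.3f}")
```

Under these assumptions the per-ping coefficient should come out close to 0 while the per-date coefficient is close to 1. If the per-date values in the analysis are already normalized (for example, hangs per ping or per usage hour), ping volume cannot be the shared factor, and some other one would be needed to explain the aggregate correlation.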

harterrt commented 7 years ago

I'm not sure I understand. It looks like this analysis compares input lag and BHR hangs for the parent (low correlation) and content (high correlation) processes. Aren't both of these for 'single pings'?

squarewave commented 7 years ago

No: the charts you're probably looking at aggregate all of the pings for a single build date and treat each aggregated value as a single data point. When we do this, the correlation for the content process is high, because the per-date values tend to move together. However, when we treat the hang counts of each individual ping as a data point, the correlation is much lower. For example, there are many pings with, say, 4 BHR hangs and 0 Input Lag events, and vice versa.
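The difference between the two levels of aggregation can be sketched in Python as follows. The record layout `(build_date, bhr_hangs, input_lag_hangs)`, the helper names, and the toy data are assumptions for illustration, not the schema or data used in the report. The toy data exaggerate the pattern where pings have several hangs of one kind and none of the other: every ping is either (4, 0) or (0, 4), but build dates with more pings accumulate more of both.

```python
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def hang_correlations(pings):
    """Return (per_ping_r, per_date_r) for (build_date, bhr, input_lag) records."""
    pings = list(pings)
    # Per ping: every ping is its own data point.
    per_ping_r = pearson([bhr for _, bhr, _ in pings], [lag for _, _, lag in pings])

    # Per build date: sum all pings for a date into a single data point.
    sums = defaultdict(lambda: [0, 0])
    for build_date, bhr, lag in pings:
        sums[build_date][0] += bhr
        sums[build_date][1] += lag
    dates = sorted(sums)
    per_date_r = pearson([sums[d][0] for d in dates], [sums[d][1] for d in dates])
    return per_ping_r, per_date_r


# Hypothetical toy data: 2, 4 and 6 pings for three build dates.
example_pings = (
    [("20170601", 4, 0), ("20170601", 0, 4)]
    + [("20170602", 4, 0), ("20170602", 0, 4)] * 2
    + [("20170603", 4, 0), ("20170603", 0, 4)] * 3
)
print(hang_correlations(example_pings))  # (-1.0, 1.0)
```

In this extreme case the per-ping correlation is perfectly negative while the per-date correlation is perfectly positive, which is why a high aggregate coefficient on its own says little about what happens inside individual pings.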

harterrt commented 7 years ago

Is the chart that supports this conclusion in this report? The only charts I can approximate correlation from are the line graphs.

squarewave commented 7 years ago

Updated with the charts that I alluded to. See the "Content Hang Stats" scatterplot for aggregate correlation, and "Content Hang Stats (per ping)" for the per ping correlation (or lack thereof).