grafana / xk6-browser

The browser module adds support for browser automation and end-to-end web testing via the Chrome Devtools Protocol to k6.
https://grafana.com/docs/k6/latest/javascript-api/k6-browser/
GNU Affero General Public License v3.0
343 stars 41 forks source link

Negative `browser_first_meaningful_paint` metric #191

Closed imiric closed 2 years ago

imiric commented 2 years ago

While testing #117 with the script from this comment on main (54cd293), the browser_first_meaningful_paint metric was reported with negative avg, min and med values:

browser_first_meaningful_paint...: avg=-671428500ns min=-2167962000ns med=-671428500ns max=825.1ms  p(90)=525.79ms p(95)=675.45ms

The JSON output shows the following:

{"type":"Point","data":{"time":"2022-01-14T12:07:03.583311545+01:00","value":-2167.962,"tags":{"group":"","scenario":"default","url":"<snipped>"}},"metric":"browser_first_meaningful_paint"}
{"type":"Point","data":{"time":"2022-01-14T12:07:16.499621629+01:00","value":825.105,"tags":{"group":"::Click Continue (Recommendations)","scenario":"default","url":"<snipped>"}},"metric":"browser_first_meaningful_paint"}
inancgumus commented 2 years ago

Interesting...

imiric commented 2 years ago

To give more background for #288: this issue happened when receiving Page.lifecycleEvents for different frames which overwrote the global FrameSession.initTime value on which the metric value calculation was being done.

To illustrate the behavior, see the following log excerpt captured with the script from this comment and the command XK6_BROWSER_LOG=trace XK6_HEADLESS=1 xk6 run -v -o json=debug.json examples/117-vistaprint-20220113.js 2>&1 | tee debug.log (lines prefixed with >>> are my own print statements):

time="2022-03-31T16:30:27+02:00" level=debug msg="sid:84F576B34759623D065D20DB493DF7CD tid:95BD385A9A5EAC2492AACA16C9E71CF6 fid:95BD385A9A5EAC2492AACA16C9E71CF6 event:\"init\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=181
>>> 2022-03-31 16:30:27.586423133 +0200 CEST m=+10.694632817: setting fs.initTime to: 2022-03-31 16:30:27.225702 +0200 CEST
[...]
time="2022-03-31T16:30:31+02:00" level=debug msg="sid:84F576B34759623D065D20DB493DF7CD tid:95BD385A9A5EAC2492AACA16C9E71CF6 fid:469EF161100F607D1239C5B8BA8912D8 event:\"init\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=181
>>> 2022-03-31 16:30:31.928810345 +0200 CEST m=+15.037020037: setting fs.initTime to: 2022-03-31 16:30:31.569043 +0200 CEST
[...]
time="2022-03-31T16:30:33+02:00" level=debug msg="sid:84F576B34759623D065D20DB493DF7CD tid:95BD385A9A5EAC2492AACA16C9E71CF6 fid:95BD385A9A5EAC2492AACA16C9E71CF6 event:\"firstMeaningfulPaint\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=181
>>> 2022-03-31 16:30:33.409956558 +0200 CEST m=+16.518166281: pushing negative metric browser_first_meaningful_paint with value: -2.6383s
>>> fs.initTime: 2022-03-31 16:30:31.569043 +0200 CEST, endTime: 2022-03-31 16:30:28.930743 +0200 CEST

Notice that:

  1. We first receive an init event for fid:95BD385A9A5EAC2492AACA16C9E71CF6 and set fs.initTime to 2022-03-31 16:30:27.225702 +0200 CEST.

  2. 4 seconds later we receive an init event for an unrelated fid:469EF161100F607D1239C5B8BA8912D8 and overwrite fs.initTime to 2022-03-31 16:30:31.569043 +0200 CEST.

  3. 2 seconds later we receive a firstMeaningfulPaint event for the original fid:95BD385A9A5EAC2492AACA16C9E71CF6, but emit the browser_first_meaningful_paint metric with the wrong value of -2.6383s because the calculation is being done based on the updated fs.initTime, and the original frame init time was lost.

So the fix is simple: store the init time on Frame instead and calculate the elapsed time per frame, since the metric emission is done per frame anyway.


Unfortunately this exposes another issue caused by the racy way CDP events are handled. Sometimes we might process DOMContentLoaded before init (even though they arrive in the correct order via CDP), in which case Frame.initTime is zero (0001-01-01 00:00:00 +0000 UTC). The metric value in that case would be in the order of years, and would completely skew the avg, max and p(95) results. This happens on current main as well, but it is less visible since FrameSession.initTime would likely always be set to something, so some metrics would be slightly wrong in the order of ms or s. The workaround in 47d9e82 masks this by not emitting the metric when this happens, but the proper fix would be fixing the racy CDP event handling, maybe with the approach proposed here.

To illustrate this second behavior, see the following log:

8581:time="2022-04-01T14:52:44+02:00" level=trace msg="<- {\"method\":\"Page.lifecycleEvent\",\"params\":{\"frameId\":\"E33B7459A9457721D8347D0D6A98F0C0\",\"loaderId\":\"DB271DD551D011164B141C868C8A800F\",\"name\":\"init\",\"timestamp\":71075.049544},\"sessionId\":\"7749F7E34D6032878C09B4205D185FD7\"}" category="cdp:recv" elapsed="0 ms" goroutine=69
8582:time="2022-04-01T14:52:44+02:00" level=trace msg="<- {\"method\":\"Page.lifecycleEvent\",\"params\":{\"frameId\":\"E33B7459A9457721D8347D0D6A98F0C0\",\"loaderId\":\"DB271DD551D011164B141C868C8A800F\",\"name\":\"DOMContentLoaded\",\"timestamp\":71075.049894},\"sessionId\":\"7749F7E34D6032878C09B4205D185FD7\"}" category="cdp:recv" elapsed="0 ms" goroutine=69
8584:time="2022-04-01T14:52:44+02:00" level=debug msg="sid:7749F7E34D6032878C09B4205D185FD7 tid:49BF3B5EBB5DA3DF8ACD3E3D9D9AE921 fid:E33B7459A9457721D8347D0D6A98F0C0 event:\"DOMContentLoaded\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=179
8589:time="2022-04-01T14:52:44+02:00" level=debug msg="fid:E33B7459A9457721D8347D0D6A98F0C0 furl:\"\" m:browser_dom_content_loaded init:\"0001-01-01 00:00:00 +0000 UTC\" t:\"2022-04-01 14:52:44.049894 +0200 CEST\" v:9223372036854.775391" category="Frame:emitMetric" elapsed="0 ms" goroutine=179
8590:time="2022-04-01T14:52:44+02:00" level=debug msg="sid:7749F7E34D6032878C09B4205D185FD7 tid:49BF3B5EBB5DA3DF8ACD3E3D9D9AE921 fid:E33B7459A9457721D8347D0D6A98F0C0 event:\"init\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=179
8591:time="2022-04-01T14:52:44+02:00" level=debug msg="sid:7749F7E34D6032878C09B4205D185FD7 tid:49BF3B5EBB5DA3DF8ACD3E3D9D9AE921 fid:E33B7459A9457721D8347D0D6A98F0C0 initTime:\"2022-04-01 14:52:44.049544 +0200 CEST\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=179

Notice that:

  1. On line 8581 we receive the CDP init event for frame ID E33B7459A9457721D8347D0D6A98F0C0, followed by DOMContentLoaded immediately after. This is the very initial page load, even before about:blank is loaded.

  2. Then on line 8584 we process DOMContentLoaded first, at which point Frame.initTime is zero. Line 8589 shows this, and the very wrong 9223372036854.775391 metric value.

  3. The init event is processed after this, on line 8590, which is where we set the correct Frame.initTime for subsequent events.

A related question might be whether we even want to emit metrics for these initial page loads and about:blank. They're not related to the SUT, but maybe they would still be useful to measure browser performance? :thinking: We might want to tag them differently in that case, or should we just remove them altogether? Regardless, the race condition would still exist even on regular pages, so a proper fix is still needed.