Closed imiric closed 2 years ago
Interesting...
To give more background for #288: this issue happened when receiving Page.lifecycleEvent
s for different frames which overwrote the global FrameSession.initTime
value on which the metric value calculation was being done.
To illustrate the behavior, see the following log excerpt captured with the script from this comment and the command XK6_BROWSER_LOG=trace XK6_HEADLESS=1 xk6 run -v -o json=debug.json examples/117-vistaprint-20220113.js 2>&1 | tee debug.log
(lines prefixed with >>>
are my own print statements):
time="2022-03-31T16:30:27+02:00" level=debug msg="sid:84F576B34759623D065D20DB493DF7CD tid:95BD385A9A5EAC2492AACA16C9E71CF6 fid:95BD385A9A5EAC2492AACA16C9E71CF6 event:\"init\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=181
>>> 2022-03-31 16:30:27.586423133 +0200 CEST m=+10.694632817: setting fs.initTime to: 2022-03-31 16:30:27.225702 +0200 CEST
[...]
time="2022-03-31T16:30:31+02:00" level=debug msg="sid:84F576B34759623D065D20DB493DF7CD tid:95BD385A9A5EAC2492AACA16C9E71CF6 fid:469EF161100F607D1239C5B8BA8912D8 event:\"init\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=181
>>> 2022-03-31 16:30:31.928810345 +0200 CEST m=+15.037020037: setting fs.initTime to: 2022-03-31 16:30:31.569043 +0200 CEST
[...]
time="2022-03-31T16:30:33+02:00" level=debug msg="sid:84F576B34759623D065D20DB493DF7CD tid:95BD385A9A5EAC2492AACA16C9E71CF6 fid:95BD385A9A5EAC2492AACA16C9E71CF6 event:\"firstMeaningfulPaint\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=181
>>> 2022-03-31 16:30:33.409956558 +0200 CEST m=+16.518166281: pushing negative metric browser_first_meaningful_paint with value: -2.6383s
>>> fs.initTime: 2022-03-31 16:30:31.569043 +0200 CEST, endTime: 2022-03-31 16:30:28.930743 +0200 CEST
Notice that:
We first receive an init
event for fid:95BD385A9A5EAC2492AACA16C9E71CF6
and set fs.initTime
to 2022-03-31 16:30:27.225702 +0200 CEST
.
4 seconds later we receive an init
event for an unrelated fid:469EF161100F607D1239C5B8BA8912D8
and overwrite fs.initTime
to 2022-03-31 16:30:31.569043 +0200 CEST
.
2 seconds later we receive a firstMeaningfulPaint
event for the original fid:95BD385A9A5EAC2492AACA16C9E71CF6
, but emit the browser_first_meaningful_paint
metric with the wrong value of -2.6383s
because the calculation is being done based on the updated fs.initTime
, and the original frame init
time was lost.
So the fix is simple: store the init
time on Frame
instead and calculate the elapsed time per frame, since the metric emission is done per frame anyway.
Unfortunately this exposes another issue caused by the racy way CDP events are handled. Sometimes we might process DOMContentLoaded
before init
(even though they arrive in the correct order via CDP), in which case Frame.initTime
is zero (0001-01-01 00:00:00 +0000 UTC
). The metric value in that case would be in the order of years, and would completely skew the avg
, max
and p(95)
results. This happens on current main
as well, but it is less visible since FrameSession.initTime
would likely always be set to something, so some metrics would be slightly wrong in the order of ms or s. The workaround in 47d9e82 masks this by not emitting the metric when this happens, but the proper fix would be fixing the racy CDP event handling, maybe with the approach proposed here.
To illustrate this second behavior, see the following log:
8581:time="2022-04-01T14:52:44+02:00" level=trace msg="<- {\"method\":\"Page.lifecycleEvent\",\"params\":{\"frameId\":\"E33B7459A9457721D8347D0D6A98F0C0\",\"loaderId\":\"DB271DD551D011164B141C868C8A800F\",\"name\":\"init\",\"timestamp\":71075.049544},\"sessionId\":\"7749F7E34D6032878C09B4205D185FD7\"}" category="cdp:recv" elapsed="0 ms" goroutine=69
8582:time="2022-04-01T14:52:44+02:00" level=trace msg="<- {\"method\":\"Page.lifecycleEvent\",\"params\":{\"frameId\":\"E33B7459A9457721D8347D0D6A98F0C0\",\"loaderId\":\"DB271DD551D011164B141C868C8A800F\",\"name\":\"DOMContentLoaded\",\"timestamp\":71075.049894},\"sessionId\":\"7749F7E34D6032878C09B4205D185FD7\"}" category="cdp:recv" elapsed="0 ms" goroutine=69
8584:time="2022-04-01T14:52:44+02:00" level=debug msg="sid:7749F7E34D6032878C09B4205D185FD7 tid:49BF3B5EBB5DA3DF8ACD3E3D9D9AE921 fid:E33B7459A9457721D8347D0D6A98F0C0 event:\"DOMContentLoaded\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=179
8589:time="2022-04-01T14:52:44+02:00" level=debug msg="fid:E33B7459A9457721D8347D0D6A98F0C0 furl:\"\" m:browser_dom_content_loaded init:\"0001-01-01 00:00:00 +0000 UTC\" t:\"2022-04-01 14:52:44.049894 +0200 CEST\" v:9223372036854.775391" category="Frame:emitMetric" elapsed="0 ms" goroutine=179
8590:time="2022-04-01T14:52:44+02:00" level=debug msg="sid:7749F7E34D6032878C09B4205D185FD7 tid:49BF3B5EBB5DA3DF8ACD3E3D9D9AE921 fid:E33B7459A9457721D8347D0D6A98F0C0 event:\"init\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=179
8591:time="2022-04-01T14:52:44+02:00" level=debug msg="sid:7749F7E34D6032878C09B4205D185FD7 tid:49BF3B5EBB5DA3DF8ACD3E3D9D9AE921 fid:E33B7459A9457721D8347D0D6A98F0C0 initTime:\"2022-04-01 14:52:44.049544 +0200 CEST\"" category="FrameSession:onPageLifecycle" elapsed="0 ms" goroutine=179
Notice that:
On line 8581 we receive the CDP init
event for frame ID E33B7459A9457721D8347D0D6A98F0C0
, followed by DOMContentLoaded
immediately after. This is the very initial page load, even before about:blank
is loaded.
Then on line 8584 we process DOMContentLoaded
first, at which point Frame.initTime
is zero. Line 8589 shows this, and the very wrong 9223372036854.775391
metric value.
The init
event is processed after this, on line 8590, which is where we set the correct Frame.initTime
for subsequent events.
A related question might be whether we even want to emit metrics for these initial page loads and about:blank
. They're not related to the SUT, but maybe they would still be useful to measure browser performance? :thinking: We might want to tag them differently in that case, or should we just remove them altogether? Regardless, the race condition would still exist even on regular pages, so a proper fix is still needed.
While testing #117 with the script from this comment on
main
(54cd293), thebrowser_first_meaningful_paint
metric was reported with negativeavg
,min
andmed
values:The JSON output shows the following: