w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)
Other
581 stars 50 forks source link

Inconsistency: Alarms API time slipping #433

Open bershanskiy opened 11 months ago

bershanskiy commented 11 months ago

TL;DR

Alarms API alarm timing is not guaranteed when devices go to sleep.

Details

The documentation on Alarms API implies/suggests that the alarms internally uses absolute timestamps, fire nearly exactly at specified timestamps, and are independent from one another. In reality, the situation is a bit different:

This gives a rise to a few curious behaviors:

patrickkettner commented 11 months ago

Hi @bershanskiy! Are you intending to file a bug? Or are you wanting looking to standardize certain behaviors?

bershanskiy commented 11 months ago

Are you intending to file a bug?

There is a bug filed (I need to find it).

Or are you wanting looking to standardize certain behaviors?

Ideally, yes, there should be a standard reasonably simple behavior for all browsers. Unfortunately, extensions may have adopted to the current (in my opinion slightly odd) behavior and changing current implementation details might introduce some incompatibilities.

xeenon commented 11 months ago

Related to #406 and #422.

xeenon commented 11 months ago

Safari calculate alarms by always using wall clock no matter what type of alarm. And repeating alarms are off that original first fire wall clock, not chained from when it fires.

bershanskiy commented 11 months ago

Some more Chromium-specific context is available here: https://github.com/darkreader/darkreader/pull/11307#issuecomment-1601488291

The following Chromium patch replaces base::OneShotTimer (which is affected by time slipping) with base::WallClockTimer (which is specifically written to adjust for time slipping, it uses base::OneShotTimer and re-adjusts it after every time slipping event).

diff --git a/extensions/browser/api/alarms/alarm_manager.cc b/extensions/browser/api/alarms/alarm_manager.cc
index 48d7f83388..c008a5a090 100644
--- a/extensions/browser/api/alarms/alarm_manager.cc
+++ b/extensions/browser/api/alarms/alarm_manager.cc
@@ -357,7 +357,7 @@ void AlarmManager::ReadFromStorage(const std::string& extension_id,

 void AlarmManager::SetNextPollTime(const base::Time& time) {
   next_poll_time_ = time;
-  timer_.Start(FROM_HERE, std::max(base::Seconds(0), time - clock_->Now()),
+  timer_.Start(FROM_HERE, time,
                this, &AlarmManager::PollAlarms);
 }

diff --git a/extensions/browser/api/alarms/alarm_manager.h b/extensions/browser/api/alarms/alarm_manager.h
index c23f78a457..1f223f9eb7 100644
--- a/extensions/browser/api/alarms/alarm_manager.h
+++ b/extensions/browser/api/alarms/alarm_manager.h
@@ -19,6 +19,7 @@
 #include "base/scoped_observation.h"
 #include "base/time/time.h"
 #include "base/timer/timer.h"
+#include "base/timer/wall_clock_timer.h"
 #include "extensions/browser/browser_context_keyed_api_factory.h"
 #include "extensions/browser/extension_registry.h"
 #include "extensions/browser/extension_registry_observer.h"
@@ -245,7 +246,7 @@ class AlarmManager : public BrowserContextKeyedAPI,
       extension_registry_observation_{this};

   // The timer for this alarm manager.
-  base::OneShotTimer timer_;
+  base::WallClockTimer timer_;

   // A map of our pending alarms, per extension.
   // Invariant: None of the AlarmLists are empty.
bershanskiy commented 11 months ago

The following describes experiment in Chrome and in Firefox, and then provides explanation for this behavior in Chrome.

Experiment 1

Consider an empty extension which just has alarms permission (so it can schedule alarms), a background script/page (so that background can be inspected) and nothing else. Open the developer tools inspecting the background.

Run this to log all alarms as they fire (alarm and the moment it fires):

chrome.alarms.onAlarm.addListener((a) => console.log(Date.now(), a))

Then create one alarm which is supposed to fire in n minutes, where n > 5:

chrome.alarms.create('test', {when: Date.now() + 1000 * 60 * 5});

Then put your laptop to sleep (best to activate sleep mode explicitly)

Then wait for m minutes where m > 2 and wake up the computer and wait for the alarm to fire.

Calculate alarm delay which is the difference of logged of Date.now() and alarms's scheduledTime. Observe that this difference is slightly more than m and does not depend on n.

Experiment 2

Do all the steps in experiment 1 until waking up, and then after wake up run

chrome.alarms.create('reset', {when: 0});

Observe that reset alarm is fired immediately (as it should). Then wait for test alarm to fire and notice that the delay is significantly less than 50 ms.

Experiment 3

This time install two extensions like this (just use a different extension file path and name in manifest to differentiate the two).

Register the alarm from the experiment 1 in one extension, then suspend and wake up, then schedule reset alarm in the second extension. Observe that the first extension alarm does not drift now.

Explanation for Chrome

Alarms API is implemented by AlarmManager, which is created for each profile (shared by all extensions in the same profile). This class uses base::OneShotTimer[1] which may drift when computer goes to sleep[2]. Upon startup and every alarm, AlarmManager calculates the next upcoming alarm and schedules the timer. This means:

  1. Extension can fix drift by registering an already expired alarm.
  2. Alarms can not drift by more than the difference between earliest alarm's scheduled time and the time computer went to sleep.
  3. Upon an alarm from base::OneShotTimer, AlarmManager fires all expired alarms, if there were any. If there is a single one, only it fires. If there were multiple expired alarms, then all of them fire.

[1] https://github.com/chromium/chromium/blob/main/extensions/browser/api/alarms/alarm_manager.h#L248 [2] https://github.com/chromium/chromium/blob/main/base/timer/wall_clock_timer.h#L27

dotproto commented 11 months ago

During today's meeting I believe we identified two main questions that need to be resolved.

  1. What is the intended behavior of delayInMinutes and when?
  2. Should when be changed to always use wall clock time rather than relative time?

With regard to Question 1, I see two possible interpretations of the intent of delayInMinutes and when.

If the platform only supported one of these behaviors, developers could implement their own solutions to provide the other behavior. That said, I believe shimming B would be more resource intensive to implement in userland.

xeenon commented 10 months ago

I do worry Interpretation B is too subtle, and can be a footman for developers that don't know the difference.

oliverdunk commented 8 months ago

At TPAC, we decided we wanted to look at the behaviour of setTimeout and setInterval for past precedence.

setTimeout seems to be in the spec here, which links to timer initialization steps, which links to run steps after a timeout.

Here it says the following:

If global is a Window object, wait until global's associated Document has been fully active for a further milliseconds milliseconds (not necessarily consecutively).

Otherwise, global is a WorkerGlobalScope object; wait until milliseconds milliseconds have passed with the worker not suspended (not necessarily consecutively).

Fully active is defined as follows:

A Document d is said to be fully active when d is the active document of a navigable navigable, and either navigable is a top-level traversable or navigable's container document is fully active.

It's unclear if a document is "fully active" or a worker is "not suspended" when the device is asleep.

Looking at the HTML spec repo, there is an issue discussing this problem. It seems like the behaviour is usually to use monotonic (non wall-clock time) but it is platform specific. We saw similar results testing at TPAC although Devlin did seem to see wall-clock time being used at one point - it's unclear what happened there.

TLDR: It doesn't seem like there's a lot of clarity around this on the web either. I'll ask on the Google side to check the above summary is accurate, but I'm not sure that's going to give us an easy answer unfortunately.

oliverdunk commented 8 months ago

I spoke to some engineers on our side and confirmed that the behaviour documented here matches our understanding (all browsers seem to pause timers on sleep except on Windows). There isn't much consensus beyond that and this isn't covered by the spec.

Since the most common implementation for setTimeout and setInterval is to pause on sleep, we could try to match that. However, that would mean that when would need to either be an exception to the rule or also use relative time and potentially be confusing to developers.

Personally, I'm currently leaning towards always using wall clock time. That would be the opposite of what the web seems to do most often but would mean we could use the same clock for when and delayInMinutes.

For repeating alarms that have fired more than once while a device was asleep, firing multiple seems undesirable so a good option could be firing once when the device wakes up (the explanation for this would be that the next instance only gets scheduled when the previous one fires).

All of this is just my thoughts and thinking out loud though 🤷. I haven't spoken to the wider Chrome team for thoughts, although at TPAC we seemed generally open to anything reasonable that we could agree on.

oliverdunk commented 7 months ago

Following our latest public meeting and some discussion async, everyone seems aligned on the following behaviour:

@zombie, @xeenon - can you add respective labels?

xeenon commented 7 months ago

Safari / WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=265583

dotproto commented 6 months ago

I just updated the WHATWG issue that Oliver linked with the following comment:

We've been discussing a similar issue with the Alarms API in the WebExtensions Community Group. Chrome originally introduced the Alarms API as a way to perform setTimeout() and setInterval()-like operations in non-persistent background pages (also known as "event pages" and "lazy background pages"). As such, the original intent of the Alarms API is that it would behave like the web platform methods.

As we discuss inconsistent behavior across browsers, bugs in existing implementations, and developer expectations, we are reconsidering how alarms should behave across sleep/suspend boundaries.

The current consensus opinion is that time should continue to tick for alarms while a device is suspend. We commonly refer to this as using wall-clock time. For example, say it is currently 9:00 AM and an extension schedules an alarm for 9:10 AM. At 9:05 the user suspends the device and unsuspends it at 9:15. When the device wakes, the browser should dispatch events for alarms that were scheduled when the device was asleep. This means the 9:10 alarm will fire at 9:15. (If time did not dick while the device was alseep, the alarm event would not be dispatched until 9:20.)

One of the considerations that lead us in this direction is that the Alarms API has more metadata about the scheduled operation than the web platform APIs. We can see this in the browser.alarms.onAlarm event handler, which receives an Alarms object that contains the following properties:

  • name - either "" or another string provided at alarm creation
  • scheduledTime - the time at which the alarm was scheduled to fire
  • (optional) periodInMinutes - the period at which the alarm will repeat (only included if set at alarm creation)

Browser representatives participating in the WECG felt that this provides the extension developer with enough information to distinguish between alarm events if multiple fire when a device resumes from sleep/suspend.

I wanted to share it here as I think it neatly summarizes my understanding of our current thinking.

oliverdunk commented 2 months ago

Repeating alarms always fire on a schedule to prevent drift (for example, if your device is asleep and you miss an alarm at 1AM, the next fire will be at 2AM even if the previous one isn't handled until 1:05AM).

@bershanskiy, I was briefly looking at the Chromium implementation and it seems like we do have some logic for this here. However, you mentioned in a CL a while back that you thought Chromium still violated it.

Is it possible none of us had noticed that code existed or is there still a reason why this isn't the case in Chromium?