Update research reference to more current research

nhelfman commented 2 years ago

Research link in https://w3c.github.io/event-timing/#sec-intro points to relatively old research from Miller 1968 and Card 1991.

A more recent paper from 2017, which covers additional research (on top of the studies mentioned), could be linked to give more weight and detail to the latency guidelines.

https://link.springer.com/chapter/10.1007/978-3-319-58475-1_1

I propose updating the research link.

yoavweiss commented 2 years ago

^^ @mmocny

@nhelfman - updating the research we're linking to sounds great! (even if I haven't yet read through the one you linked) Are you interested in sending a PR updating the research we're linking to?

mmocny commented 2 years ago

Thanks for noticing this, and for sharing the link.

Sharing some of my own notes, in case others find them useful...

The research starts with a summary of previous findings which I find useful. There is more detail about how these studies compare.

I find some of the conclusions from the "Recent Latency Guidelines" especially interesting, e.g.:

According to Seow [29], users have certain expectations regarding the responsiveness of the system if a certain task is conducted. For instance, tasks that mimic events in the physical world with instantaneous responses (e.g., pressing a virtual button which mimics pressing a physical button) should also show instantaneous responses (e.g., an audible click). For this very basic kind of task, the user expects the system to respond instantaneous, which means that a maximum SRT of 100 ms is required for very simple feedback (e.g., audible click after a virtual button press), respectively 200 ms for slightly more complex feedback (e.g., visual drop down menu). The next category, labelled “immediate”, concerns situations in which the user expects the system to respond by performing an action initiated by the user (e.g., the display of a letter after a keystroke) and requires a maximum SRT of 500–1000 ms

There are other sections that reference multiple responses (either bimodal, i.e. visual, audio, tactile, or simply initial feedback vs feedback after complex task completion). I think this matches our thinking in this space (i.e. distinguishing the very first/next paint after an interaction from a potential largest/final update).

I wanted to call out this result in particular, which I find is often cited as proof of more aggressive latency targets:

According to the guidelines by Kaaresoja et al. [19], latencies for visual feedback should lie between 30–85 ms, for audio feedback between 20–70 ms and for tactile feedback between 5–50 ms. Hence, their guidelines were the first to explicitly incorporate latencies smaller than 50–100 ms, if only for a very specific use case.

But then I'll contrast it to another finding from the same research:

Significant drops in the perceived quality scores were found at 100–150 ms for visual, and 70–100 ms for audio as well as for tactile feedback. Moreover, buttons with any feedback with a 300 ms latency were rated significantly lower than the buttons with any feedback with latencies ranging from 0 to 150 ms.

In other words, while the overall guidelines seem quite aggressive (5-85ms range for appearing instantaneous), they still share overall findings that quality scores start to perceivably drop between the 150ms and 300ms thresholds. I at least hadn't read that distinction before.
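
To make the contrast concrete, here is a rough sketch that encodes the two sets of numbers quoted above. Only the millisecond values come from the quoted text; the names (`guideline`, `qualityDrop`, `classify`) are hypothetical, and treating the lower bound of each "quality drop" range as a cutoff is my own simplification, not something the paper states.

```typescript
// Illustrative only: the numbers are from the Kaaresoja et al. quotes above,
// every identifier here is hypothetical.
type Modality = "visual" | "audio" | "tactile";

interface RangeMs { min: number; max: number }

// Recommended latency ranges from the guidelines.
const guideline: Record<Modality, RangeMs> = {
  visual: { min: 30, max: 85 },
  audio: { min: 20, max: 70 },
  tactile: { min: 5, max: 50 },
};

// Latency ranges at which perceived quality scores dropped significantly.
const qualityDrop: Record<Modality, RangeMs> = {
  visual: { min: 100, max: 150 },
  audio: { min: 70, max: 100 },
  tactile: { min: 70, max: 100 },
};

type Verdict = "within-guideline" | "between" | "quality-drop";

// Assumption: anything at or below the guideline's upper bound counts as
// within the guideline, and anything at or above the lower bound of the
// quality-drop range counts as a quality drop.
function classify(modality: Modality, latencyMs: number): Verdict {
  if (latencyMs <= guideline[modality].max) return "within-guideline";
  if (latencyMs >= qualityDrop[modality].min) return "quality-drop";
  return "between";
}
```

Under these assumptions, `classify("visual", 90)` lands in the gap between the guideline and the measured drop, which is the distinction I am pointing at.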

Some of the results for bimodal / tactile feedback are interesting and suggest very aggressive latency expectations. I wonder if this is something that is at least partially handled at the OS level, i.e. my device / virtual keyboard will automatically vibrate even if the keyboard handler has very high latency.

There are findings that suggest more aggressive target latencies for continuous events (using a pen/stylus to "ink", or dragging with a finger, or as I've read in other studies: using a mouse to pan the screen in a video game). We should be careful not to mix these expectations with discrete events like tapping a button, since the latency thresholds users perceive are consistently higher there across studies.

E.g.:

Moreover, interaction speed affects latency perception in dragging tasks. The faster the user’s hand motion in a dragging task, the better the latency perception [26]

I think it is interesting to make a comparison between Performance and Perception:

Using a virtual balance task, it has been shown that performance was already impaired by an added latency of 49 ms (technical base latency: 10.8 ms). However, participants perceived only the added latency from 97 ms on. Hence, even though users were not able to perceive the latency, it had an effect on their performance.

The effect of latency on user performance was also examined more closely in recent years. For instance, Brady et al. [4] applied an indirect mouse movement task and found that an added latency of 33 ms significantly impaired user performance. In a pointing task, latency began to affect performance at 16 ms [14]. In a 3D game environment, a latency of 41 ms impaired user performance in an aiming task [16].

And finally, some conclusions at the end:

However, latency can also impair user performance and experience in first- and second-order tasks. Especially in the emerging field of human-robot-interactions, virtual environments and remote-controlled systems, influences of latency should be further investigated in more complex tasks.

I think this is very interesting. However, I would be careful about applying a broad recommendation with a latency target that is 2x more aggressive than what is actually claimed to be perceptible to users.

Interpreting conclusions:

To conclude, while several design guidelines recommend a maximum latency of 100 ms for an optimal user experience in basic interactions, empirical results suggest that latency thresholds for different tasks lay substantially lower. Users are indeed able to perceive latencies down to single milliseconds in specific tasks. Moreover, performance in zero-order and more demanding second-order tasks already gets impaired by latencies between 16–60 ms. Therefore, the lower boundary of 100 ms as mentioned in several design guidelines appears outdated. Especially interactions that are very similar to physical interactions require substantially smaller maximum acceptable latencies. Furthermore, several factors affect latency perception and consequently user performance and tolerance. Hence, a need for updated, evidence-based latency guidelines incorporating system-, task-, and person characteristics emerges.

and

Moreover, with technical progress aiming at increasingly reducing latencies, users likely get accustomed to hardly perceivable delays. This could lead to a higher sensitivity for very short latencies in users with much experience with such modern systems and is probably one factor why guidelines from the 20th century are not applicable anymore.

The evidence provided in this paper (and all the referenced research) is incredibly useful and interesting. However, the point I am left with is that 100ms as a broad target remains fairly consistent, even in modern studies. The more aggressive requirements are typically limited to specific populations or specific use cases.

For Event Timing, and the Responsiveness metrics we measure with it (FID & the new experimental responsiveness metric), we are self-selecting for use cases that fall far closer to that broad category (i.e. discrete input only, heavily focused on visual feedback, typical web-platform UX use cases).
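
As a minimal sketch of what that self-selection could look like in practice: the snippet below keeps only discrete input events and flags those whose duration exceeds the broad 100ms target. The `"event"` entry type and the `name` / `duration` fields are from the Event Timing API; the helper name, the list of event names, and the logging are assumptions made for illustration, not anything the spec or our metrics prescribe.

```typescript
// Minimal shape of the entry fields this sketch relies on.
interface TimingLike { name: string; duration: number }

// Assumed set of discrete interaction events; adjust as needed.
const DISCRETE_EVENTS = new Set([
  "pointerdown", "pointerup", "click", "keydown", "keyup",
]);

const TARGET_MS = 100; // the broad target discussed above

// Hypothetical helper: keep discrete events slower than the target.
function slowDiscreteEvents<T extends TimingLike>(
  entries: T[],
  targetMs: number = TARGET_MS,
): T[] {
  return entries.filter(
    (e) => DISCRETE_EVENTS.has(e.name) && e.duration > targetMs,
  );
}

// Browser-only wiring, guarded so the file also loads where Event Timing
// is unavailable. Accessed through globalThis to avoid depending on DOM typings.
const PO = (globalThis as any).PerformanceObserver;
if (PO && PO.supportedEntryTypes?.includes("event")) {
  const observer = new PO((list: { getEntries(): TimingLike[] }) => {
    for (const e of slowDiscreteEvents(list.getEntries())) {
      console.log(`${e.name}: ${e.duration} ms`);
    }
  });
  // No durationThreshold is passed, so the API's default applies
  // (104 ms in the current spec, as far as I know).
  observer.observe({ type: "event", buffered: true });
}
```

Continuous input such as `pointermove` is deliberately left out of the set, in line with the point above about not mixing continuous and discrete expectations.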

nhelfman commented 2 years ago

@yoavweiss I'll try to see if I can get to it and issue a PR (assuming we agree the new link is better).

clelland commented 1 year ago

@nhelfman, has there been any work towards a PR?

w3c / event-timing

Update research reference to more current research #118