w3c / webdriver-bidi

Bidirectional WebDriver protocol for browser automation
https://w3c.github.io/webdriver-bidi/
351 stars 38 forks source link

Remove `scrollIntoView` #519

Closed jrandolf closed 10 months ago

jrandolf commented 1 year ago

This operation can be done on the client side without much effort.


Preview | Diff

jimevans commented 1 year ago

I’m opposed to removing this. The “easily can be handled client-side” would involve another round-trip to the remote end. For a user using a cloud provider for execution, that means full internet latency twice more than if the option exists. In my experience, the more capability we allow the user to have on the remote end, the better.

I’ve noticed an occasional tendency to think that since we give the user the ability to execute scripts remotely, then we’ve done all we need to do and not provide other features because that means more work for implementors. And this tendency can get even worse when we allow preload scripts, because then the answer is, “Just let the user set the right preload script, regardless of how complex that might be, and they’ll be fine.”

However, that also means the user has to be clever enough to run just the right script, and introduces multiple commands when one would do. Or worse, it forces client library authors to write complex workarounds and make them explain to their users (who are often less sophisticated than one might expect) why their solution is less performant than a competing technology.

The more things we can allow a single command to do at the remote end, the more capabilities we allow the remote end to do without multiple round trips, the better.

OrKoN commented 1 year ago

The roundtrip issue sounds more general and perhaps the we can solve it by other means (by batching multiple commands). I think adding all possible options to each of the commands does not scale too: for example, scroll into view can only happen in a single way defined in the spec and clients that want to scroll differently might still need an extra call anyway.

jrandolf commented 1 year ago

@jimevans I disagree that this will cause more round trip. Considering that you have to query for the element, why can't you position it at that time? I agree with your sentiments completely, but for this specific case, I struggle to see how useful it will be in practice.

Take this from a "photographer" perspective. When you take a picture of a subject, you (the client) would either position your camera or your subject to be within frame of the camera.

With scrollIntoView, we are doing this positioning on behalf of you (which is inherently not the task of a camera) which may or may not be the expected behavior. In either case, we face an issue of expectation and confusion; "how do I make scrollIntoView scroll correctly?" and "why is my object not being scrolled into view?" (which the user may not realize is being obscured by another element). This will lead to issues being filed for "implementation error" when in fact the user must thoroughly understand what "scrollIntoView" does. It'd be far simpler (occam's razor) to just have the user do all this positioning.

jrandolf commented 1 year ago

Since it's optional I don't think we need to remove it

The idea of "scrollIntoView" as a definable option is the fundamental problem. 'scrollIntoView' has a major caveat where element can be scrolled into view but still obscured by a larger element. Users may file an error saying this is not expected when it really is.

OrKoN commented 1 year ago

@jrandolf with or without the scroll into view call, the element can still be obscured and I don't see a big difference. So I don't see how scrollIntoView makes situation any worse. Note that if the element is obscured the screenshot is still what the user sees.

jimevans commented 1 year ago

@jrandolf But “querying for the element” can be done in ways other than executing script*, meaning that one would need to query for the element, then take the screenshot. Two commands.

*Or ought to be. Not having a separate “locate an element” command is a weakness of WebDriver BiDi at the moment, since it does not allow for giving the user any hope of a mechanism outside JavaScript for locating elements. I am holding out hope that we will be able to do so via the accessibility tree, for example. My ideal solution would make the element location mechanism extensible, but that’s a discussion better suited to #150.

jrandolf commented 1 year ago

@jrandolf with or without the scroll into view call, the element can still be obscured and I don't see a big difference. So I don't see how scrollIntoView makes situation any worse. Note that if the element is obscured the screenshot is still what the user sees.

The element is not obscured in this case if there is no scrollIntoView call. The user just hasn't positioned their element. The semantics is important.

jimevans commented 1 year ago

@OrKoN One of the things I’m discovering is that users want things to Just Work™. If we give them the options for a single command that works in the X% case (where X is some number above 80) and let the outliers customize the behavior as needed, that is better than forcing every user to completely customize their experience manually. I could go on at length about how developers wax rhapsodically about choosing certain non-WebDriver-based browser automation libraries over WebDriver-based ones because of the perception that they Just Work.

Anyway, I’ve said my peace that I believe removing the option to be a mistake, and don’t agree with it. I doubt I’ll have more of substance to add, and whatever comes of the decision, I doubt I’ll change my opinion.

jrandolf commented 1 year ago

One of the things I’m discovering is that users want things to Just Work™.

@jimevans It looks like, based on your arguments, you expect for screenshot to take a picture of the entire element when element clip and scrollIntoView is defined. This perspective is the reason why I think it's troubling to add scrollIntoView. The purpose of captureScreenshot is, at the moment, to capture the viewport. That's it. If an element clip is defined, it means capture and crop to the element, wherever it may be. Scroll into view is completely orthogonal to this feature. If we change the semantics of element clip to mean "capture the entire element", then all current implementations are insufficient.

Tl;dr, what's the goal of element clip? To capture the entire element? Or just clip the viewport to the element (if it's in the viewport). If it's the latter, then scrollIntoView is not relevant. If it's the former...well, scrollIntoView may help but it's certainly not going to work in all cases. In this case, I'd much rather have "Perform implementation-specific action bring element into view" always or propose another method like "BrowsingContext.captureNodeScreenshot` which really attempts to capture the entire element.

jgraham commented 1 year ago

I think once we have full document screenshots, the combination of a full document screenshot and an element clip would capture the entire element irrespective of scroll.

I don't think we can easily remove scroll into view for element clipped viewport screenshots, because that's how WebDriver classic behaves.

jrandolf commented 10 months ago

With full document screenshots available, can we remove scrollIntoView? For Webdriver, implementations can just call scrollIntoView with the element and then screenshot.

jgraham commented 10 months ago

@jimevans To what extent does this resolve your concern? I think the vast majority of element screenshot cases would be "full screenshot, clipped to the element", which would give you the full element irrespective of viewport, and you'd only need to do scrollIntoView if you specifically want to know whether the element is clipped by the viewport after scrolling (or to precisely implement classic semantics).

jimevans commented 10 months ago

@jgraham I guess it's fine, though to implement precise classic semantics, that means a second round-trip. I'll remove my objection to landing this PR. If users find that the screenshot behavior as defined here (and as differs from WebDriver Classic) is insufficient, I reserve the right to crow, "I told you so," to the rafters. :)

jrandolf commented 10 months ago

@jgraham @jimevans @OrKoN Please approve.

whimboo commented 10 months ago

Do we already have wpt tests that need an update or do we have to create new ones that allow capturing a screenshot outside of the viewport? @jrandolf

jrandolf commented 10 months ago

Do we already have wpt tests that need an update or do we have to create new ones that allow capturing a screenshot outside of the viewport? @jrandolf

They already exist here: https://github.com/web-platform-tests/wpt/pull/42804

whimboo commented 10 months ago

@jrandolf FYI @lutien can take care of the removal of scrollIntoView in wpt tests via https://bugzilla.mozilla.org/show_bug.cgi?id=1862649 if you haven't started on that yet.

jrandolf commented 10 months ago

@jrandolf FYI @lutien can take care of the removal of scrollIntoView in wpt tests via https://bugzilla.mozilla.org/show_bug.cgi?id=1862649 if you haven't started on that yet.

sgtm!