mozilla-mobile / android-components

⚠️ This project moved to a new repository. It is now developed and maintained at: https://github.com/mozilla-mobile/firefox-android
https://github.com/mozilla-mobile/firefox-android
Mozilla Public License 2.0
2.02k stars 472 forks source link

GeckoView - InputResultDetail syncronization issues #10585

Closed Mugurell closed 2 years ago

Mugurell commented 3 years ago

Seen on https://github.com/mozilla-mobile/fenix/issues/18451#issuecomment-874234291 - https://sortablejs.github.io/Sortable/

Having pull to refresh enabled will not allow a second reorder down of the items.

https://user-images.githubusercontent.com/11428869/125321803-bbec3480-e345-11eb-8940-92df236e4c6b.mp4

This is happening because with a log here in that particular scenario (that can be repeated for any number of times) the time from ACTION_DOWN and until we get a response from GeckoView about how that touch was handled is >100 ms (also seen ~500ms for this scenario).

This is a lot more than the usual ~50 ms (probably there is a bug to be fixed on GeckoView's side because this happens only on the second swipe down for which GV responds that it handled the touch, not the website as we'd expect) and because we need to be preemptive about pull to refresh, that functionality will start and in such prevent dispatching any new touch events to the child view - GeckoView. (It will actually receive ACTION_CANCEL because the parent is handling the touch).

If we are to wait until we are actually getting a response from GeckoView, like we do here https://github.com/mozilla-mobile/android-components/issues/10555 then pull to refresh won't be possible to activate for that touch. As we are dropping MotionEvents until waiting to get the response from GeckoView we'll drop some initial ones and then when trying to start the feature (based on GeckoViews response) we'd get an Got ACTION_MOVE event but don't have an active pointer id error. (That's why we need to be preemptive about it and start the feature, send it that initial ACTION_DOWN but then cancel the feature based on the response from pull to refresh, )if that reponse comes in a reasonable time, before actually starting to show the throbber.

I think this issue (and https://github.com/mozilla-mobile/android-components/issues/10555 also) show a big caveat in our implementation and the general feature as it has to work between GeckoView and clients: The same user touch is handled both in the client and in GeckoView and then both results have to be merged but any bigger delay will break pull to refresh / dynamic toolbar or even the browser (as it's a child View in the app).

Based on this and also other features linked here I'd like to start a discussion about the opportunity of rethinking these features and reimplementing them. The direction I propose is having both the pull to refresh and the dynamic toolbar be driven by GeckoView. Similar to the edge effect and how the dynamic toolbar was implemented in Fennec. This would ensure a consistent UX with no synchronization issues.

Asking @pocmo @agi for their opinion.

┆Issue is synchronized with this Jira Task

agi commented 3 years ago

Sorry I completely missed this, I put a reminder to take a look on monday.

agi commented 3 years ago

I think in general we should not take 500ms to respond to a ACTION_DOWN event, so that's definitely a bug.

The direction I propose is having both the pull to refresh and the dynamic toolbar be driven by GeckoView. Similar to the edge effect and how the dynamic toolbar was implemented in Fennec. This would ensure a consistent UX with no synchronization issues.

I'm not clear exactly what that would look like, are you suggesting GeckoView would be the one displaying the throbber/dynamic toolbar? even with that I'm not sure that would fix this problem, we don't really know more than the app does at the GeckoView level (and if we do we should find a way to expose it in the API IMO, but I want to understand what your proposal is)

Let me see if I'm understanding the problem: what is happening here is that when the user starts scrolling the screen we need to start the refresh throbber, if GeckoView comes back and says that the scroll action was handled, then we cancel the throbber action, is that correct?

My uneducated guess here is that there's a javascript handler that's making us take a long time to respond to onTouchEventForResult but I think if we already know there is a javascript handler then we don't need to execute it to know that the content is gonna handle the event so we can just short-circuit and return early?

@hiikezoe any ideas? Am I completely off base here?

hiikezoe commented 3 years ago

A quick update I can tell now is;

I think in general we should not take 500ms to respond to a ACTION_DOWN event, so that's definitely a bug.

This is not true. We have (APZ has) to wait for a reply from the main-thread (i.e. from the content's JS mainly) if there's any event listeners or some such which might be handled by APZ. For example, if the content has an event listener for touch events and the handler invokes preventDefault() then APZ shouldn't handle the event. There is a pref value how we wait for the response from the main-thread; https://searchfox.org/mozilla-central/rev/d3683dbb252506400c71256ef3994cdbdfb71ada/mobile/android/app/mobile.js#376

I will re-read all relevant comments later.

hiikezoe commented 3 years ago

My uneducated guess here is that there's a javascript handler that's making us take a long time to respond to onTouchEventForResult but I think if we already know there is a javascript handler then we don't need to execute it to know that the content is gonna handle the event so we can just short-circuit and return early?

This is exactly what we do in APZ. We are actully doing the short-circuit in such cases. What I am suspecting is the site in question does start listening touchmove events inside an event handler for touchstart events.

Anyways, I am suspecting with the current setup in A-C there's definitely a race condition in between when we set this inputResultDetail and when we query the result. With the setup we can't tell the queried result is the corresponding result of the given input event.

Mugurell commented 3 years ago

Anyways, I am suspecting with the current setup in A-C there's definitely a race condition in between when we set this inputResultDetail and when we query the result. With the setup we can't tell the queried result is the corresponding result of the given input event.

This is true. All these interactions are happening on Main and we cannot block until getting a response. The framework methods to pass MotionEvents through the Views chain and then wait on callback to update them means also an async nature.

With some logs around that code for when the bug reproduces I get:

09:49:07.009 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_DOWN
09:49:07.009 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_DOWN started
09:49:07.010 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_DOWN ended
09:49:07.142 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_MOVE
09:49:07.142 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_MOVE started
09:49:07.142 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_MOVE ended
09:49:07.158 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_MOVE
09:49:07.158 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_MOVE started
09:49:07.158 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_MOVE ended
09:49:07.174 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_MOVE
09:49:07.174 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_MOVE started
09:49:07.174 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_MOVE ended
09:49:07.190 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_MOVE
09:49:07.190 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_MOVE started
09:49:07.190 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_MOVE ended
09:49:07.207 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_MOVE
09:49:07.207 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_MOVE started
09:49:07.207 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_MOVE ended
09:49:07.223 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_MOVE
09:49:07.223 org.mozilla.fenix.debug I/System.out: onTouchEvent: ACTION_MOVE started
09:49:07.223 org.mozilla.fenix.debug I/System.out:  onTouchEvent: ACTION_MOVE ended
09:49:07.241 org.mozilla.fenix.debug I/System.out: onTouchEvent: action: ACTION_CANCEL
09:49:07.247 org.mozilla.fenix.debug I/System.out: updateInputResult: InputResultDetail $351 (Input handled by the browser. Content can be scrolled to top, bottom and can be overscrolled to left, right)

So it's quite probable there is a bug to be resolved somewhere in GV/Gecko.

hiikezoe commented 3 years ago

Though I don't quite understand the reason why we can't block, I am supposing we don't need to block, we can just defer it until we get the corresponding response from GV?

Also I am wondering how Chrome handles those cases?

Mugurell commented 3 years ago

In regards to the first part we don't really have where to block. For both the pull to refresh and the dynamic toolbar feature we are using framework apis to which we pass (or they simply get passed to) MotionEvents the same way they get passed to GeckoView. For pull to refresh it's especially hard to block/ignore updating the throbber. It expects the complete stream of MotionEvents, from ACTION_DOWN to ACTION_UP / CANCEL and if we are skipping the ACTION_DOWN event it will complain and not allow to start the functionality anymore for that touch.

For the toolbar we're using our own GestureDetector that receives MotionEvents ~at the same time as GeckoView (from here) and only use the inputResult from GeckoView to know if it is allowed to update the toolbar or not - like here where shouldScroll evaluates the current inputResultDetail.

This is why I was proposing for GeckoView to inform the app about the scrolls, flings, overscrolls and their distances. This would've been possible with the current APIs - NestedGeckoView is a NestedScrollingChild and has a NestedScrollingChildHelper through which it can inform about these kind of events but the current implementation is rather a shell and not really the base of the current functionality. Possibly because this APIs don't seem to yet exist in GV and it should also respond asap or we'd skip UI frames. If this would've been the case we wouldn't need the inputResultDetail or our own gesture detector and everything would be in sync.

But since we are in the process of moving to a new toolkit for building the UI maybe we can take this opportunity to think about supporting the new much simpler API for the same general flow described above - Compose - nestedScroll official documentation & complete sample.

If GeckoView would support these methods we'd probably be able to then plug (and play) any other Views we want, like a dynamic toolbar or a pull to refresh view, overscroll edge effect, etc which would entirely be driven by GeckoView - by the events and distances it informs us about.

hiikezoe commented 3 years ago

Thanks for the detailed explanation. I think now I understand what the underlying issue is. Though I am not sure such APIs should be in GeckoView or inside A-C, it sounds promising. Presumably it doesn't matter where the APIs are initially since it can be easily ported to A-C from GeckoView or vise versa. Anyways, to me you are the best person to implement the APIs. :)

Mugurell commented 3 years ago

Would very much like to work on this! But we need support from GeckoView in exposing the scroll data.

The app is only interested in how the page moved (not really in what the user gestured on screen) and would be awesome if it didn't have to try to infer this separate from GV based on the same MotionEvents. For example it was very hard for us to differentiate between zoom and panning gestures (especially the quick zoom with one finger). Zoom gestures should not update the toolbar/pull to refresh but panning should. This functionality already exists in GV but we needed to duplicate it in AC.

If GV would expose a stream of scroll data (maybe following that same model for informing how the user scroll was consumed or not

) we can easily adapt to conform to the new interfaces in a NestedGeckoViewV2. (Maybe fling/overscroll data can be added in a similar way)

I get that is is probably a big change (although I hope all the work until now is reusable and probably all the data is already there) and this is why I wanted to first ask if something like this can be considered with implementation details to be later discussed.

hiikezoe commented 3 years ago

One caveat I can tell is the amount of scroll pixels is not what you want. So for example if the content is scaled by 0.5x, the scroll amount is 2x larger than the value on 1.0x scaled content in the content's coords but the delta on the screen will be same.

Mugurell commented 3 years ago

Makes sense. But if the scaling is known to GeckoView maybe it can use that in the pixels scrolled result?

agi commented 3 years ago

If GeckoView would support these methods we'd probably be able to then plug (and play) any other Views we want, like a dynamic toolbar or a pull to refresh view, overscroll edge effect, etc which would entirely be driven by GeckoView - by the events and distances it informs us about.

I need to look into this more, but from a first glance this sounds like something I would support having in GeckoView. I agree the complexity of this whole feature is really high and it would be nice if we could hide it from AC/Fenix as much as possible.

agi commented 3 years ago

There is a pref value how we wait for the response from the main-thread; https://searchfox.org/mozilla-central/rev/d3683dbb252506400c71256ef3994cdbdfb71ada/mobile/android/app/mobile.js#376

ouch so we wait as much as 600ms? I think that's really high, especially on powerful devices. I wonder if we could lower it a little bit

hiikezoe commented 3 years ago

There is a pref value how we wait for the response from the main-thread; https://searchfox.org/mozilla-central/rev/d3683dbb252506400c71256ef3994cdbdfb71ada/mobile/android/app/mobile.js#376

ouch so we wait as much as 600ms? I think that's really high, especially on powerful devices. I wonder if we could lower it a little bit

As you may have already noticed, the value is 400ms on desktop it's not so much difference.

But if the scaling is known to GeckoView maybe it can use that in the pixels scrolled result?

The delta in the screen coords can be calculated in GeckoView or AC (Fenix), you can just ignore the content scale.

I am assuming you are supposing something like this;

1) the user moves his finger 100px in the screen coords 2) in the content coords, the first 50px scrolls the content, but it reached to one of the edge, thus the rest of 50px didn't cause any scroll

In this case, the browser should do nothing. In other words, once after the content, or Gecko started consuming touch events, the browser should do nothing until the user leaves his fingers (or the events is interrupted by other type of events). Of course it's likely there's any edge cases I am not aware of.

csadilek commented 2 years ago

Moved to bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1795420

Change performed by the Move to Bugzilla add-on.