mozilla-mobile / firefox-ios

Firefox for iOS
Mozilla Public License 2.0
12.2k stars 2.92k forks source link

JS alert completion handler crashes and API updates #22650

Open data-sync-user opened 5 days ago

data-sync-user commented 5 days ago

This ticket captures several action items to address JS alert crashes (in production since at least v127, and likely before), and also a bug in our JS completion handlers causing website scripts to execute incorrectly.


Context

For important context please see:

https://mozilla-hub.atlassian.net/browse/FXIOS-10315

https://mozilla-hub.atlassian.net/browse/FXIOS-10326

and also related Slack threads:

https://mozilla.slack.com/archives/C05C9RET70F/p1729096888361509

and 10/18 incident channel: #incident-ios-crash-increase


Summary

  1. Fix incorrect MessageAlert completion handler
    1. Completion handler should be called after the alert is dismissed (not presented or enqueued)
    2. This bug currently causes Firefox iOS to execute fundamental website Javascripts incorrectly, see: https://mozilla-hub.atlassian.net/browse/FXIOS-10359
  2. Fix crashes in known test cases (listed below)
    1. Sub-action item: move the JS code for test cases 4-6 from ww3schools to a more convenient test page
  3. Identify the cause of the crash increase in 131.3
    1. Known crash test cases were actually fixed, meaning we improved stability in one set of scenarios but decreased it in others (which require add’tl investigation)
    2. We should be able to glean some clues from 131.3 Sentry breadcrumbs and logs
  4. Remove JS alert dequeue called from BVC viewDidLayoutSubviews
    1. This UI func is not a good place to be performing an alert dequeue
    2. Unclear why it was added there; needs additional investigation

Caution needed ❗

We have implemented QA-validated fixes for these crashes in the past, and counter-intuitively this led to an increase in related crashes in Production. Unfortunately it seems that this code is extremely brittle, and so we need to use care as part of this ticket to ensure we don’t once again cause a dip in crash-free sessions.


Known test cases / crashes

TEST CASE 1

Steps logged in this Bugzilla (below) produce a crash in the JS alert:

  1. Visit https://mattreaganmozilla.github.io/test
  2. Click ‘Broken’ button

https://bugzilla.mozilla.org/show_bug.cgi?id=1922283

TEST CASE 2

  1. Visit https://mattreaganmozilla.github.io/test
  2. Tap on ‘Two alerts with delay’
  3. Have another unrelated alert (UIAlertController) presented
  4. Wait for the delay to pass, and the JS alerts to be enqueued
  5. Rotate the device

Result: the rotation triggers a viewDidLayoutSubviews which will actually dequeue an alert and leads to a crash since we’re not checking whether we can present the alert safely.

TEST CASE 3

  1. Visit https://mattreaganmozilla.github.io/test
  2. Tap (or try tapping twice, if needed) on ‘Two alerts with delay’
  3. Switch to another tab, wait 5 seconds, then switch back

(Result: crash, note that this specific crash may only repro in 131.1 & 131.2 but we need to regression test this scenario anyway in any builds from 132+)

TEST CASES 4-6

NOTE: These test cases are running the JS through ww3 schools, as one of the action items I am planning to move these scripts to the test page at https://mattreaganmozilla.github.io/test for convenience, will update the repro steps as needed once that is done.

Please note that we do not reproduce this issue if we run the alerts if opening w3schools > access tab tray (wait for the alert to be displayed) > Close the w3school tab > no crashes.

We are still reproducing this issue following the 3 cases below: Devices: iPhone 14 Pro (16.2), iPhone 15+ (18), iPhone 15 Pro (17.5), iPhone 15 (17.6).

{{

The Window Object

The alert() Method

Click the button to display an alert box after 5 seconds.

}}

Steps:

  1. Open https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default_default
  2. Copy and Paste the Alert code and then press run
  3. Scroll down and tap on Try it
  4. Fast open the tabs tray and open a new tab (can be the homepage)
  5. Go back to the tabs tray.
  6. Tap on the x button and close the w3schools website first and then the remaining homepage tab.

Actual results: FF crashes when you first close the w3schools tab, or when closing the 2nd opened tab (even if it was a homepage).

<!DOCTYPE html> <html> <body> <h1>The Window Object</h1> <h2>The alert() Method</h2> <p>Click the button to display an alert box after 5 seconds.</p> <button onclick="myFunction()">Try it</button> <script> function myFunction() { // Delay the alert for 5 seconds (5000 milliseconds) setTimeout(function() { confirm("this is a confirm message"); }, 2000); } </script> </body>

Steps:

  1. Open https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default_default
  2. Copy and Paste the Alert code and then press run
  3. Scroll down and tap on Try it
  4. Fast open the tabs tray and open a new tab (can be the homepage)
  5. Go back to the tabs tray.
  6. Tap on the x button and close the w3schools website first and then the remaining homepage tab.

Actual results: FF crashes when you first close the w3schools tab, or when closing the 2nd opened tab (even if it was a homepage). 

<!DOCTYPE html> <html> <body> <h1>The Window Object</h1> <h2>The alert() Method</h2> <p>Click the button to display an alert box after 5 seconds.</p> <button onclick="myFunction()">Try it</button> <script> function myFunction() { // Delay the alert for 5 seconds (5000 milliseconds) setTimeout(function() { prompt("test prompt"); }, 2000); } </script> </body>

Steps:

  1. Open https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default_default
  2. Copy and Paste the Alert code and then press run
  3. Scroll down and tap on Try it
  4. Fast open the tabs tray and open a new tab (can be the homepage)
  5. Go back to the tabs tray.
  6. Tap on the x button and close the w3schools website first and then the remaining homepage tab.

Actual results: FF crashes when you first close the w3schools tab, or when closing the 2nd opened tab (even if it was a homepage).

Added the crash logs.

┆Issue is synchronized with this Jira Bug

data-sync-user commented 3 days ago

➤ Matt Reagan commented:

Follow-up note: fixes were previously implemented for test cases 1-3 (includes Bugzilla) as well as correcting the API misuse of the MessageAlert completion handler (see ticket description); these updates were originally merged across 131.1-3.

However, our crash rate in production went up, which means that some other crash scenario (which was occurring more frequently than the crashes we fixed) was actually exacerbated. The fixes were subsequently reverted in 131.4.

Orla mentioned during the 10/18 incident discussions that this area of the codebase had caused similar problems some time ago and she and Yoanna had been working in this area, and it was especially brittle. Orla Mitchell sending your way since based on the 10/18 chats it sounds like you may have specific ideas for how you’d like to implement these changes.