Make Fathom extraction more performant

TL;DR: isVisible, which will likely be a common rule in many Fathom applications, accounts for the majority of the 460 ms of Fathom-related jank (67%). It may be possible to reduce this overall Fathom jank by as much as 374 ms (81%) by reducing style and DOM property accesses in isVisible, but this solution requires a privileged context.

Is this a feature request or a bug?

Feature request

What is the current behavior?

Fathom currently causes considerable jank on page load. Given that Price Tracker was an experiment, performance was not a major consideration in development of the current ruleset used to extract a product from a page.

Initial profile

Initial profile[1]

Numbers

The largest unresponsive chunk is caused by Fathom extraction, contributing 460 ms of jank right around page load.
isVisible calls Fathom’s ancestors function and a bunch of CSS2Properties getters, altogether taking up 311ms (67%) of this jank.

Screen Shot 2019-07-16 at 8 33 26 PM

isVisible(fnode) {
  for (const ancestor of ancestors(fnode.element)) {
    const style = getComputedStyle(ancestor);
    const isElementHidden = (
      style.visibility === 'hidden'
      || style.display === 'none'
      || style.opacity === '0'
      || style.width === '0'
      || style.height === '0'
    );
    if (isElementHidden) {
      return false;
    }
  }
  return true;
}

/**
 * Yield an element and each of its ancestors.
 */
export function *ancestors(element) {
    yield element;
    let parent;
    while ((parent = element.parentNode) !== null && parent.nodeType === parent.ELEMENT_NODE) {
        yield parent;
        element = parent;
    }
}

What is the expected or desired behavior?

Fathom should be able to run much faster with much less jank.

I shared this profile with some Firefox engineers, including a layout (Emilio) and a performance engineer (Greg).

Why is isVisible taking so much time?

There is a known layout performance bug that causes getComputedStyle() on a display: none subtree to be very slow.
- Fix: Avoid checking display: none.
For each element, we are walking the ancestor chain. This means we are doing redundant work for many elements (e.g. for n visible descendants of <html>, we will check <html>'s styles n times).
- Fix: Mitigate redundant work and/or memoize isVisible result (e.g. in an HTML element => Boolean Map or a note()).
getComputedStyle triggers a layout flush (not a natural layout flush).
- Fix: Use the cheapest style accesses first (e.g. getBoundingClientRect) and early return when possible. After a flush, some style accesses are O(1).
- Fix: Leverage a natural flush rather than triggering a new flush using the Intersection Observer API for unprivileged and/or promiseDocumentFlushed for privileged code.

Updating `isVisible`: first attempt

Revised profile: first attempt[1]

This focuses on the fix of using the cheapest style accesses first (e.g. getBoundingClientRect) and early return when possible in isVisible.

Numbers

The largest unresponsive chunk is still caused by Fathom extraction, contributing 386 ms of jank right around page load.
isVisible made up 244ms (63%) of this jank.
This change reduced overall Fathom-related jank by 74 ms (16%) compared to the original implementation of isVisible.

// Avoid reading `display: none` due to Bug 1381071
isVisible(fnode) {
  const element = fnode.element;
  if (element.getBoxQuads().length === 0) {
    // The element has no box (display: none subtree?), checking
    // getBoundingClientRect instead for a width and height of 0 only tells
    // us it is a zero-sized box.
    return false;
  }
  const eleStyle = getComputedStyle(element);
  if (eleStyle.visibility === 'hidden') {
    return false; // not painted.
  }
  // This is assuming inclusive ancestors, otherwise we need to check opacity
  // above too.
  for (const ancestor of ancestors(element)) {
    const style = getComputedStyle(ancestor);
    if (style.opacity === '0') {
      return false;
    }
    if (style.display === 'contents') {
      // display: contents elements have no box themselves, but children are
      // still rendered.
      continue;
    }
    const rect = ancestor.getBoundingClientRect();
    if ((rect.width === 0 || rect.height === 0) && style.overflow === 'hidden') {
      // This is a zero-sized box; zero-sized ancestors don’t make descendants
      // hidden unless the descendant has overflow: hidden
      return false;
    }
  }
  return true;
}

Conclusions

While this removed much of the layout querying work, the proportion of jank caused by style accesses was smaller than expected, given the known performance Bug 1381071.
Now isVisible is spending most of its time (roughly 62%) accessing properties through X-ray vision. We can inspect this in the profile by filtering for Native code for "js::NativeLookupOwnProperty", "XRayTraits" or "WeakMap".
- Fix: Spend less time accessing DOM properties. Consider simply checking whether the element is clickable. This requires more work from the browser, but less complexity in isVisible.

Updating `isVisible`: second attempt

Revised profile: second attempt[1]

This focuses on the fix of mitigating (actually, eliminating) redundant work, removing ancestors entirely to eliminate DOM accesses, and reducing the number of style accesses by checking if the element is clickable.

Numbers

The largest unresponsive chunk is no longer caused by Fathom extraction; Fathom extraction contributes only 89 ms of jank right around page load.
isVisible made up 30 ms (34%) of this jank.
This change reduced overall Fathom-related jank by 374 ms (81%) compared to the original implementation of isVisible.

  isVisible(fnode) {
    const element = fnode.element;
    const rect = element.getBoundingClientRect();
    return !!(rect.height && rect.width && (document.elementsFromPoint(rect.x
    + rect.width / 2, rect.y + rect.height / 2)).includes(element));
  }

Conclusions

DOM property accesses were by far the largest contributor to jank (81% compared to 16% for style accesses in the previous profile). This is largely due to the X-Ray work associated with DOM property access.
Unfortunately, as implemented, this misses some elements that can be visible but reside outside of the viewport in the -x and/or -y directions due to how getBoundingClientRect and elementsFromPoint work.
- Per Gijs and Emilio, in privileged code, there is a way to cope with out-of-viewport coordinates using aIgnoreRootScrollFrame, so this solution may still be viable in Firefox applications. This needs further investigation.

Version information

Firefox version: 70
Your OS and version: Mac OSX 10.13.6
Price Tracker extension version: PR #317

[1]: Profiled on a locally hosted product page with specs from the "Version information" section in this comment.

Note: As is, isVisible has a bug, since getComputedStyle().width and getComputedStyle().height return values like "0px" and such, not "0" (thanks Emilio).

Also, it may include elements for consideration that are offscreen and therefore not practically visible to the user. Fathom's isVisible function handles this case, though it suffers from the same error as above since it originated from Price Tracker's implementation.

Fathom v3.1.0 isVisible implementation

/**
 * Return whether an element is practically visible, considing things like 0
 * size or opacity, ``display: none``, and ``visibility: hidden``.
 */
export function isVisible(fnodeOrElement) {
    const element = toDomElement(fnodeOrElement);
    for (const ancestor of ancestors(element)) {
        const style = getComputedStyle(ancestor);
        if (style.visibility === 'hidden' ||
            style.display === 'none' ||
            style.opacity === '0' ||
            style.width === '0' ||
            style.height === '0') {
            return false;
        } else {
            // It wasn't hidden based on a computed style. See if it's
            // offscreen:
            const rect = element.getBoundingClientRect();
            const frame = element.ownerDocument.defaultView;  // window or iframe
            if ((rect.right + frame.scrollX < 0) ||
                (rect.bottom + frame.scrollY < 0)) {
                return false;
            }
        }
    }
    return true;
}

Probably what this issue boils down to is to use the isVisible function in the Fathom library, once that has been updated with a more performant and complete version.

Updating `isVisible`: third attempt

(continued from the first comment in this thread)

Revised profile: third attempt[1]

This implementation of isVisible focuses on the fixes of leveraging natural flushes by using the Intersection Observer API and caching results.

Numbers

The largest unresponsive chunk is still caused by Fathom extraction, contributing 366 ms of jank right around page load.
isVisiblePreprocessing and isVisible made up 34 ms (9%) of this jank, though I think suboptimal aspects of this implementation added jank in other places.
This change reduced overall Fathom-related jank by 94 ms (20%) compared to the original implementation of isVisible (see "Initial Profile" in this comment).

const visibleElements = new Set(); // If an element belongs to this set, it's visible

function isVisiblePreProcessing() {
  const observers = new Map(); // element => IntersectionObserver instance
  const observerOptions = {
    root: document.body,
    threshold: 1.0,
  };
  function intersectionCallback(entries) {
    entries.forEach((entry) => {
      const ele = entry.target;
      if (entry.isIntersecting) {
        visibleElements.add(ele);
      }
      const observer = observers.get(ele);
      observer.unobserve(ele);
      observers.delete(ele);
    });
  }

  const root = document.body;
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, null, false);
  let {currentNode} = walker;
  while (currentNode) {
    const {tagName} = currentNode;
    if (tagName === 'BODY' || tagName === 'SCRIPT') {
      currentNode = walker.nextNode();
      continue;
    }
    const observer = new IntersectionObserver(intersectionCallback, observerOptions);
    observers.set(currentNode, observer);
    observer.observe(currentNode);
    currentNode = walker.nextNode();
  }
}

isVisiblePreProcessing();

// ...

isVisible(fnode) {
    return visibleElements.has(fnode.element);
}

Conclusions

While this removed much of the JavaScript work, adding upwards of 6000 observers for each of the 6000 elements on the sample page seems to cost more Layout work.
It doesn't seem the use case we have here is similar to the intended use case for using Intersection Observer to check for visibility in that:
- We want to check visibility for every single element on the page, not just a handful of say, ads.
- We only want to check once; we are not interested in monitoring the intersection of these elements with another element or the viewport over time.

[1]: Profiled on a locally hosted product page with specs from the "Version information" section in this comment.

Hi @biancadanforth,

I took a read through this and a quick peek at the Price Tracker code. A few questions:

Right now, it looks like the extraction script runs at document_end. Is that strictly necessary? If you use document_idle instead, you can queue the work to occur after the pageload has completed, so as to not impact it.
If you've already tried this, please ignore, but if not, this might be worth considering:

It's possible to decrease the likelihood of causing synchronous layout flushes, and making getBoundingClientRect calls cheaper, by (as you've noted) waiting until a "natural" layout flush. promiseDocumentFlushed is not available to you, but you can kind of emulate it via:

requestAnimationFrame(() => {
  setTimeout(() => {
    // This code will be run ASAP after Style and Layout information have
    // been calculated and the paint has occurred. Unless something else
    // has dirtied the DOM very early, querying for style and layout information
    // here should be cheap.
  }, 0);
});

We're still going to get burned if the page somehow runs any JS that dirties the DOM before the setTimeout runs, but if it hasn't, then calls to getBoundingClientRect inside of that setTimeout should be cheap.

Assuming you can get pretty cheap calls to getBoundingClientRect(), you might be able to use those values plus the scroll offset of the window to calculate if an item is collapsed (0x0 dimensions - this will also cover display: none), invisible (visibility: hidden, or opacity: 0), or off screen (x and y values along with width and height place it outside the display port).

If that's still not cheap enough, then perhaps we could expose some kind of special power to your WebExtension content script to get the information it needs more directly from Layout - though we'd probably want to confer with the Layout team to see how easy that is.

The other thing to consider is that, sometimes things are just unavoidably expensive, and there's not much we can practically do about the overall cost. When that occurs, the next tactic is trying to figure out if you can break up the work over a larger chunk of time. I'm not sure if Fathom is compatible with this idea, but you could use something like IdleCallbacks. Check out this MDN article on using them - but the general idea is that you queue up functions to run during "free idle periods", and you have a time budget you can use, and you yield back when you've reached your budget. You get to continue during the next idle slice.

This can add some interesting bugs where the document state might change underneath you during your scan, but if you're willing to live with those bugs, this might be a fallback plan worth considering.

Thank you very much @mikeconley for taking the time to look at this and offer your thoughts.

Right now, it looks like the extraction script runs at document_end. Is that strictly necessary? If you use document_idle instead, you can queue the work to occur after the pageload has completed, so as to not impact it.

This is a good observation. Certainly if we waited to run Fathom (i.e. isVisible and friends) during idle time, we should see less jank at page load, and we do, though its relative share of the jank is a tenth of that contributed by isVisible.[1]

Originally, the decision to run this at document_end was the fact that we wanted to extract the information from the page as soon as possible, and we found that this was the earliest time that we could get the information successfully in many cases.

While in the case of the Price Tracker application of Fathom, we probably could wait until document_idle to extract, there are some applications like popup blocking, login form identification for the password manager, etc. where waiting for the page to finish loading can make the difference between the feature being useful and not.

Additionally, when this extraction code is run is not handled directly by Fathom but by the application author, as how that is set up depends on the application context. While we can certainly make recommendations, I am especially interested in where we can make performance gains (and to maximal effect) in the Fathom code itself (such as Fathom's isVisible method).

It's possible to decrease the likelihood of causing synchronous layout flushes, and making getBoundingClientRect calls cheaper, by (as you've noted) waiting until a "natural" layout flush. promiseDocumentFlushed is not available to you, but you can kind of emulate it via: [requestAnimationFrame with setTimeout]

It took me a while to get this approach working, as requestAnimationFrame is async, and Fathom is sync. I'm going to go ahead and format this in the same way I did my previous approaches above, so please excuse all the styling.

Updating `isVisible`: fourth attempt - `requestAnimationFrame`

(continued from the first comment in this thread)

What I ended up doing was adding a one-time preprocessing step that is awaited before Fathom is run. This pre-processing step walks the DOM and checks each node of interest using the original isVisible implementation at this opportune time using a setTimeout inside a requestAnimationFrame. It required a decent amount of rewiring in Price Tracker; the diff (branched from PR #317) is below.[2]

Revised profile: fourth attempt using requestAnimationFrame[3]

Numbers

The largest unresponsive chunk is still caused by Fathom extraction, contributing 269 ms of jank right around page load.
isVisiblePreprocessing and isVisible made up 55 ms (20%) of this jank.
This change reduced overall Fathom-related jank by 191 ms (42%) compared to the original implementation of isVisible (see "Initial Profile" in this comment).

Conclusions

This 42% reduction was just from changing when we run the isVisible code. The implementation itself is unchanged!
Based on the profile, it looks like the isVisiblePreProcessing work is spread out across many frames. Indeed, it’s total running time was 335 ms, but most of the time that work was being done, no jank was observed.
One notable difference between this profile and the original is that while page load is much faster (284 ms compared to 871 ms) it takes longer for Fathom extraction to complete successfully (653 ms versus 487 ms). This is an expected tradeoff, as we are spreading out style accesses across animation frames to minimize unnecessary flushes.
This still uses the original, suboptimal implementation of isVisible. Further improvements could be made to the isVisiblePreCheck method to bring down the performance cost even further. For example, we could take the approach of the first attempt, where we perform the cheapest possible accesses and return early where possible (as you mentioned, things like getBoundingClientRect and window.pageXOffset).

If that's still not cheap enough, then perhaps we could expose some kind of special power to your WebExtension content script to get the information it needs more directly from Layout

How do I know what’s cheap enough? What do you mean by special powers? I’m aware of that meaning in mochitests only heh.

The other thing to consider is that, sometimes things are just unavoidably expensive, and there's not much we can practically do about the overall cost. When that occurs, the next tactic is trying to figure out if you can break up the work over a larger chunk of time.

There are a couple of challenges that I see with a concurrency approach. The first is we’d have to rewrite Fathom to be async, as it is sync right now. The second is that, depending on how much work needs to be done, the wall clock time could be the difference between Fathom completing fast enough for the results to be useful in some applications (as I mentioned earlier, e.g. popup blocking). It’s not completely off the table, but it does seem like a heavier lift. I'll check out that link, thanks.

[1]: I profiled the original isVisible implementation running in the content script loaded at document_idle instead of document_end. I still see 430 ms of Fathom-related jank compared to the 460 ms. This means that that one change alone, at least for the sample page I’ve been running on, would reduce Fathom-related jank by 6.5%. By comparison, in the same profile, isVisible contributes 327 ms or 76% of Fathom-related jank. So while that change would certainly help, it’s far from the biggest offender, and as I mention elsewhere, waiting until page load could make the difference in usability for certain applications.

[2]:

diff --git a/src/extraction/fathom/index.js b/src/extraction/fathom/index.js
index be9ec03..acf60d8 100644
--- a/src/extraction/fathom/index.js
+++ b/src/extraction/fathom/index.js
@@ -17,15 +17,6 @@ import coefficients from 'commerce/extraction/fathom/coefficients.json';

 // Minimum score to be considered the "correct" feature element extracted by Fathom
 const SCORE_THRESHOLD = 0.5;
-// For production, we don't need to generate a new ruleset factory
-// and ruleset every time we run Fathom, since the coefficients are static.
-const rulesetFactory = new RulesetFactory();
-const rules = rulesetFactory.makeRuleset([
-  ...coefficients.image,
-  ...coefficients.title,
-  ...coefficients.price,
-],
-biases);

 /** How product information is extracted depends on the feature */
 const FEATURE_DEFAULTS = {
@@ -71,7 +62,7 @@ const PRODUCT_FEATURES = {
  * Extracts the highest scoring element above a score threshold
  * contained in a page's HTML document.
  */
-function runRuleset(doc) {
+function runRuleset(doc, rules) {
   const extractedElements = {};
   const results = rules.against(doc);
   for (const feature of Object.keys(PRODUCT_FEATURES)) {
@@ -95,13 +86,28 @@ function hasAllFeatures(obj) {
 /*
  * Run the ruleset for the product features against the current window document
  */
-export default function extractProduct(doc) {
-  const extractedProduct = {};
-  const extractedElements = runRuleset(doc);
-  if (hasAllFeatures(extractedElements)) {
-    for (const [feature, methods] of Object.entries(PRODUCT_FEATURES)) {
-      extractedProduct[feature] = methods.getValueFromElement(extractedElements[feature]);
+export default async function extractProductFactory() {
+  const rulesetFactory = await RulesetFactory.isVisiblePreProcessing();
+
+  // For production, we don't need to generate a new ruleset factory
+  // and ruleset every time we run Fathom, since the coefficients are static.
+  const rules = rulesetFactory.makeRuleset([
+    ...coefficients.image,
+    ...coefficients.title,
+    ...coefficients.price,
+  ],
+  biases);
+
+  function extractProduct(doc) {
+    const extractedProduct = {};
+    const extractedElements = runRuleset(doc, rules);
+    if (hasAllFeatures(extractedElements)) {
+      for (const [feature, methods] of Object.entries(PRODUCT_FEATURES)) {
+        extractedProduct[feature] = methods.getValueFromElement(extractedElements[feature]);
+      }
     }
+    return hasAllFeatures(extractedProduct) ? extractedProduct : null;
   }
-  return hasAllFeatures(extractedProduct) ? extractedProduct : null;
+
+  return extractProduct;
 }
diff --git a/src/extraction/fathom/ruleset_factory.js b/src/extraction/fathom/ruleset_factory.js
index fd1ca4b..372b793 100644
--- a/src/extraction/fathom/ruleset_factory.js
+++ b/src/extraction/fathom/ruleset_factory.js
@@ -15,6 +16,64 @@ const ONEISH = 0.9;
  * easier testing.
  */
 export default class RulesetFactory {
+  constructor(visibleElements) {
+    this.visibleElements = visibleElements;
+  }
+
+  static isVisiblePreCheck(element) {
+    for (const ancestor of ancestors(element)) {
+      const style = getComputedStyle(ancestor);
+      const isElementHidden = (
+        style.visibility === 'hidden'
+        || style.display === 'none'
+        || style.opacity === '0'
+        || style.width === '0'
+        || style.height === '0'
+      );
+      if (isElementHidden) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  static async isVisiblePreProcessing() {
+    const visibleElements = new Set();
+    const isVisiblePromises = [];
+    const root = document.body;
+    const walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, null, false);
+    let {currentNode} = walker;
+    while (currentNode) {
+      const {tagName} = currentNode;
+      const desiredTags = ['DIV', 'IMG', 'H1', 'SPAN', 'H2']; // based on our dom() LHS for each feature
+      if (!desiredTags.includes(tagName)) {
+        currentNode = walker.nextNode();
+        continue;
+      }
+      // Need block level scoping inside the while loop to capture the current iteration's
+      // currentNode value inside setTimeout
+      const node = currentNode;
+      // eslint-disable-next-line no-loop-func
+      const promise = new Promise(resolve => requestAnimationFrame(() => {
+        setTimeout(() => {
+          // This code will be run ASAP after Style and Layout information have
+          // been calculated and the paint has occurred. Unless something else
+          // has dirtied the DOM very early, querying for style and layout information
+          // here should be cheap.
+          const isVisible = this.isVisiblePreCheck(node);
+          if (isVisible) {
+            visibleElements.add(node);
+          }
+          resolve();
+        }, 0, resolve, node);
+      }));
+      currentNode = walker.nextNode();
+      isVisiblePromises.push(promise);
+    }
+    await Promise.all(isVisiblePromises);
+    return new RulesetFactory(visibleElements);
+  }
+
   getHighestScoringImage(fnode) {
     return fnode._ruleset.get('image')[0]; // eslint-disable-line no-underscore-dangle
   }
@@ -199,24 +258,7 @@ export default class RulesetFactory {
   }

   isVisible(fnode) {
-    for (const ancestor of ancestors(element)) {
-      const style = getComputedStyle(ancestor);
-      const isElementHidden = (
-        style.visibility === 'hidden'
-        || style.display === 'none'
-        || style.opacity === '0'
-        || style.width === '0'
-        || style.height === '0'
-      );
-      if (isElementHidden) {
-        return false;
-      }
-    }
-    return true;
+    return this.visibleElements.has(fnode.element);
   }

   hasBackgroundImage(fnode) {
diff --git a/src/extraction/index.js b/src/extraction/index.js
index db15199..10f3037 100644
--- a/src/extraction/index.js
+++ b/src/extraction/index.js
@@ -17,17 +17,6 @@ import extractProductWithOpenGraph from 'commerce/extraction/open_graph';
 import {shouldExtract} from 'commerce/privacy';
 import recordEvent from 'commerce/telemetry/content';

-/**
- * Extraction methods are given the document object for the page, and must
- * return either a valid ExtractedProduct, or null if a valid product could not
- * be found.
- */
-const EXTRACTION_METHODS = {
-  fathom: extractProductWithFathom,
-  css_selectors: extractProductWithCSSSelectors,
-  open_graph: extractProductWithOpenGraph,
-};
-
 let isBackgroundUpdate = false;
 try {
   isBackgroundUpdate = window.top.location.href.startsWith(
@@ -74,10 +63,10 @@ class ExtractionAttempt {
  * order until one of them returns a truthy result.
  * @return {ExtractedProduct|null}
  */
-function extractProduct() {
+function extractProduct(methods) {
   const attempt = new ExtractionAttempt();
   attempt.start();
-  for (const [methodName, extract] of Object.entries(EXTRACTION_METHODS)) {
+  for (const [methodName, extract] of Object.entries(methods)) {
     const extractedProduct = extract(window.document);
     if (extractedProduct) {
       attempt.succeed(methodName);
@@ -107,8 +96,8 @@ async function sendProductToBackground(extractedProduct) {
  * Checks to see if any product information for the page was found,
  * and if so, sends it to the background script.
  */
-async function attemptExtraction() {
-  const extractedProduct = extractProduct();
+async function attemptExtraction(methods) {
+  const extractedProduct = extractProduct(methods);
   if (extractedProduct) {
     await sendProductToBackground(extractedProduct);
   }
@@ -140,16 +129,25 @@ async function attemptExtraction() {
     return;
   }

+  // Extraction methods are given the document object for the page, and must
+  // return either a valid ExtractedProduct, or null if a valid product could not
+  // be found.
+  const methods = {
+    fathom: await extractProductWithFathom(),
+    css_selectors: extractProductWithCSSSelectors,
+    open_graph: extractProductWithOpenGraph,
+  };
+
   // Extract immediately, and again if the readyState changes.
-  let extractedProduct = await attemptExtraction();
+  let extractedProduct = await attemptExtraction(methods);
   document.addEventListener('readystatechange', async () => {
-    extractedProduct = await attemptExtraction();
+    extractedProduct = await attemptExtraction(methods);
   });

   browser.runtime.onMessage.addListener(async (message) => {
     if (message.type === 'reextract-product') {
       // Re-extract the product if requested
-      extractedProduct = await attemptExtraction();
+      extractedProduct = await attemptExtraction(methods);
       await sendProductToBackground(extractedProduct);
     } else if (message.type === 'resend-product') {
       // Re-send the already-extracted product if requested

[3]: Profiled on a locally hosted product page with specs from the "Version information" section in this comment.

mozilla / price-tracker