mozilla / fathom

A framework for extracting meaning from web pages
http://mozilla.github.io/fathom/
Mozilla Public License 2.0

Make fast enough for untriggered use #91

Open erikrose opened 5 years ago

erikrose commented 5 years ago

Because it needs access to the DOM, Fathom currently wants to run on the main thread. Unless run in response to a user action, it can create a little jank, taking upwards of 40ms to run a ruleset, so we dare not run it, for example, on every page load. Can we speed it up or find a way to run it offthread?

One approach is to make Fathom run faster. About 80% of its runtime on the Pricewise ruleset is spent in DOM routines. Those do a lot of flushing of layout and other pipeline stages, redoing calculations unnecessarily. Is this a major source of wasted time? Measure. Are there lower-level hooks we can use? (window.windowUtils.getBoundsWithoutFlushing() might be a faster way of getting element size, for example. mattn suggested it. Also see https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Performance_best_practices_for_Firefox_fe_engineers, which has, for instance, routines to get the window size and scroll without flushing things.) Other ideas?
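
For illustration, a minimal sketch of the getBoundsWithoutFlushing() idea, assuming chrome-privileged code (window.windowUtils is not exposed to web content); the selector here is arbitrary:

const utils = window.windowUtils;
for (const element of document.querySelectorAll("img, a, h1")) {
  // Like getBoundingClientRect(), but reads the last-computed bounds rather
  // than forcing a layout flush, so the numbers can be slightly stale.
  const rect = utils.getBoundsWithoutFlushing(element);
  console.log(element.localName, rect.width, rect.height);
}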

Can we run Fathom offthread without losing access to too much signal? Reader Mode currently serializes the markup (only) and ships it offthread to parse. Could we do something like that but also apply CSS ourselves offthread? Would that preserve enough signal for most rulesets? Would it be too slow or battery-hungry on 2-core mobile devices?
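
As a rough illustration of the serialize-and-ship idea (the worker script name and message shape here are hypothetical; a worker has no DOM or DOMParser, so it would need to bundle its own lightweight HTML parser, much as Reader Mode's worker-based parsing did):

const worker = new Worker("fathom-worker.js");  // hypothetical worker script
worker.postMessage({
  html: document.documentElement.outerHTML,  // markup only; computed style and layout are lost
  url: location.href,
});
worker.onmessage = ({data}) => {
  // Whatever signal the offthread ruleset could extract from the parsed markup.
  console.log("offthread Fathom results:", data.results);
};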

This bug is done when we can blithely run a Fathom ruleset on every Firefox page load without concern for dragging down the UX.

biancadanforth commented 5 years ago

I've been talking with a number of Firefox/performance engineers (Gijs, Emilio, Rob, Greg) about this and have some useful information to share.

Profiling Fathom 3.0 in Price Tracker

Most of the conversations centered around improving Fathom as-is (blocking the main thread) in Price Tracker.

TL;DR: isVisible, which will likely be a common rule in many Fathom applications, accounts for the majority of the 460 ms of Fathom-related jank (67%). It may be possible to reduce this overall Fathom jank by as much as 374 ms (81%) by reducing style and DOM property accesses in isVisible, but this solution requires a privileged context.
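
For context, a sketch of the kind of isVisible rule under discussion (not Price Tracker's exact code); each flagged line can force a style or layout flush when the document is dirty:

function isVisible(fnode) {
  const element = fnode.element;
  const rect = element.getBoundingClientRect();  // layout read: can force a layout flush
  if (rect.width === 0 || rect.height === 0) {
    return false;
  }
  const style = getComputedStyle(element);
  // Reading properties off the computed style can force a style flush:
  return style.visibility !== "hidden" && parseFloat(style.opacity) > 0;
}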

This case study helped me to develop some general performance strategies shared below.

General strategies for writing a performant ruleset (executed in the main thread)

Ways to improve the Fathom library itself

On parallelism: running Fathom off the main thread

Open questions

emilio commented 5 years ago

I'd note that it should also be possible to push within the CSS Working Group to get something like elementFromPoint{s} to take a set of options to ignore the viewport clip, or something along those lines.

That should make it work everywhere. Apparently there are a few old requests for that: https://lists.w3.org/Archives/Public/www-style/2012Oct/0683.html

cssom-view is unmaintained atm, but I'd be happy to help out there.

emilio commented 5 years ago

I filed https://github.com/w3c/csswg-drafts/issues/4122 to try to standardize something that would've helped here.

biancadanforth commented 5 years ago

Is it possible to tell what style accesses trigger a layout flush?

Per Emilio:

In practice, a good rule of thumb is something like "if there's something that depends on up-to date layout information, that flushes layout". Same for style and paint.

It is also pretty easy to test. Create a big page (or open one, like https://html.spec.whatwg.org/), and write a loop like:

var start = performance.now();
for (var i = 0; i < 1000 /* insert/remove zeros as needed */; i++) {
  // Toggling display dirties layout so the call under test has to flush from scratch.
  document.documentElement.style.display = i % 2 == 0 ? "none" : "";
  theApiYouWantToTest();
}
console.log(performance.now() - start);

If theApiYouWantToTest() flushes layout, you'll see the loop take massively longer than if it doesn't. Compare reading something simple like document.documentElement.style.color (which doesn't need to update the style of the page) with calling document.documentElement.getBoundingClientRect() (which updates layout).

Note that the document.documentElement.style.display = i % 2 == 0 ? "none" : ""; line is there to ensure that a layout-affecting style changes on each iteration. That makes each flush as expensive as possible, so the time difference between an API that triggers a flush and one that doesn't is very apparent.

In reality, something like getBoundingClientRect can cause a flush even when no layout styles (like the display value) have changed; in that case the flush returns early and costs much less. That may be what happens in isVisible, and may be why its jank was dominated by XRays rather than layout work (see the last section).

Why we probably need to make Fathom async

As noted here, the original, sync implementation of isVisible in Price Tracker caused the majority of Fathom-related jank on a sample page. What I discovered yesterday is that changing only when isVisible is executed (i.e. running it asynchronously immediately after a paint, so that it doesn't trigger unnecessary layout flushes) reduced Fathom-related jank by 42%!

This improvement would be on top of any performance improvements to isVisible itself.

One less-than-ideal option is to add a new, async pre-processing step to Fathom that runs before the ruleset is executed. This step would only run if Fathom's isVisible function is being used in the ruleset.

The best option, however, is for Fathom itself to be made async. Something like:

const results = await rules.against(document);

...and inside the ruleset, wherever isVisible is used, execution would pause until it could run right after a paint.

Making Fathom async will enable further concurrency (see item 3) as well.
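
In unprivileged code, "pause until right after a paint" could be approximated with the requestAnimationFrame-plus-setTimeout pattern mentioned later in this thread (a sketch only; Fathom's eventual async API may look nothing like this):

function afterNextPaint() {
  return new Promise(resolve => {
    // rAF fires just before the next paint; a zero-delay timeout queued from
    // inside it runs just after that paint.
    requestAnimationFrame(() => setTimeout(resolve, 0));
  });
}

async function isVisibleAsync(element) {
  await afterNextPaint();  // reads below should be cheap if nothing has dirtied layout since the paint
  const rect = element.getBoundingClientRect();
  return rect.width > 0 && rect.height > 0;
}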

@erikrose , Should we break out "Make Fathom async" into a separate issue for discussion?

erikrose commented 5 years ago

@erikrose , Should we break out "Make Fathom async" into a separate issue for discussion?

Yes, please. It seems to me it should be possible to take a middle-of-the-road approach as well: call the existing synchronous Fathom routines in a requestAnimationFrame() callback, thus calling geometry-using routines like isVisible() at the optimal time without requiring a rewrite of the Fathom execution machinery. Correct?

On the same subject, I do notice that requestAnimationFrame() itself probably ceases to call its callbacks on background or otherwise invisible tabs. Whether this is a problem depends on the application, but it's something to keep in mind.
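
A sketch of that middle-of-the-road idea, assuming `rules` is an already-built ruleset and the run itself stays synchronous (whether rAF timing alone avoids the redundant flushes is exactly the open question above):

requestAnimationFrame(() => {
  // Run the unmodified synchronous ruleset at a predictable point in the
  // frame (just before the next paint), batching its geometry reads there.
  const facts = rules.against(document);
  console.log(facts.get("product"));  // "product" is a placeholder output name
});
// Caveat from above: rAF callbacks stop firing in background/invisible tabs.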

biancadanforth commented 5 years ago

Fathom in Firefox: Initial performance discussion with the Performance team on the Smoot project

Background:

I filed a Performance Review Request[1] outlining the high level details for the Fathom/Smoot project, which is expected to be the first Firefox application of Fathom, and Erik and I met with dothayer from the performance team last week.

Key takeaways:
Next Steps

In following the Recommended Plan[2], here are the next steps:

  1. Build an actor to be run on promiseDocumentFlushed (or requestAnimationFrame with a setTimeout workaround for non-chrome code) for every http(s):// page which runs a dummy, minimal Fathom ruleset against the DOM.
    • This will help us determine whether the overhead of Fathom itself is too expensive[3].
    • A minimal ruleset might be one that doesn't do any DOM, style or layout access. Perhaps it just returns a random result or pulls in paragraphs and measures their length (see the sketch after this list).
  2. Create an extension or some mechanism to easily monitor how long we spent in our actor for a given page.
    • The goal behind this is that we want to quickly surface any pages where performance is bad.
    • The best metric to measure performance would be something like a Speed Index[4].
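
A sketch of what step 1's measurement could look like on the content side, assuming chrome-privileged actor code and a hypothetical runDummyRuleset() that only does cheap reads (e.g. grabs paragraphs and measures their text length):

async function timeDummyRun(window) {
  // promiseDocumentFlushed (chrome-only) runs the callback once no style or
  // layout flushes are pending; the callback must not mutate the DOM.
  return window.promiseDocumentFlushed(() => {
    const start = window.performance.now();
    runDummyRuleset(window.document);  // hypothetical minimal-ruleset runner
    return window.performance.now() - start;  // ms spent in the dummy run
  });
}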

Appendix

References

The Performance discussion was based largely on these restricted access documents on Mozilla's Google Drive:

Annotations

erikrose commented 5 years ago

A few other bits and pieces:

biancadanforth commented 4 years ago

Here is a record of my latest notes on next steps for performance, since I was moved off the Fathom team.

References:

How do we know when Fathom is “fast enough”?

What to do next:

Profile ReaderMode/isProbablyReaderable in Firefox as a baseline

Context (Reference: Abbreviated Fathom / Smoot Perf Recommendations): Per Gijs, much of Reader Mode's work is done off the main thread, except for isProbablyReaderable. Mconley suggested we use this work as a baseline. Note: Reader Mode doesn't even run on home pages, so we need to choose a URL with a path.

Plan

  1. Choose a consistent page to load across all such profile tests, Fathom or not, and freeze it using FathomFox.
  2. Set up a local HTTP server (fathom-serve) to host a sample page locally. This removes network and other non-reproducible effects.
  3. Use erikrose/gecko-dev’s master branch to profile Reader Mode’s isProbablyReaderable method.
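
If a quick manual number is wanted alongside the profiler in step 3, a rough timing harness could look like this, assuming Readability's isProbablyReaderable() is loaded into the profiling scope (the exact module path differs between the standalone readability library and gecko-dev):

const start = performance.now();
const readerable = isProbablyReaderable(document);
console.log(`isProbablyReaderable: ${readerable} (${(performance.now() - start).toFixed(2)} ms)`);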

erikrose commented 3 years ago

80% of Fathom's time is spent calling DOM routines. Zibi was telling me that Fluent had the same problem and solved it by turning to DOM bindings, lowering its DOM accesses to direct C++ calls rather than going through the JS layer, which requires the runtime generation of reflection objects (different from XRays, which are for insulating content scripts from the page's monkeypatching). We could have Fathom compile rulesets to Rust. Or we could at least compile the parts which do DOM access, run them all at once up front, and ship their results back to JS. Zibi says the communication is fairly expensive. Lots of design space to explore here, obviously.
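
Short of compiling rulesets to Rust, the "do all the DOM access up front and ship the results back" idea might look roughly like this in plain JS (illustrative names only, not a Fathom API):

// Read phase: one pass that batches every DOM/layout access into plain data.
function snapshotCandidates(document, selector) {
  const snapshots = [];
  for (const element of document.querySelectorAll(selector)) {
    snapshots.push({
      element,
      rect: element.getBoundingClientRect(),
      text: element.textContent,
    });
  }
  return snapshots;
}

// Score phase: works only on the snapshots, with no further DOM or XRay traffic.
const scored = snapshotCandidates(document, "img")
  .map(s => ({...s, score: s.rect.width * s.rect.height}));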

erikrose commented 3 years ago

Victor got a 10x speed improvement by stubbing out getComputedStyle on the C++ side. He suspected the time was largely going into flushes, but we lack evidence of that; flushes don't show up in the flame graph. There is a fair amount of XRay overhead. 6% goes to mozilla::ServoStyleSet::ResolveStyleLazily, which could be cured by Emilio's display:none patch. Another 6% goes to xpc::XrayWrapper::getOwnPropertyDescriptor.

erikrose commented 3 years ago

dthayer ported the entirety of isVisible() to C++ and got a 10x speedup based on running it over every node of an Amazon page. This is on top of his using versions of routines that avoid flushes. (The lack of node pruning in this experiment might offset the fact that Pricewise rulesets were only 67% isVisible() and new-password ones only 17%.)

erikrose commented 3 years ago

See also this performance work: https://bugzilla.mozilla.org/show_bug.cgi?id=1709171.