jashkenas / underscore

JavaScript's utility _ belt
https://underscorejs.org
MIT License

Functional style #2908

Open jgonggrijp opened 3 years ago

jgonggrijp commented 3 years ago

This is a substantial refactoring of the Underscore source code, which I've been working on since shortly after the modularization (#2849). Before merging, I would like this to be reviewed extensively, by multiple people, in multiple ways and in multiple stages. Please see the final third of this post for review details.

Edit: I need help from people who can contribute benchmarks based on real-world applications. Please leave a comment if you think you might be able to help!

Goals

Principles

Approach

I started by asking myself which existing Underscore function should be the single source of truth about collection iteration. Initially, I considered _.reduce. It is a very general function and many other collection functions can be cleanly expressed in terms of it, for example _.map, _.filter and _.min/_.max. I also considered _.each, because it doesn't use an accumulator by default. _.each and _.reduce are equally powerful, in the sense that either can be cleanly expressed in terms of the other.
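To illustrate that equivalence, here is a minimal plain-array sketch (not Underscore's actual implementations) of each function expressed in terms of the other:

```javascript
// Sketch: _.each and _.reduce are interdefinable (plain-array versions).

// each in terms of reduce: thread a dummy accumulator through.
function eachViaReduce(list, iteratee) {
  list.reduce(function (acc, value, index) {
    iteratee(value, index, list);
    return acc;
  }, null);
}

// reduce in terms of each: close over the accumulator instead.
function reduceViaEach(list, iteratee, memo) {
  list.forEach(function (value, index) {
    memo = iteratee(memo, value, index, list);
  });
  return memo;
}
```

For example, `reduceViaEach([1, 2, 3], (m, v) => m + v, 0)` yields `6`, just as `_.reduce` would.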

I soon realized, however, that there is a function that can't be cleanly expressed in terms of _.each or _.reduce: _.find. Like the procedural for-in loop with a break option, _.find may stop iteration early. Otherwise, it basically does the same thing as _.each. All Underscore collection functions can be cleanly expressed using _.find, including not only _.each, _.reduce, _.map and _.min but also _.some, _.every and _.contains. For this reason, I have chosen _.find to be the one function that defines how to iterate a collection.
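A simplified sketch of this idea, using a `findIndex`-style primitive on plain arrays rather than Underscore's real code: once early termination is available, the other collection functions fall out naturally.

```javascript
// The one primitive: scan until the predicate is truthy, then stop.
function findIndex(list, predicate) {
  for (let i = 0; i < list.length; i++) {
    if (predicate(list[i], i, list)) return i;
  }
  return -1;
}

// some: did the scan locate anything?
function some(list, predicate) {
  return findIndex(list, predicate) !== -1;
}

// every: there is no counterexample.
function every(list, predicate) {
  return findIndex(list, (v, i, l) => !predicate(v, i, l)) === -1;
}

// contains: a special case of some.
function contains(list, target) {
  return findIndex(list, v => v === target) !== -1;
}

// each: a predicate that is never truthy forces a full traversal.
function each(list, iteratee) {
  findIndex(list, (v, i, l) => { iteratee(v, i, l); return false; });
  return list;
}
```

Note how `some` and `every` genuinely benefit from the early exit, while `each` simply declines to use it.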

Conveniently, _.find was already implemented by branching over the collection types: it calls _.findIndex on arrays and _.findKey on objects. These latter functions, in turn, are the single sources of truth about iterating arrays and objects, respectively (although the situation is a bit more nuanced with regard to arrays; more on this shortly). In Underscore 2.0, I plan to add _.findEntry for Map/Set and _.findIteration for iterators. By including these in _.find and implementing all other collection functions in terms of _.find, all collection functions would automatically support all five collection types in a consistent way.
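The dispatch described above can be sketched roughly as follows (simplified; Underscore's actual `_.find` also handles array-likes and a context argument):

```javascript
// Branch for objects: return the first key whose value satisfies the predicate.
function findKey(obj, predicate) {
  for (const key of Object.keys(obj)) {
    if (predicate(obj[key], key, obj)) return key;
  }
}

// Branch for arrays: return the first matching index, or -1.
function findIndex(list, predicate) {
  for (let i = 0; i < list.length; i++) {
    if (predicate(list[i], i, list)) return i;
  }
  return -1;
}

// find picks the right key finder per collection type, then looks up the value.
function find(collection, predicate) {
  const keyFinder = Array.isArray(collection) ? findIndex : findKey;
  const key = keyFinder(collection, predicate);
  if (key !== undefined && key !== -1) return collection[key];
}
```

Adding Map/Set and iterator support would then only mean adding branches to `find`, exactly as the plan for `_.findEntry` and `_.findIteration` suggests.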

Besides using _.find in all collection functions, one of the first things I did was to factor out very general linearSearch and binarySearch functions. These are the actual single sources of truth on how to iterate and search arrays; _.findIndex, _.findLastIndex, _.indexOf, _.lastIndexOf and _.sortedIndex are all implemented using linearSearch and/or binarySearch under the hood.
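The PR's exact signatures may differ, but the idea can be sketched like this (hypothetical parameter shapes): one generic linear scan parameterized by start index and step direction covers the forward and backward index finders, and one binary search covers sorted insertion.

```javascript
// Generic linear scan: direction and starting point are parameters.
function linearSearch(array, predicate, start, step) {
  for (let i = start; i >= 0 && i < array.length; i += step) {
    if (predicate(array[i], i, array)) return i;
  }
  return -1;
}

const findIndex = (a, p) => linearSearch(a, p, 0, 1);
const findLastIndex = (a, p) => linearSearch(a, p, a.length - 1, -1);
const indexOf = (a, target) => linearSearch(a, v => v === target, 0, 1);

// Binary search over a sorted array: smallest index at which `value`
// could be inserted while keeping the array sorted (as _.sortedIndex does).
function binarySearch(array, value) {
  let low = 0, high = array.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (array[mid] < value) low = mid + 1; else high = mid;
  }
  return low;
}
```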

linearSearch is so general that I was able to substitute it for nearly all hand-written for/while loops in the source code. This proved very effective at both producing cleaner code and reducing the bundle size. However, I have reverted this in many places, because function call overhead turned out to be costlier than I expected; it seems that even modern JS engines rarely perform inlining, if ever. After discovering this, I adopted the following "safe" rule for choosing between linearSearch and a hand-written loop: if the loop body previously involved no function call, keep the hand-written loop; otherwise, replace it with linearSearch or another Underscore function. This way, the extra function call introduced by using an Underscore function can never slow down the loop by more than a factor of two. I expect the slowdown to be smaller in real-world scenarios, because the factor of two is only reached when the loop body consists purely of the two function calls themselves (i.e., when the called functions do hardly anything). Moreover, loops that already involved a function call often involved multiple function calls.
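A hypothetical before/after pair (not taken from the PR) illustrating why the bound is a factor of two: the rewritten loop makes at most one extra function call per iteration on top of the call its body already made.

```javascript
// Before: hand-written loop whose body already calls `predicate` once
// per iteration, so it qualifies for rewriting under the "safe" rule.
function rejectBefore(array, predicate) {
  const result = [];
  for (let i = 0; i < array.length; i++) {
    if (!predicate(array[i])) result.push(array[i]);
  }
  return result;
}

// A generic helper standing in for an Underscore function.
function each(array, iteratee) {
  for (let i = 0; i < array.length; i++) iteratee(array[i], i, array);
}

// After: same behavior via the helper. Each iteration now makes two
// calls (the wrapper plus `predicate`) instead of one, so the
// per-iteration call overhead can at most double.
function rejectAfter(array, predicate) {
  const result = [];
  each(array, v => { if (!predicate(v)) result.push(v); });
  return result;
}
```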

I wrote an extensive microbenchmark suite to measure performance, in which each operation is repeated at least 3.6 million times and during at least a whole second. Please follow that link for details. All performance claims in the current post are informed by repeated measurements in Node.js 10, mobile Safari 14 and desktop Firefox 84 (on macOS) with those microbenchmarks. However, I believe that a final performance impact assessment should be made by comparing execution time in real-world applications and across a wider range of engines. More on this in the review section at the bottom of this post.
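The measurement discipline can be sketched as follows (a hypothetical harness, not the actual suite linked above): repeat the operation until both an iteration floor and a time floor are met, then report the rate.

```javascript
// Repeat `operation` at least `minIterations` times and for at least
// `minMillis` milliseconds, whichever takes longer, then report the rate.
// Defaults mirror the thresholds described in the text.
function benchmark(operation, minIterations = 3600000, minMillis = 1000) {
  let iterations = 0;
  const start = Date.now();
  let elapsed = 0;
  do {
    operation();
    iterations++;
    elapsed = Date.now() - start;
  } while (iterations < minIterations || elapsed < minMillis);
  return { iterations, elapsed, opsPerSecond: iterations / (elapsed / 1000) };
}
```

Insisting on both floors guards against two failure modes: too few repetitions (noisy timings) and too short a total duration (timer granularity dominating the result).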

With this overall approach in mind, I made several passes over all the source modules, looking for opportunities to make the code cleaner, DRYer and more functional. The code I'm presenting today is the result of this effort and the subject of the first round of reviews. Once I have received and processed the first round of reviews, I plan to make one more pass over the source code, in order to make function parameter names more consistent and to fix the linear order of the bundles. After that, I'll take one smaller, final review before finally merging the branch.

Results (stage 1)

Review

This PR is unusual because it may lead to changes that sacrifice performance in favor of other considerations. For the sake of accountability, I'd like the review to be very thorough and entirely public. This is going to be a lot of work and I don't want all that work to land on the shoulders of only one or two people. Instead, I hope to involve many people, each doing only light work. As a token of appreciation, everyone making a substantial contribution to this review will get a permanent honorable mention in the changelog.

I propose to divide the review work as follows. I'll be inviting specific people for specific parts, but please rest assured that anyone is welcome to participate in any part of the review. I'll keep track of review progress below.

Benchmark preparation

If you have a real-world JavaScript application (or library function) that meets all of the following criteria:

and you are willing to do the following things, with help as needed:

then please let us know by leaving a comment below!

Performance measurements

You can contribute performance measurements by doing the following.

  1. Pick an engine that you can run benchmarks on (such as Firefox) and a benchmark program that can run on this engine.
  2. Run the benchmark on the chosen engine, both with Underscore at commit c9b4b63 (master) and Underscore at commit eaba5b5 (functional), and compare the speed.
  3. If you find a small speed difference, please repeat your measurements as often as needed to determine whether the difference is significant.
  4. If you do find a significant difference, please profile the benchmark with both versions of Underscore to determine what change in Underscore is causing the difference.
  5. Report your findings by leaving a comment below. Please include details such as hardware specifications, exact versions of the operating system and engine used, number of repeated measurements, exact measured speed on each repetition, and exact numbers obtained from profiling.
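For step 3, here is a rough way to judge whether a small difference is more than noise (a crude two-standard-error heuristic, not a substitute for a proper statistical test):

```javascript
// Given repeated timings (in ms) for the two Underscore versions, check
// whether the difference in means exceeds twice the combined standard
// error. If not, more repetitions (or a real significance test) are needed.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stderr(xs) {
  const m = mean(xs);
  const variance = xs.reduce((a, b) => a + (b - m) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance / xs.length);
}

function looksSignificant(timingsA, timingsB) {
  const diff = Math.abs(mean(timingsA) - mean(timingsB));
  const se = Math.hypot(stderr(timingsA), stderr(timingsB));
  return diff > 2 * se;
}
```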

Final words

I make major contributions to Underscore, like the current one, because I believe it is an excellent library and because I want to help lift it to the next level. If you like what I'm doing, please consider supporting my open source work on Patreon. Shoutout to my existing backers, who encouraged me to do this work, and to @jashkenas, who created this awesome library.

coveralls commented 3 years ago

Coverage increased (+0.04%) to 95.238% when pulling eaba5b58fa8fd788a5be1cf3b66e81f8293f70f9 on jgonggrijp:functional into 745e9b7314064e66a7257f9b361030e6055980b8 on jashkenas:master.

jgonggrijp commented 3 years ago

High-level review by @jashkenas, quoted from private email with his permission:

I gave your PR/plan a quick skim, and am 100% in agreement with your plan and proposal. I really love what you're trying to do here! I agree that large performance considerations need to trump pure elegance, for the sake of pragmatism.

jgonggrijp commented 3 years ago

First benchmark is in, thanks to awesome work by @m90!

jgonggrijp commented 1 year ago

Update: I will be writing an additional real-world benchmark in week 46, which is November 13–17. I hope to finish and merge this branch soon afterwards.