jashkenas / underscore

JavaScript's utility _ belt
https://underscorejs.org
MIT License
27.33k stars 5.53k forks source link

ES modules exports (implements #2718) #2826

Closed jgonggrijp closed 4 years ago

jgonggrijp commented 4 years ago

Ready for review but not ready for merging; see "before merging" notes below.

Purpose

Reformatting the underscore source code to use ES6 exports, in order to facilitate treeshaking with tools such as rollup. See #2718.

Non-purposes

I agree with and followed https://github.com/jashkenas/underscore/issues/2718#issuecomment-575725274 and https://github.com/jashkenas/underscore/issues/2718#issuecomment-575728627. One change at a time.

This is also one of the reasons this isn't ready for merge yet; the move to ES6 exports obviates one level of indentation, but I kept that indentation for now in order to ease review.

Update: a commit that removes the obsolete indentation was added to this PR on 2020-03-13.

What I did

I did not convert the tests to ES module syntax; in fact, they still import from the (now generated) underscore.js.

Update 2020-03-26: In the meanwhile, in addition to the above, I have done a few more things which are detailed in the comments below.

Interface changes

I believe that the published UMD interface is otherwise completely identical to what is currently on the master branch.

This PR shows a comparison between the former hand-written underscore.js and the new reformatted underscore-source.js. In order to appreciate the interface changes, one might want to see a comparison between the former hand-written underscore.js and the new rollup-generated underscore-js instead. I pushed a secondary branch to make this possible. The diff can be found over here: https://github.com/jashkenas/underscore/compare/master...jgonggrijp:es-module-compiled?expand=1#diff-0f36b362a0b81d6f4d4bfd8a7413c75d (you may have to click "load diff").

Quantitative changes

The amount of code to maintain decreases very slightly.

. hand-written underscore.js proposed _source + _module + _umd
total lines of code 1692 1684
gzipped kB 16.32 16.04

The weight of the published gzipped + minified UMD bundle increases by 240 bytes.

. hand-written UMD module rollup generated UMD module
minified + gzipped kB 6.5 6.74

In exchange, users get the option to treeshake when using tools like rollup. If a user imports only map (from underscore-source.js), underscore's contribution to her UMD bundle decreases by 900 lines of code compared to the present situation. This is substantial, if not as spectacular as one might expect. This is due to rollup not being clever enought about side effects and erring on the side of caution.

As an experiment, I also created a completely modularized version where each function lives in a separate module (not published). This enabled rollup to shave off another 460 lines of code. Rollup also surprised me by pointing out circular dependencies, which it apparently cannot do if all code is in one module. I suspect that solving these circularities might cut map's size in half again, but I have not tried this.

. rollup treeshaken size of map in lines of code
present situation 1692
proposed changes 786
modularized (experimental, unpublished) 329

While the minified + gzipped UMD bundle weight has increased a bit, these numbers make me believe that using ES exports is definitely a move forward. In addition, solving circular dependencies might also decrease the weight again.

Update 2020-03-26: updated treeshake performance numbers can be found at the bottom of https://github.com/jashkenas/underscore/pull/2826#issuecomment-599089184.

Before merging

Edit 2020-03-7: Changed suggested names from modules/{underscore,index,umd}.js to modules/{index-unmixed,index,index-umd-wrapper}.js because the former might be problematic if we modularize the whole library in a subsequent PR.

Update 2020-03-27: slightly changed the above suggestion to modules/{index,index-all,index-default}.js, respectively. This suggestion was also implemented in a new commit that has been added to this PR.

Edit to add on 2020-06-18: You can support my work on Underscore and other open source projects on Patreon.

jgonggrijp commented 4 years ago

@jashkenas I'm done writing. I would be very grateful if you could find the time to review this. My apologies for the long opening post and the big diff (despite my best attempts to keep the diff minimal)!

@lohfu @mjeanroy FYI.

jashkenas commented 4 years ago

Wow! 👏👏👏.

Thanks for taking the time to actually make a stab at this — many people have talked a lot about it over the years, but it takes a rare soul to follow through.

I'm actually starting a month of parental leave next week, so should have a good chunk of time to look at this more closely soon. Please pester me if I don't get back to you soon enough...

jgonggrijp commented 4 years ago

@jashkenas It's a pleasure. I love this library, thank you for making it. Good luck with parenting!

jgonggrijp commented 4 years ago

@jashkenas Any updates?

jashkenas commented 4 years ago

Not yet, and we have a wedding to attend this weekend, but thanks for the ping!

On Fri, Mar 6, 2020, 2:45 PM Julian Gonggrijp notifications@github.com wrote:

@jashkenas https://github.com/jashkenas Any updates?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jashkenas/underscore/pull/2826?email_source=notifications&email_token=AAABE7HAYZNI36TQINZRIATRGEZBVA5CNFSM4K5GBDFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOCHDCA#issuecomment-595882376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABE7AN3M7EA5PYO3M7FZLRGEZBVANCNFSM4K5GBDFA .

jgonggrijp commented 4 years ago

@jashkenas Here's another ping. My apologies if this is too soon. I hope you had a nice weekend and wedding.

I'd like to draw your attention to #2829 as well. If you approve of that one, I'd prefer it to be merged before this one. #2829 is a very small change.

On another note, would you appreciate some help with triaging open issues and PRs? Rather than spamming your notifications by responding to each of them individually, I could create a single issue ticket with a list of PRs than can be closed, a list of issues that can be closed, a list of PRs that require review, etcetera. If you like.

jashkenas commented 4 years ago

This is looking truly great. Thank you for taking the time to create such a solid draft of an ES-modularization. I really just have a few minor notes around the edges:

Bravo! 👏

jgonggrijp commented 4 years ago

Thank you for taking the time to review this. I'm happy to hear it looks so good to you. This weekend, I'm going to try to address your concerns as well as I can. Some comments in advance:

  • It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests.

This is a tall order, but I agree with the sentiment in principle and I will investigate the options.

If we change this in the future, and start using const, async and friends,

Please observe that this is by no means my intention, not even for a subsequent pull request. The current PR introduces no new syntax other than import/export. In my opinion, it is not necessary at this time to introduce other ES6+ syntax. In fact, not doing that has the advantage that no transpiler is needed.

it should be part of a bigger push to provide more asynchronous/promise-based functional logic in Underscore.

I'm not sure about this. I'm not sure more async belongs in underscore at all, because there is already a perfectly fine library available for that purpose. I'm not sure such an introduction would require modern async/await syntax, either, since you can work with promises without that syntax, too. And finally, the syntax used in the source code bears little relevance to the functionality available to the end user, since she will always see a version of the library that is encoded in old syntax, either as written by hand or transpiled down from modern syntax.

  • I'd prefer getting rid of the additional underscore-module.js file, and just including the _.name = function assignment in the main source file. It certainly is a neat trick though.

Why do you want to keep everything in one file?

The purpose of the module/source divide is not just avoiding repetition, although I think there is great value in DRY. It is also the only way to enable any treeshaking at all. I realize I didn't emphasize this before, but treeshaking only works when the user imports from the underscore-source, not when importing from the underscore-module. Treeshaking works when importing from the underscore-source, exactly because it doesn't contain the _.name = function assignments; those assignments constitute side effects and they force tools like rollup to always include all of the source.

I'll be frank: there is no point in accepting this pull request unless you resign to the idea of splitting the source into at least two modules.

However, I'm very certain that you have valid, well-thought reasons for wanting to keep everything together. I want to fully understand these reasons and I want to address them to the best of my ability. Do they perhaps include the wonderful annotated source edition that you can read through in one go? Name all of it, and I will fight dragons so we can keep as many of the advantages of a single module as possible.

  • Naming: Since this change will become Underscore 2.0,

Why will it? Isn't this mostly new functionality rather than a radical interface change?

I think it's fine for the main "underscore.js" to switch from being UMD formatted to a clean ES module. My preferred naming scheme would be:

    • underscore.js The main source ES module, which is also the module that should be directly imported when using ES modules. (No separate -module version.)

As I wrote above, this wish is impossible to grant while also having treeshaking. In addition to that, if you continue to call the UMD bundle underscore.js and use some other name for the main ESM entry point, nobody will need to change the path from which they import underscore, not even people who use ESM imports, because the module field of the package.json can transparently point them to the correct underlying file.

    • underscore-umd.js If it would be possible to eliminate this file by having Rollup read from underscore.js directly (now that it includes the correct default export), that would be great. Otherwise, we can leave this file in place.

Unfortunately, no, that file cannot be eliminated. The problem is not in the way the default export is arrived at (I could have done mixin(allExports) ; export { default } from 'underscore-source' instead), but in the way rollup treats files that contain both named exports and a default export. It insists on setting module.exports to a plain object in that case, which means that if people do

var _ = require('underscore');

, their _ is not anymore the OOP chain wrapper/partial placeholder/iteratee override holder it should be. For that, they would need to use

var _ = require('underscore').default;

instead. The only way to make the first scenario work, which I really think it should, is to offer rollup a module that has a default export and no other exports.

  • It would be nice if the internal _names followed a consistent pattern (regardless of frequency of use), so: name for exports and _name for internal functions only.

I do agree with this and I will do as you suggest.

  • It looks like it may be possible to have more current versions of Rollup avoid the use of getters by using the --no-externalLiveBindings option, which will make things IE8 compatible.

No, actually I tried that first, with rollup version 1.31.1. It still insisted on creating getters for some of the exports, probably because they were aliases of other exports. The only way to get rid of the getters entirely was the legacy option. This also indicates a potential problem with rollup; they don't seem very committed to supporting legacy systems.

I think there are three main options:

  1. Accept that we need to lag behind the rollup releases. It is only a (minor) problem for the developer, not for the user. The rollup version I used is only two or three years outdated, and apart from the getters and the use of Object.freeze (both of which are only lacking because of the legacy option), the generated code is identical to what recent versions of rollup generate.
  2. Drop support for IE8 (and probably ExtendScript). It hurts, but if you really intend to make this a 2.0, maybe it is an acceptable sacrifice. It would definitely help to shake some weight.
  3. Do a final transpilation+polyfill pass on the UMD bundle. This allows you to be as backwards-compatible as you want, but it comes at an ugly cost: additional layers of complexity in the tooling and potentially a lot of extra weight.
  • Overall — and even if it looks like it makes life easier for Rollup — I'd still strongly prefer to keep everything in a single underscore.js, ES module file. Perhaps there’s something we can do with respect to the code organization, or avoiding function hoisting, or circular dependencies, that can solve the treeshaken LOC problem in a different way.

Let me begin by repeating that I believe you have good reasons for this and that I want to address those reasons as well as I can. Please do explain those reasons to me in detail.

Rollup (and the hard problem of JavaScript treeshaking in general) is mainly sensitive to two things: side effects and dependency. Apart from that, the way the code is organized does not really affect its resuts. Whether one function goes before the other or the other way round does not matter, as long as neither depends on the other or writes to an object that the other might read from. Especially the question of side effects quickly becomes intractable as more common variables are involved. This is why treeshaking becomes more effective as the size of modules decreases: modules have isolated namespaces, so the finer the modularization, the less potential common state.

I truly believe that further modularization, up to the point of putting every function in a separate module, would be good for underscore. I didn't do it in this PR because I believe in evolutionary change and because I wanted to keep the review as digestible as possible. But I believe it should be the next step.

Apart from dramatically improving treeshaking, splitting out the source code into many small modules has a few other clear advantages:

By extension of the above, I believe that fine modularization goes before any attempt at solving circular dependencies.

Bravo! 👏

Thank you!

jashkenas commented 4 years ago

Thanks for the detailed response!

Please observe that this is by no means my intention, not even for a subsequent pull request. The current PR introduces no new syntax other than import/export. In my opinion, it is not necessary at this time to introduce other ES6+ syntax. In fact, not doing that has the advantage that no transpiler is needed.

👍

I'm not sure about this. I'm not sure more async belongs in underscore at all, because there is already a perfectly fine library available for that purpose.

👍

I think there are three main options:

  1. Accept that we need to lag behind the rollup releases.

👍 That sounds fine. I'm sorry to hear that the --no-externalLiveBindings doesn't work as described.

I'll be frank: there is no point in accepting this pull request unless you resign to the idea of splitting the source into at least two modules.

That’s fine, if it's necessary. If we end up with one file as the straight ES Module exports, one as the complete mixed API, and one file as the single-object UMD export — as you have it today — that's cool with me.

However, I'm very certain that you have valid, well-thought reasons for wanting to keep everything together. I want to fully understand these reasons and I want to address them to the best of my ability.

Simply because it's the soul of this (now venerable) project. Underscore has always tried — and has succeeded and failed over the years, as waves of pull requests have come and gone — to be a simply-written, single-file tour through some useful functional programming idioms in JavaScript. It’s an exercise that’s supposed to be as pedagogical as it is practical. There’s a balance between performance and organizational complexity on the one hand, and readability and source code simplicity on the other. Being a short, single-file library with no dependencies is at the heart of that identity. Also, Lodash already exists.

But I don't think that should be a dealbreaker — since, if we separate the -module or -source file as you suggest, it should make no difference to Rollup, or any other tool, if the source code happens to live in one file or many. It should be possible to syntactically determine where the seams lie between the functions, and achieve equal (or close to equal) results. And if they're not, it's either a hidden circular dependency (that we can fix), or a bug in Rollup (that can be fixed). Right?

If there are any other open questions you feel like I haven't addressed, please let me know.

jgonggrijp commented 4 years ago

If we end up with one file as the straight ES Module exports, one as the complete mixed API, and one file as the single-object UMD export — as you have it today — that's cool with me.

Great!

... reasons for wanting to keep everything together.

Simply because it's the soul of this (now venerable) project. Underscore has always tried — and has succeeded and failed over the years, as waves of pull requests have come and gone — to be a simply-written, single-file tour through some useful functional programming idioms in JavaScript. It’s an exercise that’s supposed to be as pedagogical as it is practical. There’s a balance between performance and organizational complexity on the one hand, and readability and source code simplicity on the other.

I know what you mean. Underscore is a very elegant library, venerable indeed. While I had read the docco rendering before, I became more intimately familiar with the code while preparing this PR and it only strengthened my opinion. It's like a good, well-paced story in which everything comes full circle at the end. With enlightening ingenuity.

Being a short, single-file library with no dependencies is at the heart of that identity. Also, Lodash already exists.

I full-heartedly agree about the succinctness and the self-sufficiency. As far as I'm concerned, those can and should stay. I also see the value of the "single read", but I think there might be other ways to address that than necessarily keeping all the primary source code in one file. Maybe it is enough if we generate a monolithic, well ordered ESM using rollup and then feed that into docco. To me, it seems like that might actually be the best of both worlds. Underscore could at the same time present a holistic story and also be a prime example of how to write elegant modular code.

As for Lodash. I'm a bit hurt by the remark. I don't believe that modularity is some kind of gliding scale that separates Lodash from Underscore, or even that it is the most significant difference between the libraries. As far as I'm concerned, Lodash is what it is for very different reasons. And I'm definitely not going to repeat that history. To begin with, I will not fork Underscore under any condition.

But I don't think that should be a dealbreaker — since, if we separate the -module or -source file as you suggest, it should make no difference to Rollup, or any other tool, if the source code happens to live in one file or many. It should be possible to syntactically determine where the seams lie between the functions, and achieve equal (or close to equal) results. And if they're not, it's either a hidden circular dependency (that we can fix), or a bug in Rollup (that can be fixed). Right?

Sorry, but no, wrong. Observe that there is a factor two treeshaking difference between the underscore-source from the current PR and the experimental, fully modularized version that I didn't publish, despite both containing the exact same code with the same circular dependencies and despite me using the exact same version of rollup in both cases.

As a matter of fact, it makes a great deal of difference whether the source code lives in one file or many. I can see why you don't want this to be true, ideologically speaking, but it is true nonetheless. Rollup is over its neck in the murky business of trying to statically track side effects in an impure, weakly typed dynamic language, while Underscore is a little paradise of functional, orthogonal style. Getting a factor two treeshaking improvement by changing Underscore is many orders of magnitude easier than getting that same improvement by changing Rollup. I know; I got a factor four treeshaking improvement in less than a week of editing Underscore (a factor two by splitting it in two and then another factor two by splitting it in 146), working all alone, while the Rollup team has been working on Rollup for years and they never managed such a dramatic treeshaking improvement on anything whatsoever.

I'm not just talking pragmatics. There are strong, fundamental reasons for this. In 2050, ES code distributed over many files will still be more effectively treeshaken than monolithic source code. I'll bet on that.

Besides that, hidden circular dependencies are uncovered by splitting everything in tiny modules and then having a tool tell you where the circularities are. That, or many hours of brain-melting static analysis by a human (assuming the human won't give up). I don't know about you, but I'm not going to repeat that effort every time we suspect there might be a circular dependency (which we might not always suspect). Given the choice, I'd rather have it modularized permanently and get an instant hint from a tool whenever somebody introduces a circularity.

If there are any other open questions you feel like I haven't addressed, please let me know.

Yes. I don't fully understand yet why you feel that modularity is at odds with the elegance of Underscore. So far I see no conflict, only harmony. What makes you feel different?

jashkenas commented 4 years ago

As I said, it’s an aesthetic preference that — to me — lies quite close to the heart of Underscore’s raison d'être, and is a matter of prioritization. To me, keeping Underscore a single file that can be copied, read through in a sitting, and used without any additional tooling, is very important, and a large part of the library’s original usefulness and popularity. Whether or not we support tree-shaking at all is much less important to me. (In fact, there are good arguments against encouraging it, and promoting the single, CDN-cached, 6.5k version of Underscore instead.)

But let's put that aside. When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.

jgonggrijp commented 4 years ago

As I said, it’s an aesthetic preference that — to me — lies quite close to the heart of Underscore’s raison d'être, and is a matter of prioritization. To me, keeping Underscore a single file that can be copied, read through in a sitting, and used without any additional tooling, is very important, and a large part of the library’s original usefulness and popularity.

Alright. I can really relate to this. But this is not lost; the generated UMD bundle can be copied, read through in a single sitting and used without any additional tooling. Only the developer is seeing a change in this regard.

Whether or not we support tree-shaking at all is much less important to me. (In fact, there are good arguments against encouraging it, and promoting the single, CDN-cached, 6.5k version of Underscore instead.)

It may surprise you, but I fully agree with this. If you're using substantial portions of Underscore client side, a monolithic CDN embed is far better than a subset included in your own (probably already heavy) application bundle. Because of the caching, as you said, and also because requests to different hosts can run in parallel. I always use a mix of jsDelivr and cdnizer for this same reason.

On the other hand, there definitely are valid use cases for treeshaking. I can imagine a client side application that makes heavy use of _.partial but no other Underscore functions, or a server side application that needs to make do with very limited working memory.

But let's put that aside. When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.

Fine by me. I'm about to push my finished work, stay tuned.

jgonggrijp commented 4 years ago

@jashkenas I rebased on the latest master, ensuring to fix merge conflicts and include my own _.iteratee fix, added some more commits and then force-pushed. The changes since your last review:

In renaming the modules, I deviated a bit from the options we discussed before.

Please let me know whether you find this acceptable.

jgonggrijp commented 4 years ago

Edit: the comment below is outdated. The UMD bundle is now linted to check that its syntax is compatible with ES3. There is a fix for ExtendScript in #2831 that will continue to work when this branch is merged.

I almost forgot, regarding the following:

  • It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests.

I checked, and the UMD bundle produced by rollup is ES3. I did not find an easy way to test for compatibility with ExtendScript, but I'm thinking that it might be possible to have eslint check for this on the UMD bundle. I'm working on resolving #2827 separately and I hope I will be able to do it in a way that continues to work after the rollup conversion.

jgonggrijp commented 4 years ago

Oh, and another thing I forgot to mention. I did not move the minified version of the UMD bundle to a dist folder, as you suggested, because it would break the embed lines for everyone who takes them from CDN.

Also, I did not commit the generated UMD bundle. I think we still need to decide on a policy for this. Maybe never commit it, just include it when publishing to NPM? This can be automated with the prepare or prepublishOnly script in the package.json. We could do the same for the minified version. If going this route, the minified bundle and its sourcemap should be removed from git and then all generated files should be added to the .gitignore.

jgonggrijp commented 4 years ago

@jashkenas

When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.

Apart from the need to make definitive decisions about module names and a UMD bundle commit policy, I think this PR is finished (see https://github.com/jashkenas/underscore/pull/2826#issuecomment-598776404 in case you missed the edit). So I'd like to agree on some game rules for this "treeshaking challenge".

Most importantly, I'd like to settle a specific date on which we'll decide whether you were able to approach my results or not. I want this in order to prevent the decision from being postponed indefinitely. On this date, the "winner" should publish his code to prove his claims.

I suggest that you grant yourself somewhere in the order of 24 hours of total programming time to work on this and then pick the nearest date before which you're confident you can spend that much time. 24 hours is approximately the total time it took me to create a noncircular modular version in which every function lives in a separate module.

Furthermore,

I already have modularized versions locally, with and without circular dependencies, so I'm ready to test treeshaking performance on any function you choose.

Right now, the es-module branch is set up for measuring the treeshaking performance on _.map. You can measure the performance by running npm run bundle-treeshake (this installs and runs the latest version of rollup automatically using npx) and then counting the number of lines in test-treeshake/map-umd.js. We can extend this to other functions if you like.

At the time of writing, I'm measuring the following performance.

Version Size of _.map in lines of code
monolithic (master branch, no treeshaking) 1696
semimonolithic (es-module branch) 821
modular with circular dependencies (unpublished modularized branch) 350
modular without circular dependencies (unpublished noncircular branch) 276

(The test-treeshake/map-umd.js generated in the noncircular version appears to contain almost exclusively code that _.map actually depends on, so it probably can't be much smaller than 276 lines.)

I suggest that you win if, by the deadline, you manage to get the size of _.map within 10% of the circular modular version. So that would be anything under 385 lines of code. Otherwise, I win, and we discuss further modularization again as you proposed. We can extend this 10% margin to other functions as well.

jgonggrijp commented 4 years ago

@jashkenas My apologies for the previous comment, it was too early to discuss game rules and I was being a bit rude as well.

May I suggest that we proceed in the following order:

  1. Merge (or reject) #2831 and #2830.
  2. Merge the current PR. See https://github.com/jashkenas/underscore/pull/2826#issuecomment-598776404, https://github.com/jashkenas/underscore/pull/2826#issuecomment-598815168. I made another comment between those about ES3 compatibility, but the latest state is different: the UMD bundle is linted to check that all syntax is ES3 compatible and I have a permanent fix for ExtendScript in #2831.
  3. Discuss game rules for the treeshake challenge, using https://github.com/jashkenas/underscore/pull/2826#issuecomment-599089184 as a starting point, pretending that it was entirely formulated in a civil way.
jashkenas commented 4 years ago

No worries! I didn't take any offense — and I think that most of your game rules sound fine. I just didn't want to commit to a particular date without knowing when I would be able to look at this further.

Yesterday, they announced the closure of the borders here in Chile, and we’ve been scrambling around weighing the pros and cons of cancelled flights and shelter in place orders to try to get back home. Hopefully things will quiet down after we manage to make it back.

jgonggrijp commented 4 years ago

No problem, best of luck getting home!

jgonggrijp commented 4 years ago

@jashkenas I'm mostly asking this out of personal concern: how are the circumstances for you and your family?

jashkenas commented 4 years ago

We’re good! When they announced the impending border closure, and our Aeromexico flight was cancelled, we turned around, drove 6 hours back north to Santiago, and grabbed a flight (one of the last?) back home to go hunker down.

jgonggrijp commented 4 years ago

@jashkenas I'd really like to move on with this PR. Could you give me a status update?

jashkenas commented 4 years ago

I went back to work on Monday, and haven't had a chance to take a look at this again ... but I'll try to soon. Sorry for the slowness.

jashkenas commented 4 years ago

Alright! I took a good long crack at playing around with Rollup’s treeshaking this morning, and feel like a learned a lot about what kinds of patterns it can and can’t treeshake, how /*@__PURE__*/ doesn’t work, and the types of things that we do in basic Underscore that end up being included in the bundle if they're present in the imported module, regardless of use.

For example, here’s a pattern that Rollup can tree-shake away (it’s a reduced test case of our isArguments function):

function _toString() {}
export var isArguments = (function() {
  return _toString(arguments) === '[object Arguments]' ? 0 : 1;
}());

And here’s the version we need to use, that Rollup can’t:

function _toString() {}
export var isArguments = (function() {
  return _toString.call(arguments) === '[object Arguments]' ? 0 : 1;
}());

The difference in this case is .call().

Flowing through functions, this means that anything that depends on Object.prototype.toString.call(...) — which we use heavily to test types, can't be treeshaken if called at the top level, even if not exported directly.

Many of the other places where we call functions at the top level can't be treeshaken by Rollup, although some simpler patterns can.

So, what does that mean for this PR?

Ultimately, although I'd be happy to publish a version of Underscore that provides both ES Module and UMD flavors, I think that the module should be a monolithic one, given the small size of Underscore, its presence in the CDNjs cache, and the current state of treeshaking tooling. I'm still not interested in either chopping it up into tiny files, or in littering the codebase with /*@__PURE__*/ annotations (even if they worked, which they don’t) in order to satisfy Rollup’s current limitations.

That said, if you want to publish your fully modularized version as a different npm package, godspeed. I'd be happy to link to it from the top of the Underscore homepage. I hope that's not too disappointing.

jgonggrijp commented 4 years ago

Alright! I took a good long crack at playing around with Rollup’s treeshaking this morning, and feel like a learned a lot about what kinds of patterns it can and can’t treeshake, how /*@__PURE__*/ doesn’t work, and the types of things that we do in basic Underscore that end up being included in the bundle if they're present in the imported module, regardless of use.

You surprise me! I was prepared to wait a long time before you'd even announce a deadline for the treeshaking challenge, and now you've already completed the challenge. You did a really good job at it as well.

So, what does that mean for this PR?

Ultimately, although I'd be happy to publish a version of Underscore that provides both ES Module and UMD flavors, I think that the module should be a monolithic one, given the small size of Underscore, its presence in the CDNjs cache, and the current state of treeshaking tooling. I'm still not interested in either chopping it up into tiny files, or in littering the codebase with /*@__PURE__*/ annotations (even if they worked, which they don’t) in order to satisfy Rollup’s current limitations.

That said, if you want to publish your fully modularized version as a different npm package, godspeed. I'd be happy to link to it from the top of the Underscore homepage.

That's really nice of you, but no, I don't want to do that because it would amount to forking Underscore. I believe the world doesn't need yet another fork of Underscore. I only want Underscore to be modular if you are on board.

I hope that's not too disappointing.

If the current, semi-monolithic PR can still be merged, I'm happy enough. How about that?

And I'll try again later to interest you in chopping everything into tiny modules. ;-)

jashkenas commented 4 years ago

Okay, great! In that case I think that we’re almost there.

At least before I can merge it — and perhaps as a general policy — we need to provide a built version of underscore.js in the repo. For many years, there has been a link to test out the "Edge version" on the homepage: https://raw.github.com/jashkenas/underscore/master/underscore.js ... and it would be great if that could continue to work, as changes are made to the master branch.

So perhaps using Husky to define a pre-commit hook that makes sure that underscore.js is always kept up-to-date with modules/index.js ?

And then prepublish can handle the -min.js and -min.js.map variants, and the annotated source.

As to the actual roll-out, how would you like to handle it? I think this could merit a 2.0 release, with appropriate revisions to the homepage/documentation. Would you like to be added as a collaborator on this repo to help make that happen? I think you’ve clearly earned your stripes.

jashkenas commented 4 years ago

And as to the treeshaking — my experiments make me feel like (even with a single source file), it's tantalizingly close. Rollup is able to eliminate all of the obvious non-dependencies, in terms of function declarations at the top level.

It's only with our use of function calls at the top level, in order to define other functions, that it has problems, and even then, only some of the time.

Things like:

export var omit = restArguments(function(obj, _keys) {
  var iteratee = _keys[0], context;
  if (isFunction(iteratee)) {
    iteratee = negate(iteratee);
    if (_keys.length > 1) context = _keys[1];
  } else {
    _keys = map(_flatten(_keys, false, false), String);
    iteratee = function(value, key) {
      return !contains(_keys, key);
    };
  }
  return pick(obj, iteratee, context);
});

... because of the call to restArguments, even though that function obviously has no side effects.

I feel like there might be a more general pattern of dynamic function export creation we could find that would allow Rollup to treat each export in isolation ... but I wasn't able to discover it, despite my best efforts.

jgonggrijp commented 4 years ago

Okay, great! In that case I think that we’re almost there.

Hooray!

At least before I can merge it — and perhaps as a general policy — we need to provide a built version of underscore.js in the repo. For many years, there has been a link to test out the "Edge version" on the homepage: https://raw.github.com/jashkenas/underscore/master/underscore.js ... and it would be great if that could continue to work, as changes are made to the master branch.

So perhaps using Husky to define a pre-commit hook that makes sure that underscore.js is always kept up-to-date with modules/index.js ?

Sure, I'll look into it.

And then prepublish can handle the -min.js and -min.js.map variants, and the annotated source.

prepublish or prepublishOnly? I don't mind either, but in the first case, I'd prefer to call it prepare.

As to the actual roll-out, how would you like to handle it? I think this could merit a 2.0 release, with appropriate revisions to the homepage/documentation.

Regardless of the version number, I definitely think some additions to the documentation would be in order.

I really appreciate that you want to celebrate this with a 2.0 release, but I adhere to semver quite strictly and I think this should be 1.10. All of the old interface is still there, all of the backwards compatibility is still there and the source code still looks mostly the same. Users just get the new option to treeshake with rollup. I don't want to "shock" users with a 2.0 if it isn't actually disruptive.

If, in some hypothetical future, we chop everything into tiny modules, I think that might merit a 2.0 release though... ;-)

Would you like to be added as a collaborator on this repo to help make that happen? I think you’ve clearly earned your stripes.

That's an honour, thank you. I definitely want to help. If making me a collaborator helps to make me help, I'm all for it. :-)

As for your second comment about treeshaking: I feel your frustration! However, keep in mind that the size of _.map is already cut in half. That's already a pretty big win. There are probably functions that are treeshaken even more as well.

jashkenas commented 4 years ago

prepublish or prepublishOnly?

Right, prepublishOnly is what I meant.

I really appreciate that you want to celebrate this with a 2.0 release, but I adhere to semver quite strictly and I think this should be 1.10.

Okay, 1.10 it is.

That's an honour, thank you. I definitely want to help. If making me a collaborator helps to make me help, I'm all for it. :-)

Done and done.

As for your second comment about treeshaking: I feel your frustration! However, keep in mind that the size of _.map is already cut in half. That's already a pretty big win. There are probably functions that are treeshaken even more as well.

From my understanding, not so much! Most of the half of the codebase that is currently included along with treeshaken map does not have anything to do with map directly (or indirectly!). It's just all of the top-level function calls that Rollup cannot determine are side-effect-free, and everything that those top-level function calls depend on. All of that same code will be included in every treeshaken individual export, no matter which one it is. And if we can restructure those top-level function calls to treeshake cleanly, conversely, all Underscore functions should export cleanly from the single file.

jgonggrijp commented 4 years ago

@jashkenas I think we're ready for the merge. I can now push the merge button by myself, but I'll give you a chance to review the new commits first.

Regarding the Husky hooks, I had to explicitly invoke git add inside the pre-commit hook to ensure that the UMD bundle would actually be committed. Apparently, doing this inside a pre-commit hook causes the autogenerated files to still appear in the diff and the index after git commit completes. To fix this, I had to add an explicit git reset in the post-commit hook as well.

How to continue from here? Merge the current PR first, then start a new branch in your repo to update the documentation?

jgonggrijp commented 4 years ago

Given your approval, I went ahead and pushed the merge button. Hope you don't mind.

This is a big milestone. Thanks so much!