Closed jgonggrijp closed 4 years ago
@jashkenas I'm done writing. I would be very grateful if you could find the time to review this. My apologies for the long opening post and the big diff (despite my best attempts to keep the diff minimal)!
@lohfu @mjeanroy FYI.
Wow! 👏👏👏.
Thanks for taking the time to actually make a stab at this — many people have talked a lot about it over the years, but it takes a rare soul to follow through.
I'm actually starting a month of parental leave next week, so should have a good chunk of time to look at this more closely soon. Please pester me if I don't get back to you soon enough...
@jashkenas It's a pleasure. I love this library, thank you for making it. Good luck with parenting!
@jashkenas Any updates?
Not yet, and we have a wedding to attend this weekend, but thanks for the ping!
On Fri, Mar 6, 2020, 2:45 PM Julian Gonggrijp notifications@github.com wrote:
@jashkenas https://github.com/jashkenas Any updates?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jashkenas/underscore/pull/2826?email_source=notifications&email_token=AAABE7HAYZNI36TQINZRIATRGEZBVA5CNFSM4K5GBDFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOCHDCA#issuecomment-595882376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABE7AN3M7EA5PYO3M7FZLRGEZBVANCNFSM4K5GBDFA .
@jashkenas Here's another ping. My apologies if this is too soon. I hope you had a nice weekend and wedding.
I'd like to draw your attention to #2829 as well. If you approve of that one, I'd prefer it to be merged before this one. #2829 is a very small change.
On another note, would you appreciate some help with triaging open issues and PRs? Rather than spamming your notifications by responding to each of them individually, I could create a single issue ticket with a list of PRs than can be closed, a list of issues that can be closed, a list of PRs that require review, etcetera. If you like.
This is looking truly great. Thank you for taking the time to create such a solid draft of an ES-modularization. I really just have a few minor notes around the edges:
It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests. If we change this in the future, and start using const
, async
and friends, it should be part of a bigger push to provide more asynchronous/promise-based functional logic in Underscore.
As you indicate, let's kill the extra level of indentation.
I'd prefer getting rid of the additional underscore-module.js
file, and just including the _.name = function
assignment in the main source file. It certainly is a neat trick though.
Naming: Since this change will become Underscore 2.0, I think it's fine for the main "underscore.js" to switch from being UMD formatted to a clean ES module. My preferred naming scheme would be:
underscore.js
The main source ES module, which is also the module that should be directly imported when using ES modules. (No separate -module version.)underscore-umd.js
If it would be possible to eliminate this file by having Rollup read from underscore.js
directly (now that it includes the correct default export), that would be great. Otherwise, we can leave this file in place.dist/underscore-min.js
Minified UMDdist/underscore-min.js.map
It would be nice if the internal _names
followed a consistent pattern (regardless of frequency of use), so: name
for exports and _name
for internal functions only.
It looks like it may be possible to have more current versions of Rollup avoid the use of getters by using the --no-externalLiveBindings
option, which will make things IE8 compatible.
Overall — and even if it looks like it makes life easier for Rollup — I'd still strongly prefer to keep everything in a single underscore.js
, ES module file. Perhaps there’s something we can do with respect to the code organization, or avoiding function hoisting, or circular dependencies, that can solve the treeshaken LOC problem in a different way.
Bravo! 👏
Thank you for taking the time to review this. I'm happy to hear it looks so good to you. This weekend, I'm going to try to address your concerns as well as I can. Some comments in advance:
- It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests.
This is a tall order, but I agree with the sentiment in principle and I will investigate the options.
If we change this in the future, and start using
const
,async
and friends,
Please observe that this is by no means my intention, not even for a subsequent pull request. The current PR introduces no new syntax other than import
/export
. In my opinion, it is not necessary at this time to introduce other ES6+ syntax. In fact, not doing that has the advantage that no transpiler is needed.
it should be part of a bigger push to provide more asynchronous/promise-based functional logic in Underscore.
I'm not sure about this. I'm not sure more async belongs in underscore at all, because there is already a perfectly fine library available for that purpose. I'm not sure such an introduction would require modern async
/await
syntax, either, since you can work with promises without that syntax, too. And finally, the syntax used in the source code bears little relevance to the functionality available to the end user, since she will always see a version of the library that is encoded in old syntax, either as written by hand or transpiled down from modern syntax.
- I'd prefer getting rid of the additional
underscore-module.js
file, and just including the_.name = function
assignment in the main source file. It certainly is a neat trick though.
Why do you want to keep everything in one file?
The purpose of the module/source divide is not just avoiding repetition, although I think there is great value in DRY. It is also the only way to enable any treeshaking at all. I realize I didn't emphasize this before, but treeshaking only works when the user imports from the underscore-source, not when importing from the underscore-module. Treeshaking works when importing from the underscore-source, exactly because it doesn't contain the _.name = function
assignments; those assignments constitute side effects and they force tools like rollup to always include all of the source.
I'll be frank: there is no point in accepting this pull request unless you resign to the idea of splitting the source into at least two modules.
However, I'm very certain that you have valid, well-thought reasons for wanting to keep everything together. I want to fully understand these reasons and I want to address them to the best of my ability. Do they perhaps include the wonderful annotated source edition that you can read through in one go? Name all of it, and I will fight dragons so we can keep as many of the advantages of a single module as possible.
- Naming: Since this change will become Underscore 2.0,
Why will it? Isn't this mostly new functionality rather than a radical interface change?
I think it's fine for the main "underscore.js" to switch from being UMD formatted to a clean ES module. My preferred naming scheme would be:
underscore.js
The main source ES module, which is also the module that should be directly imported when using ES modules. (No separate -module version.)
As I wrote above, this wish is impossible to grant while also having treeshaking. In addition to that, if you continue to call the UMD bundle underscore.js
and use some other name for the main ESM entry point, nobody will need to change the path from which they import underscore, not even people who use ESM imports, because the module
field of the package.json
can transparently point them to the correct underlying file.
underscore-umd.js
If it would be possible to eliminate this file by having Rollup read fromunderscore.js
directly (now that it includes the correct default export), that would be great. Otherwise, we can leave this file in place.
Unfortunately, no, that file cannot be eliminated. The problem is not in the way the default export is arrived at (I could have done mixin(allExports) ; export { default } from 'underscore-source'
instead), but in the way rollup treats files that contain both named exports and a default export. It insists on setting module.exports
to a plain object in that case, which means that if people do
var _ = require('underscore');
, their _
is not anymore the OOP chain wrapper/partial placeholder/iteratee override holder it should be. For that, they would need to use
var _ = require('underscore').default;
instead. The only way to make the first scenario work, which I really think it should, is to offer rollup a module that has a default export and no other exports.
- It would be nice if the internal
_names
followed a consistent pattern (regardless of frequency of use), so:name
for exports and_name
for internal functions only.
I do agree with this and I will do as you suggest.
- It looks like it may be possible to have more current versions of Rollup avoid the use of getters by using the
--no-externalLiveBindings
option, which will make things IE8 compatible.
No, actually I tried that first, with rollup version 1.31.1. It still insisted on creating getters for some of the exports, probably because they were aliases of other exports. The only way to get rid of the getters entirely was the legacy
option. This also indicates a potential problem with rollup; they don't seem very committed to supporting legacy systems.
I think there are three main options:
Object.freeze
(both of which are only lacking because of the legacy
option), the generated code is identical to what recent versions of rollup generate.
- Overall — and even if it looks like it makes life easier for Rollup — I'd still strongly prefer to keep everything in a single
underscore.js
, ES module file. Perhaps there’s something we can do with respect to the code organization, or avoiding function hoisting, or circular dependencies, that can solve the treeshaken LOC problem in a different way.
Let me begin by repeating that I believe you have good reasons for this and that I want to address those reasons as well as I can. Please do explain those reasons to me in detail.
Rollup (and the hard problem of JavaScript treeshaking in general) is mainly sensitive to two things: side effects and dependency. Apart from that, the way the code is organized does not really affect its resuts. Whether one function goes before the other or the other way round does not matter, as long as neither depends on the other or writes to an object that the other might read from. Especially the question of side effects quickly becomes intractable as more common variables are involved. This is why treeshaking becomes more effective as the size of modules decreases: modules have isolated namespaces, so the finer the modularization, the less potential common state.
I truly believe that further modularization, up to the point of putting every function in a separate module, would be good for underscore. I didn't do it in this PR because I believe in evolutionary change and because I wanted to keep the review as digestible as possible. But I believe it should be the next step.
Apart from dramatically improving treeshaking, splitting out the source code into many small modules has a few other clear advantages:
preserveModules: true
option to rollup, setting the browser
field in the package.json for browserify users and setting a single path alias on the end of AMD users.By extension of the above, I believe that fine modularization goes before any attempt at solving circular dependencies.
Bravo! 👏
Thank you!
Thanks for the detailed response!
Please observe that this is by no means my intention, not even for a subsequent pull request. The current PR introduces no new syntax other than
import
/export
. In my opinion, it is not necessary at this time to introduce other ES6+ syntax. In fact, not doing that has the advantage that no transpiler is needed.
👍
I'm not sure about this. I'm not sure more async belongs in underscore at all, because there is already a perfectly fine library available for that purpose.
👍
I think there are three main options:
- Accept that we need to lag behind the rollup releases.
👍 That sounds fine. I'm sorry to hear that the --no-externalLiveBindings
doesn't work as described.
I'll be frank: there is no point in accepting this pull request unless you resign to the idea of splitting the source into at least two modules.
That’s fine, if it's necessary. If we end up with one file as the straight ES Module exports, one as the complete mixed API, and one file as the single-object UMD export — as you have it today — that's cool with me.
However, I'm very certain that you have valid, well-thought reasons for wanting to keep everything together. I want to fully understand these reasons and I want to address them to the best of my ability.
Simply because it's the soul of this (now venerable) project. Underscore has always tried — and has succeeded and failed over the years, as waves of pull requests have come and gone — to be a simply-written, single-file tour through some useful functional programming idioms in JavaScript. It’s an exercise that’s supposed to be as pedagogical as it is practical. There’s a balance between performance and organizational complexity on the one hand, and readability and source code simplicity on the other. Being a short, single-file library with no dependencies is at the heart of that identity. Also, Lodash already exists.
But I don't think that should be a dealbreaker — since, if we separate the -module
or -source
file as you suggest, it should make no difference to Rollup, or any other tool, if the source code happens to live in one file or many. It should be possible to syntactically determine where the seams lie between the functions, and achieve equal (or close to equal) results. And if they're not, it's either a hidden circular dependency (that we can fix), or a bug in Rollup (that can be fixed). Right?
If there are any other open questions you feel like I haven't addressed, please let me know.
If we end up with one file as the straight ES Module exports, one as the complete mixed API, and one file as the single-object UMD export — as you have it today — that's cool with me.
Great!
... reasons for wanting to keep everything together.
Simply because it's the soul of this (now venerable) project. Underscore has always tried — and has succeeded and failed over the years, as waves of pull requests have come and gone — to be a simply-written, single-file tour through some useful functional programming idioms in JavaScript. It’s an exercise that’s supposed to be as pedagogical as it is practical. There’s a balance between performance and organizational complexity on the one hand, and readability and source code simplicity on the other.
I know what you mean. Underscore is a very elegant library, venerable indeed. While I had read the docco rendering before, I became more intimately familiar with the code while preparing this PR and it only strengthened my opinion. It's like a good, well-paced story in which everything comes full circle at the end. With enlightening ingenuity.
Being a short, single-file library with no dependencies is at the heart of that identity. Also, Lodash already exists.
I full-heartedly agree about the succinctness and the self-sufficiency. As far as I'm concerned, those can and should stay. I also see the value of the "single read", but I think there might be other ways to address that than necessarily keeping all the primary source code in one file. Maybe it is enough if we generate a monolithic, well ordered ESM using rollup and then feed that into docco. To me, it seems like that might actually be the best of both worlds. Underscore could at the same time present a holistic story and also be a prime example of how to write elegant modular code.
As for Lodash. I'm a bit hurt by the remark. I don't believe that modularity is some kind of gliding scale that separates Lodash from Underscore, or even that it is the most significant difference between the libraries. As far as I'm concerned, Lodash is what it is for very different reasons. And I'm definitely not going to repeat that history. To begin with, I will not fork Underscore under any condition.
But I don't think that should be a dealbreaker — since, if we separate the
-module
or-source
file as you suggest, it should make no difference to Rollup, or any other tool, if the source code happens to live in one file or many. It should be possible to syntactically determine where the seams lie between the functions, and achieve equal (or close to equal) results. And if they're not, it's either a hidden circular dependency (that we can fix), or a bug in Rollup (that can be fixed). Right?
Sorry, but no, wrong. Observe that there is a factor two treeshaking difference between the underscore-source from the current PR and the experimental, fully modularized version that I didn't publish, despite both containing the exact same code with the same circular dependencies and despite me using the exact same version of rollup in both cases.
As a matter of fact, it makes a great deal of difference whether the source code lives in one file or many. I can see why you don't want this to be true, ideologically speaking, but it is true nonetheless. Rollup is over its neck in the murky business of trying to statically track side effects in an impure, weakly typed dynamic language, while Underscore is a little paradise of functional, orthogonal style. Getting a factor two treeshaking improvement by changing Underscore is many orders of magnitude easier than getting that same improvement by changing Rollup. I know; I got a factor four treeshaking improvement in less than a week of editing Underscore (a factor two by splitting it in two and then another factor two by splitting it in 146), working all alone, while the Rollup team has been working on Rollup for years and they never managed such a dramatic treeshaking improvement on anything whatsoever.
I'm not just talking pragmatics. There are strong, fundamental reasons for this. In 2050, ES code distributed over many files will still be more effectively treeshaken than monolithic source code. I'll bet on that.
Besides that, hidden circular dependencies are uncovered by splitting everything in tiny modules and then having a tool tell you where the circularities are. That, or many hours of brain-melting static analysis by a human (assuming the human won't give up). I don't know about you, but I'm not going to repeat that effort every time we suspect there might be a circular dependency (which we might not always suspect). Given the choice, I'd rather have it modularized permanently and get an instant hint from a tool whenever somebody introduces a circularity.
If there are any other open questions you feel like I haven't addressed, please let me know.
Yes. I don't fully understand yet why you feel that modularity is at odds with the elegance of Underscore. So far I see no conflict, only harmony. What makes you feel different?
As I said, it’s an aesthetic preference that — to me — lies quite close to the heart of Underscore’s raison d'être, and is a matter of prioritization. To me, keeping Underscore a single file that can be copied, read through in a sitting, and used without any additional tooling, is very important, and a large part of the library’s original usefulness and popularity. Whether or not we support tree-shaking at all is much less important to me. (In fact, there are good arguments against encouraging it, and promoting the single, CDN-cached, 6.5k version of Underscore instead.)
But let's put that aside. When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.
As I said, it’s an aesthetic preference that — to me — lies quite close to the heart of Underscore’s raison d'être, and is a matter of prioritization. To me, keeping Underscore a single file that can be copied, read through in a sitting, and used without any additional tooling, is very important, and a large part of the library’s original usefulness and popularity.
Alright. I can really relate to this. But this is not lost; the generated UMD bundle can be copied, read through in a single sitting and used without any additional tooling. Only the developer is seeing a change in this regard.
Whether or not we support tree-shaking at all is much less important to me. (In fact, there are good arguments against encouraging it, and promoting the single, CDN-cached, 6.5k version of Underscore instead.)
It may surprise you, but I fully agree with this. If you're using substantial portions of Underscore client side, a monolithic CDN embed is far better than a subset included in your own (probably already heavy) application bundle. Because of the caching, as you said, and also because requests to different hosts can run in parallel. I always use a mix of jsDelivr and cdnizer for this same reason.
On the other hand, there definitely are valid use cases for treeshaking. I can imagine a client side application that makes heavy use of _.partial
but no other Underscore functions, or a server side application that needs to make do with very limited working memory.
But let's put that aside. When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.
Fine by me. I'm about to push my finished work, stay tuned.
@jashkenas I rebased on the latest master
, ensuring to fix merge conflicts and include my own _.iteratee
fix, added some more commits and then force-pushed. The changes since your last review:
_name
s only for internal names, as agreed. This necessitated aliasing the builtin isNaN
and isFinite
to _isNaN
and _isFinite
in the setup section. This change can be seen in isolation here: https://github.com/jashkenas/underscore/pull/2826/commits/6ae7bac4c9420f56f2fe9565ed4c41092ed18578.In renaming the modules, I deviated a bit from the options we discussed before.
underscore-source.js
became modules/index.js
. This follows the convention that index.js
is the module where you can find "everything". People who wish to treeshake will be importing from underscore/modules/index
. If we further split this module in the future, that import path will continue to work because it will become the place where we collect all the functions.underscore-umd.js
became modules/index-default.js
. This is still the entry point for the UMD bundle; people will not be specifying this path explicitly in their imports.underscore-module
became modules/index-all.js
. This is now a module that re-exports the default from index-default
plus all the named exports from index
, hence all
. People who use ESM will automatically get this module when they import ... from 'underscore'
.Please let me know whether you find this acceptable.
Edit: the comment below is outdated. The UMD bundle is now linted to check that its syntax is compatible with ES3. There is a fix for ExtendScript in #2831 that will continue to work when this branch is merged.
I almost forgot, regarding the following:
- It would be nice if the UMD build of Underscore continued to support ES3 — there are a lot of folks who use Underscore, for example, in Adobe Illustrator and Photoshop under ExtendScript. It would be even better if we could find a way to assert that compatibility in the tests.
I checked, and the UMD bundle produced by rollup is ES3. I did not find an easy way to test for compatibility with ExtendScript, but I'm thinking that it might be possible to have eslint check for this on the UMD bundle. I'm working on resolving #2827 separately and I hope I will be able to do it in a way that continues to work after the rollup conversion.
Oh, and another thing I forgot to mention. I did not move the minified version of the UMD bundle to a dist
folder, as you suggested, because it would break the embed lines for everyone who takes them from CDN.
Also, I did not commit the generated UMD bundle. I think we still need to decide on a policy for this. Maybe never commit it, just include it when publishing to NPM? This can be automated with the prepare
or prepublishOnly
script in the package.json. We could do the same for the minified version. If going this route, the minified bundle and its sourcemap should be removed from git and then all generated files should be added to the .gitignore
.
@jashkenas
When you're ready with this PR, let's take a few representative functions, and you can also do them in the modularized form — and I'll take a stab at trying to satisfy Rollup’s internal logic to see if I can come close to matching them with the single-source file. And if I can't approach your results, we'll discuss it again.
Apart from the need to make definitive decisions about module names and a UMD bundle commit policy, I think this PR is finished (see https://github.com/jashkenas/underscore/pull/2826#issuecomment-598776404 in case you missed the edit). So I'd like to agree on some game rules for this "treeshaking challenge".
Most importantly, I'd like to settle a specific date on which we'll decide whether you were able to approach my results or not. I want this in order to prevent the decision from being postponed indefinitely. On this date, the "winner" should publish his code to prove his claims.
I suggest that you grant yourself somewhere in the order of 24 hours of total programming time to work on this and then pick the nearest date before which you're confident you can spend that much time. 24 hours is approximately the total time it took me to create a noncircular modular version in which every function lives in a separate module.
Furthermore,
modules/index.js
any further, so you will not do this, not even temporarily in order to obtain information.es-module
branch (i.e., a3f6e76), since my modularized versions don't introduce breaking changes, either. (Edit: in fact, they don't introduce any interface changes to the UMD bundle whatsoever.)I already have modularized versions locally, with and without circular dependencies, so I'm ready to test treeshaking performance on any function you choose.
Right now, the es-module branch is set up for measuring the treeshaking performance on _.map
. You can measure the performance by running npm run bundle-treeshake
(this installs and runs the latest version of rollup automatically using npx
) and then counting the number of lines in test-treeshake/map-umd.js
. We can extend this to other functions if you like.
At the time of writing, I'm measuring the following performance.
Version | Size of _.map in lines of code |
---|---|
monolithic (master branch, no treeshaking) |
1696 |
semimonolithic (es-module branch) |
821 |
modular with circular dependencies (unpublished modularized branch) |
350 |
modular without circular dependencies (unpublished noncircular branch) |
276 |
(The test-treeshake/map-umd.js
generated in the noncircular version appears to contain almost exclusively code that _.map
actually depends on, so it probably can't be much smaller than 276 lines.)
I suggest that you win if, by the deadline, you manage to get the size of _.map
within 10% of the circular modular version. So that would be anything under 385 lines of code. Otherwise, I win, and we discuss further modularization again as you proposed. We can extend this 10% margin to other functions as well.
@jashkenas My apologies for the previous comment, it was too early to discuss game rules and I was being a bit rude as well.
May I suggest that we proceed in the following order:
No worries! I didn't take any offense — and I think that most of your game rules sound fine. I just didn't want to commit to a particular date without knowing when I would be able to look at this further.
Yesterday, they announced the closure of the borders here in Chile, and we’ve been scrambling around weighing the pros and cons of cancelled flights and shelter in place orders to try to get back home. Hopefully things will quiet down after we manage to make it back.
No problem, best of luck getting home!
@jashkenas I'm mostly asking this out of personal concern: how are the circumstances for you and your family?
We’re good! When they announced the impending border closure, and our Aeromexico flight was cancelled, we turned around, drove 6 hours back north to Santiago, and grabbed a flight (one of the last?) back home to go hunker down.
@jashkenas I'd really like to move on with this PR. Could you give me a status update?
I went back to work on Monday, and haven't had a chance to take a look at this again ... but I'll try to soon. Sorry for the slowness.
Alright! I took a good long crack at playing around with Rollup’s treeshaking this morning, and feel like a learned a lot about what kinds of patterns it can and can’t treeshake, how /*@__PURE__*/
doesn’t work, and the types of things that we do in basic Underscore that end up being included in the bundle if they're present in the imported module, regardless of use.
For example, here’s a pattern that Rollup can tree-shake away (it’s a reduced test case of our isArguments
function):
function _toString() {}
export var isArguments = (function() {
return _toString(arguments) === '[object Arguments]' ? 0 : 1;
}());
And here’s the version we need to use, that Rollup can’t:
function _toString() {}
export var isArguments = (function() {
return _toString.call(arguments) === '[object Arguments]' ? 0 : 1;
}());
The difference in this case is .call()
.
Flowing through functions, this means that anything that depends on Object.prototype.toString.call(...)
— which we use heavily to test types, can't be treeshaken if called at the top level, even if not exported directly.
Many of the other places where we call functions at the top level can't be treeshaken by Rollup, although some simpler patterns can.
So, what does that mean for this PR?
Ultimately, although I'd be happy to publish a version of Underscore that provides both ES Module and UMD flavors, I think that the module should be a monolithic one, given the small size of Underscore, its presence in the CDNjs cache, and the current state of treeshaking tooling. I'm still not interested in either chopping it up into tiny files, or in littering the codebase with /*@__PURE__*/
annotations (even if they worked, which they don’t) in order to satisfy Rollup’s current limitations.
That said, if you want to publish your fully modularized version as a different npm package, godspeed. I'd be happy to link to it from the top of the Underscore homepage. I hope that's not too disappointing.
Alright! I took a good long crack at playing around with Rollup’s treeshaking this morning, and feel like a learned a lot about what kinds of patterns it can and can’t treeshake, how
/*@__PURE__*/
doesn’t work, and the types of things that we do in basic Underscore that end up being included in the bundle if they're present in the imported module, regardless of use.
You surprise me! I was prepared to wait a long time before you'd even announce a deadline for the treeshaking challenge, and now you've already completed the challenge. You did a really good job at it as well.
So, what does that mean for this PR?
Ultimately, although I'd be happy to publish a version of Underscore that provides both ES Module and UMD flavors, I think that the module should be a monolithic one, given the small size of Underscore, its presence in the CDNjs cache, and the current state of treeshaking tooling. I'm still not interested in either chopping it up into tiny files, or in littering the codebase with
/*@__PURE__*/
annotations (even if they worked, which they don’t) in order to satisfy Rollup’s current limitations.That said, if you want to publish your fully modularized version as a different npm package, godspeed. I'd be happy to link to it from the top of the Underscore homepage.
That's really nice of you, but no, I don't want to do that because it would amount to forking Underscore. I believe the world doesn't need yet another fork of Underscore. I only want Underscore to be modular if you are on board.
I hope that's not too disappointing.
If the current, semi-monolithic PR can still be merged, I'm happy enough. How about that?
And I'll try again later to interest you in chopping everything into tiny modules. ;-)
Okay, great! In that case I think that we’re almost there.
At least before I can merge it — and perhaps as a general policy — we need to provide a built version of underscore.js
in the repo. For many years, there has been a link to test out the "Edge version" on the homepage: https://raw.github.com/jashkenas/underscore/master/underscore.js
... and it would be great if that could continue to work, as changes are made to the master branch.
So perhaps using Husky to define a pre-commit hook that makes sure that underscore.js
is always kept up-to-date with modules/index.js
?
And then prepublish can handle the -min.js
and -min.js.map
variants, and the annotated source.
As to the actual roll-out, how would you like to handle it? I think this could merit a 2.0 release, with appropriate revisions to the homepage/documentation. Would you like to be added as a collaborator on this repo to help make that happen? I think you’ve clearly earned your stripes.
And as to the treeshaking — my experiments make me feel like (even with a single source file), it's tantalizingly close. Rollup is able to eliminate all of the obvious non-dependencies, in terms of function declarations at the top level.
It's only with our use of function calls at the top level, in order to define other functions, that it has problems, and even then, only some of the time.
Things like:
export var omit = restArguments(function(obj, _keys) {
var iteratee = _keys[0], context;
if (isFunction(iteratee)) {
iteratee = negate(iteratee);
if (_keys.length > 1) context = _keys[1];
} else {
_keys = map(_flatten(_keys, false, false), String);
iteratee = function(value, key) {
return !contains(_keys, key);
};
}
return pick(obj, iteratee, context);
});
... because of the call to restArguments
, even though that function obviously has no side effects.
I feel like there might be a more general pattern of dynamic function export creation we could find that would allow Rollup to treat each export in isolation ... but I wasn't able to discover it, despite my best efforts.
Okay, great! In that case I think that we’re almost there.
Hooray!
At least before I can merge it — and perhaps as a general policy — we need to provide a built version of
underscore.js
in the repo. For many years, there has been a link to test out the "Edge version" on the homepage:https://raw.github.com/jashkenas/underscore/master/underscore.js
... and it would be great if that could continue to work, as changes are made to the master branch.So perhaps using Husky to define a pre-commit hook that makes sure that
underscore.js
is always kept up-to-date withmodules/index.js
?
Sure, I'll look into it.
And then prepublish can handle the
-min.js
and-min.js.map
variants, and the annotated source.
prepublish
or prepublishOnly
? I don't mind either, but in the first case, I'd prefer to call it prepare
.
As to the actual roll-out, how would you like to handle it? I think this could merit a 2.0 release, with appropriate revisions to the homepage/documentation.
Regardless of the version number, I definitely think some additions to the documentation would be in order.
I really appreciate that you want to celebrate this with a 2.0 release, but I adhere to semver quite strictly and I think this should be 1.10. All of the old interface is still there, all of the backwards compatibility is still there and the source code still looks mostly the same. Users just get the new option to treeshake with rollup. I don't want to "shock" users with a 2.0 if it isn't actually disruptive.
If, in some hypothetical future, we chop everything into tiny modules, I think that might merit a 2.0 release though... ;-)
Would you like to be added as a collaborator on this repo to help make that happen? I think you’ve clearly earned your stripes.
That's an honour, thank you. I definitely want to help. If making me a collaborator helps to make me help, I'm all for it. :-)
As for your second comment about treeshaking: I feel your frustration! However, keep in mind that the size of _.map
is already cut in half. That's already a pretty big win. There are probably functions that are treeshaken even more as well.
prepublish
orprepublishOnly
?
Right, prepublishOnly
is what I meant.
I really appreciate that you want to celebrate this with a 2.0 release, but I adhere to semver quite strictly and I think this should be 1.10.
Okay, 1.10
it is.
That's an honour, thank you. I definitely want to help. If making me a collaborator helps to make me help, I'm all for it. :-)
Done and done.
As for your second comment about treeshaking: I feel your frustration! However, keep in mind that the size of
_.map
is already cut in half. That's already a pretty big win. There are probably functions that are treeshaken even more as well.
From my understanding, not so much! Most of the half of the codebase that is currently included along with treeshaken map
does not have anything to do with map
directly (or indirectly!). It's just all of the top-level function calls that Rollup cannot determine are side-effect-free, and everything that those top-level function calls depend on. All of that same code will be included in every treeshaken individual export, no matter which one it is. And if we can restructure those top-level function calls to treeshake cleanly, conversely, all Underscore functions should export cleanly from the single file.
@jashkenas I think we're ready for the merge. I can now push the merge button by myself, but I'll give you a chance to review the new commits first.
Regarding the Husky hooks, I had to explicitly invoke git add
inside the pre-commit hook to ensure that the UMD bundle would actually be committed. Apparently, doing this inside a pre-commit hook causes the autogenerated files to still appear in the diff and the index after git commit
completes. To fix this, I had to add an explicit git reset
in the post-commit hook as well.
How to continue from here? Merge the current PR first, then start a new branch in your repo to update the documentation?
Given your approval, I went ahead and pushed the merge button. Hope you don't mind.
This is a big milestone. Thanks so much!
Ready for review but not ready for merging; see "before merging" notes below.
Purpose
Reformatting the underscore source code to use ES6 exports, in order to facilitate treeshaking with tools such as rollup. See #2718.
Non-purposes
const
,let
,async
or arrow functions.I agree with and followed https://github.com/jashkenas/underscore/issues/2718#issuecomment-575725274 and https://github.com/jashkenas/underscore/issues/2718#issuecomment-575728627. One change at a time.
This is also one of the reasons this isn't ready for merge yet; the move to ES6 exports obviates one level of indentation, but I kept that indentation for now in order to ease review.
Update: a commit that removes the obsolete indentation was added to this PR on 2020-03-13.
What I did
underscore.js
tounderscore-source.js
, because the former will be generated from the latter as of this PR. The generatedunderscore.js
is functionally equivalent (except for some minor differences detailed below) to the formerunderscore.js
.underscore-source.js
is just a working name; see notes under "before merging".export
statements must be at module level and secondly because rollup will generate an IIFE for the UMD bundle.underscore-source.js
in the following ways:_.name = function(){};
byexport function name(){}
._.name = expression;
byexport var name = expression;
.var internalName = function(){};
byfunction internalName(){}
for consistency with 1. and also for brevity and a little added convenience (i.e., hoisting)._.name
by justname
, except for_.iteratee
and_.templateSettings
because users may override these._.
namespace for internal use introduced some naming conflicts, which I solved as follows:has
and an exportedhas
. I renamed the latter to_has
for internal use, because the non-exportedhas
is used more._has
also resembles the old_.has
.flatten
/_flatten
.isNaN
has a counterpart that is an ES builtin. I renamed the former to_isNaN
for internal use.isFinite
/_isFinite
.keys
is used very often as a local variable inside functions. I renamed the exportedkeys
to_keys
for internal use.now
is also the name of a local variable inthrottle
, which is the return value of the exportednow
. I renamed the local variable to_now
.underscore-module.js
(working name) which takes all the exports fromunderscore-source.js
and mixes them into_
. This is a trick to avoid hand-coding all the exports twice, i.e., to avoid having to write bothexport name
and_.name = name
.module
field in thepackage.json
to theunderscore-module.js
, so that tools like rollup will be able to find theexport
-based version while everyone else can continue using the UMD bundle.underscore-umd.js
(working name) that re-exports only the default export fromunderscore-module.js
. This facilicates the UMD bundling by rollup. Update 2020-03-26: in the meanwhile, the roles between the underscore-module and the underscore-umd has reversed. The latter is the module that mixes all functions into the_
object and the import order is underscore-source -> underscore-umd -> underscore-module.legacy
option, which prevents rollup from generating getters, which would be incompatible with IE8.noConflict
function to rollup, which has an option for this.scripts
) to thepackage.json
to generate the UMD bundles (both theunderscore.js
and two bundles for the treeshaking tests) and prepended these commands to other commands that depend on their existence.eslint
, and added a plugin, to enable the linter to processimport
/export
syntax.I did not convert the tests to ES module syntax; in fact, they still import from the (now generated)
underscore.js
.Update 2020-03-26: In the meanwhile, in addition to the above, I have done a few more things which are detailed in the comments below.
Interface changes
window
embed). However, it is a bit less refined than the former hand-written version. I don't expect it to cause issues, but I should still mention that it isn't 100% equivalent.noConflict
method generated by rollup is 100% equivalent to the fomer hand-written version. However, it is only added in thewindow
embed scenario. ESM, AMD and CommonJS module users will not see the method anymore, although they shouldn't need it, either.I believe that the published UMD interface is otherwise completely identical to what is currently on the master branch.
This PR shows a comparison between the former hand-written
underscore.js
and the new reformattedunderscore-source.js
. In order to appreciate the interface changes, one might want to see a comparison between the former hand-writtenunderscore.js
and the new rollup-generatedunderscore-js
instead. I pushed a secondary branch to make this possible. The diff can be found over here: https://github.com/jashkenas/underscore/compare/master...jgonggrijp:es-module-compiled?expand=1#diff-0f36b362a0b81d6f4d4bfd8a7413c75d (you may have to click "load diff").Quantitative changes
The amount of code to maintain decreases very slightly.
underscore.js
_source
+_module
+_umd
The weight of the published gzipped + minified UMD bundle increases by 240 bytes.
In exchange, users get the option to treeshake when using tools like rollup. If a user imports only
map
(fromunderscore-source.js
), underscore's contribution to her UMD bundle decreases by 900 lines of code compared to the present situation. This is substantial, if not as spectacular as one might expect. This is due to rollup not being clever enought about side effects and erring on the side of caution.As an experiment, I also created a completely modularized version where each function lives in a separate module (not published). This enabled rollup to shave off another 460 lines of code. Rollup also surprised me by pointing out circular dependencies, which it apparently cannot do if all code is in one module. I suspect that solving these circularities might cut
map
's size in half again, but I have not tried this.map
in lines of codeWhile the minified + gzipped UMD bundle weight has increased a bit, these numbers make me believe that using ES exports is definitely a move forward. In addition, solving circular dependencies might also decrease the weight again.
Update 2020-03-26: updated treeshake performance numbers can be found at the bottom of https://github.com/jashkenas/underscore/pull/2826#issuecomment-599089184.
Before merging
underscore.js
should probably be commited. What do you think would be the right policy for this, @jashkenas? Update 2020-03-26: in the meanwhile I have proposed to never commit it and instead ensure that it is uploaded to NPM by enforcing the build in theprepare
orprepublishOnly
hook of thepackage.json
.underscore-{source,module,umd}.js
. Perhapsmodules/{index-unmixed,index,index-umd-wrapper}.js
?Edit 2020-03-7: Changed suggested names from
modules/{underscore,index,umd}.js
tomodules/{index-unmixed,index,index-umd-wrapper}.js
because the former might be problematic if we modularize the whole library in a subsequent PR.Update 2020-03-27: slightly changed the above suggestion to
modules/{index,index-all,index-default}.js
, respectively. This suggestion was also implemented in a new commit that has been added to this PR.Edit to add on 2020-06-18: You can support my work on Underscore and other open source projects on Patreon.