package-community / discussions

GitHub-based discussions about package management, questions, answers, announcements

javascript: npm for web packages #2

Open justinfagnani opened 7 years ago

justinfagnani commented 7 years ago

@notwaldorf and @zkat might have some more context here, but I heard that npm was thinking about ways to make things better for web development. I'd love to hear what ideas are being discussed and contribute some of Polymer's use-cases and requirements if that's helpful.

Polymer's in the middle of a transition to npm from Bower. Mostly things are great, except for a few complications that arise out of our requirement to publish modules that work natively on the web.

Currently we're recommending the use of Yarn because it supports flat installations. We need flat because web-standard JavaScript modules can only import other modules by path. This means that to import a dependency, even cross-package, we need to know exactly where that module lives relative to the importer in filesystem/URL space. Thus yarn and yarn install --flat.

This mostly works, except that many tools can't be properly installed flat. To work around that we've been structuring projects with a tools/ subfolder with its own package.json. Then we need to wire up scripts so that tools are installed when working on the top-level project, npm run works well, etc.
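
Roughly, the top-level package.json wiring ends up looking like this (a sketch only; the script names and paths here are illustrative, not our exact setup):

{
  "scripts": {
    "postinstall": "cd tools && npm install",
    "build": "tools/node_modules/.bin/polymer build",
    "serve": "tools/node_modules/.bin/polymer serve"
  }
}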

There's also a problem with purely client-side packages, when some have version conflicts among dependencies and can't be installed flat. While our packages really do need to be flat, not every package does, but flat in Yarn applies to an entire node_modules folder.

A few things could possibly help here:

Those are just a few ideas we've had; I'm sure there are a lot of other ways to slice this problem. We're very motivated to help, not only to make our users' experience better, but to help create a relatively standard way to install web-compatible packages so that Polymer isn't going off on some unique workflow.

zkat commented 7 years ago

@justinfagnani I think, especially since we're talking about it over here, that it would be very interesting to imagine what a bespoke package manager just for Polymer would be like.

We've talked about having what we call an assets-specific package manager recently, though, and it looks something like this:

  1. It uses the same specifiers as npm does (aka, foo@1.2.3 will install from the registry, usr/proj#semver:^1.2 will install from git, etc)
  2. It installs all dependencies flat, to a predictable directory. That means that dependency hell is possible through conflicts. We have two mitigations planned for this:
     a. Use a port of Molinillo for resolving trees.
     b. When this fails, do what bower does and manually pick which semver-incompatible version you're going to use, and this will be recorded.
  3. These dependencies, called "assets" for our purposes, will all be installed into a single, non-configurable directory: assets/.
  4. Assets can have dependencies, specified in package.json in the usual dependencies field that npm currently uses. This allows existing dependencies on the registry to be used.
  5. Assets are declared through a new assets field in package.json. This field will have the same syntax as dependencies.
  6. Assets specified in assets will ONLY be installed for the toplevel project. That is, if you add a dependency, and it specifies assets, they will be completely ignored, much like devDependencies, when installed as a dependency.
  7. For compatibility with existing packages, it's expected that any existing published libraries that declare dependencies using node-compatible require or import statements will go through a transformer either during bundling or at the webserver layer that will convert them to work on the web. That is: require('lodash') should be transformed to require('../lodash') prior to serving. Unless an actual module loader spec apparates itself and lets users do this client-side in a portable way. (A rough sketch of such a transform follows this list.)
  8. Lifecycle scripts will work during installation of assets exactly like they do with npm right now when installing regular dependencies.
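
To make point 7 concrete, here's a minimal sketch of what that kind of specifier rewrite could look like at the webserver layer. Everything here is illustrative -- the function name and the simple sibling-path scheme are assumptions, not a real npm or Polymer API:

// Hypothetical sketch: rewrite bare specifiers (import form) to
// sibling-relative paths before a file is served to the browser.
function rewriteBareSpecifiers(source) {
  // A "bare" specifier doesn't start with '.', '/', or a quote.
  return source.replace(
    /(\bfrom\s*|\bimport\s*)(['"])([^'"./][^'"]*)\2/g,
    (match, keyword, quote, specifier) =>
      `${keyword}${quote}../${specifier}${quote}`
  );
}

// rewriteBareSpecifiers("import _ from 'lodash';")
// => "import _ from '../lodash';"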

So, looking at a Polymer example, based on its install guide -- and assuming this package manager is integrated directly into polymer-cli:

  1. $ npm i -D polymer-cli - install polymer as a devDep on your current project. This will make sure all your teammates are using the same version of the CLI. With npx and its auto-fallback enabled, this CLI can be invoked by just cd-ing to your project directory and typing $ polymer ... directly, much like one usually would if it were global. Without the fallback, $ npx polymer ... will do the trick, though.
  2. $ mkdir proj && cd proj && polymer init - do the usual initialization.
  3. $ polymer add @polymer/polymer - This adds "assets": { "polymer": "^2.0.0" } to package.json, as well as an entry to package-lock.json in an assets field separate from dependencies that describes the dep.
  4. $ polymer add @polymer-elements/app-layout - go ahead and add a nice new element to play with
  5. $ vim index.html - edit your html file and add the following to it:
    <head>
      <script src="/assets/webcomponentsjs/webcomponents-loader.js"></script>
      <link rel="import" href="/assets/polymer/polymer.html">
      <link rel="import" href="/assets/app-layout/app-layout.html">
    </head>
    <body>
      <app-header reveals>
        <app-toolbar>
          <!-- stuff -->
        </app-toolbar>
      </app-header>
      <app-drawer id="drawer" swipe-open></app-drawer>
    </body>
  6. $ polymer build - do the build! Assume it's going to do its work on assets/ instead of bower_components/
  7. $ polymer serve - make it so

I would like to think this feels right for your needs, and is not very different from the bower workflow -- except you can host everything on a registry, and also you can have build steps before publish using the usual prepare script approach: it means users would be able to use tools like rollup before actually publishing their components. This might save time and hassle. You might also notice that there is no extra tooling installation: all of your work can be done directly through a single devDependency. You don't even have to tell people to use a weird --flat option (which will mess with their regular dependencies).
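
For instance, a component's package.json could look something like this (purely a sketch; names and versions are made up):

{
  "name": "@my-scope/my-element",
  "version": "1.0.0",
  "scripts": {
    "prepare": "rollup -c"
  },
  "devDependencies": {
    "rollup": "^0.50.0"
  }
}

npm runs prepare before the package is packed and published, so the built output is what lands on the registry.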

N.B.: if you're not comfortable adding a couple of subcommands to polymer-cli, that's fine: it can be a separate tool too! npa is what we've been talking about calling the one we'll be developing for ourselves. The commands you'd need in polymer-cli would pretty much just be add and rm.

As far as your suggestions go, let me address them real quick:

dpogue commented 7 years ago

I was thinking about this sort of thing a bit earlier today, particularly in the realm of browsers and ES6 modules and the need for those to be flat dependencies without node's auto-magic handling of the node_modules folder.

A similar, but not identical, use case is Cordova plugins distributing native source code for Swift/C#/ObjectiveC/Java in their npm packages and needing to ensure their plugins have a flat install. Getting two copies of the same code there results in duplicate symbol linking errors coming out of Xcode, and no JS developer (or any developer for that matter) wants to see those. Currently cordova-cli handles this case itself (like you're suggesting polymer-cli could do).

What you're proposing looks good :+1:

zkat commented 7 years ago

I want to make another note about effort levels required for this: @mikesherov, with some help from npm CLI folks, is currently working on throwing together cipm. This is relevant and significant because a big chunk of that has involved extracting a bunch of code from npm into standalone libraries, and gluing them together. More, higher-level libraries might come out of this as we approach 1.0, and those will all reduce the effort needed when writing a whole new package manager from scratch. The goal here is that someone could toss together their own package manager using production-ready/production-proven tools like pacote in a single weekend.

The main missing piece once cipm is done is the Molinillo port because cipm doesn't have any actual tree-building capabilities, so we won't have that extracted. The rest is just finding pieces in cipm that work well together on their own and turning those into single-call libraries.

Once all that's in place, though... people can pick and choose what to get creative with ^_^ -- that's why I think having this sort of thing embedded directly into polymer-cli is reasonable. It'll still use npm's cache, be really fast, have offline support, etc etc.

justinfagnani commented 7 years ago

We wouldn't want to have our own package manager because that would be a barrier between Polymer and all the other front-end code out there. Polymer is just a library and has no special requirements beyond any other front-end package that uses web-compliant modules: imports are done by path, which leads to needing a flat package layout.

If we did have our own package manager, what would a project with Angular + Polymer do? Use npm or polymer-pm? Same with React, Vue, vanilla JS, etc. What would happen if all the frameworks each had their own package manager?

Yarn at that point is a much better option because it's not specific to any particular library, and flat installs may build up into standard practice for web development over time.

@zkat I'm curious about this point:

--flat - npm will never support --flat. It essentially breaks the ecosystem contract and introduces dependency hell into a package manager designed to avoid it. Not to mention, as I said above, that this sort of thing would force even non-frontend dependencies to flatten when they don't need or want to. This just violates way too much of the package contract.

If flat were opt-in and done on a per-target or per-dependency basis, how would that break existing contracts? Right now flattening and deduplication must be done as a post-installation step, so the community is ad hoc layering this contract on top of what npm provides.

web_dependencies - Yup! That's what assets/ is -- the name is something that we've been talking about using for several years now, and I think it covers a wide enough range of what this could be used for. Essentially, this is the approach we're opting for.

I was hoping to learn more about the idea for assets/, since I've only heard a little bit through @notwaldorf so far. It sounded like assets/ would be flat, so does that clash with the point about not supporting flat installations above, or is there some other interpretation here?

Daniel15 commented 7 years ago

We wouldn't want to have our own package manager because that would be a barrier between Polymer and all the other front-end code out there.

I was going to say "I think @zkat's question was purely hypothetical; I don't think she was actually proposing a separate package manager" but then I read her entire comment and now I'm not so sure 😄

It installs all dependencies flat, to a predictable directory. That means that dependency hell is possible through conflicts.

Personally I'd love to see something where the installation fails in case of conflicts that can't be resolved. That's what pretty much every other major ecosystem from the past 20 years has chosen to do, even in cases where "the npm way" would have worked. I know that forcing a flat package list is not "the npm way", but I also don't want 10 different versions of left-pad in my app. I already use DedupePlugin with Webpack 1 or root-most-resolve-plugin with Webpack 2+ to dedupe multiple different versions of the same package.
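
For reference, the webpack 1 version of that is just (a sketch):

// webpack.config.js (webpack 1) -- merge identical modules at bundle time.
// webpack 2+ dropped DedupePlugin, hence the separate resolver plugin.
const webpack = require('webpack');

module.exports = {
  entry: './src/index.js',
  output: { filename: 'bundle.js' },
  plugins: [new webpack.optimize.DedupePlugin()]
};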

Currently, package maintainers can just ignore unresolvable conflicts since npm happily allows them. If the problem were more obvious, I think more developers would be open to upgrading their dependencies more often (or having their projects forked to upgrade them, for unresponsive developers).

Snugug commented 7 years ago

@zkat I'm curious about

Assets specified in assets will ONLY be installed for the toplevel project. That is, if you add a dependency, and it specifies assets, they will be completely ignored, much like devDependencies, when installed as a dependency.

Is this to avoid dep hell? I would think if I were working on a plugin or component for a thing I'd want to declare what versions it's compatible with, and flat resolution as you describe with attempting to install a dependency's assets seems like it would do that. Otherwise, we may need a peerAssets or something like that to declare and enforce that?

zkat commented 7 years ago

Word of Warning: this is textbook tl;dr

So this got really long and I'm sorry, and I hope you'll forgive me for dropping a text bomb on you. I think it's safe for at least some people here to skim most of this, and the rest is direct responses to people's questions, so just search for your name and you'll see my response to that, specifically. This topic involves a lot of detail and tricky corner cases, and since we're talking about the constraints we're working with here, I think it's best to at least try to have this level of detail in the conversation.

The Rationale for Flattening

Before I get into it, I want to talk a bit about why we're having this conversation, and make sure that I understand the concerns involved. This is both for my benefit (in case I missed something), and for the sake of folks reading this thread. A lot of this is probably obvious to people, but I think by putting this into these terms, we'll be able to look at individual decisions and say "ok this solves X and Y but not Z" and such. Sorry in advance for it being so long, but I think we'll benefit from the detail:

  1. Bundle size - as webapps grow bigger and bigger, it's getting harder to make sure your frontend assets are a reasonable size. Mobile apps often take ~10s or more until time to interaction, and part of this is JS parsing+compiling, which essentially locks up the poor things. So -- if you want to deliver snappy webapps and you want to have mercy on people who are stuck on literal 2G, you want to have as many cool features as you want with as little data transferred as possible. Deduplication means that you don't have 10 different versions of lodash (sorry, I'm kinda tired of left-pad examples, and lodash is a realistic thing people run into on the reg). Note that currently, npm supports this mostly ok within the bounds of its semver constraints: as long as your dependencies are all semver-compatible with a single version of lodash, there's a pretty good chance you'll only have one copy of lodash in your entire tree. Or at least not 10 of them. YMMV depending on the project, and you can definitely get up to 10, and npm could definitely use a heavier algorithm to try and work out a flatter tree -- but it can't, and won't, install semver-incompatible packages for you in order to reduce your bundle size. Note also that bundle size is not the same as on-disk size. A tool like pnpm does an amazing job at reducing on-disk size, but its node_modules/ structure is such that a basic bundle will be as big as a fully-nested npm@2 tree.

  2. Path predictability - Browsers don't allow you to resolve import foo from 'bar' according to the Node module resolution algorithm. They need a concrete, specific path, with a possible exception of foo/ and foo/index.html. Basically, if you install an asset, you need to be able to predict the full path to any files in that asset package, or you're forced to use a bundler that supports this sort of resolution for assets (and configure this). This is not just for JS sources, btw: If you have a package that is literally just css or images or whatnot, your app needs to be able to refer to it. Flattening helps with this because it frees you from the overhead of trying to figure out wtf your package manager actually did when it installed a dependency's dependencies. It means that your package knows that its dependencies will always be accessible through import foo from '../my-dep', or <img src=/assets/my-dep/cat-snuggles.gif>.

  3. The DOM and Singletons - For a combination of historical reasons, and for the fact that the DOM is a single toplevel singleton that JavaScript is meant to side-effect, it's often necessary to make sure there is exactly one copy of a package across your entire app. I say historical because part of the reason this worked out the way it did was because of the frontend's lack of a module system of any sort, followed by a mixture of ad-hoc, incompatible module systems that started competing with each other later in the game. tl;dr you want one React object globally because it installs things into the DOM. Or at least you want a single $ globally, if you expect to interact with it at the toplevel. noConflict() is a thing that started getting used but it's honestly a pain and best to avoid. Deduplication helps with this because it is a guarantee that there will only be one copy of your package, at a global level. This is only rarely required in Node, because it's always had the CommonJS module system there, which has encouraged folks to stay away from globals. Exceptions to this are things like grunt and its plugins that really really want to be singletons. But there's existing ways to resolve this which don't involve deduplication already, so I consider this largely a frontend concern.
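
(For anyone who hasn't run into it, the noConflict() dance looks like this:)

// Give up the global `$` so another library can claim it, and keep a
// private handle to this copy of jQuery instead.
var jq = jQuery.noConflict();
jq('#app').text('only this copy touches the page');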

In summary:

  1. bundle size is not a reason for strict deduplication: "maximally deduplicated" is good enough here most of the time, because this isn't a binary concern. The idea is reduced bundle size.

  2. path predictability alone is not a reason for strict deduplication, but it is a reason to have either full nesting or full flattening. The current approach npm and Yarn take where "we try to flatten to a reasonable extent" is not workable for humans when it comes to figuring out where the hell that one dependency actually is. While this is not much of a problem for modern JS tooling that does module resolution at bundling time, it does present a significant problem for a framework like Polymer which relies entirely on client-side pathing to specific html and JS files.

  3. singletons definitely benefit from flattening, but alternative solutions, such as dependency injection, are often practical and workable enough. In the case of Polymer, though, this is a massive concern because custom elements are fully global, and last I played with Polymer, it had no allowances whatsoever for resolving conflicts like jQuery's noConflict() trickery.

Again for the folks in the back: Polymer absolutely requires 2 and 3, but 1 is a "nice-to-have" that happens to fall out of this. Therefore, I'd like to focus on ways to solve those two concerns, rather than treating tree flattening as the only solution to bundle size reduction (or, tbh, a significant one).

Project-specific Package Managers

We wouldn't want to have our own package manager because that would be a barrier between Polymer and all the other front-end code out there. ... What would happen if all the frameworks each had their own package manager? - @justinfagnani

The suggestion here is, of course, to have a standardized way that all of these can work -- the assets/ thing is something we're planning for our own package manager, and the suggestion to have something like this in Polymer was both a hypothetical example, as well as something I consider a legitimate possibility that would allow Polymer to simplify its own toolchain by putting everything needed in a single utility package. Of course, if anyone wanted to use npa for this, that would be fine and it would work just as well -- I was also trying to make the point that the building blocks for this are becoming available in such a way that having this tooling built-in does not involve a ton of work, especially from a compatibility PoV. I believe this also answers @Daniel15's question about this.

tl;dr it was an example and my answer was not premised on Polymer doing that.

Opt-in Flat Dependencies

If flat were opt-in and done on a per-target or per-dependency basis, how would that break existing contracts? - @justinfagnani

Keep in mind that I'm talking specifically about a command line option. I think your suggestion of a flat: true flag in package.json would not necessarily violate this contract, since it would be consistent regardless of CLI flags. The issue with --flat, basically, is that two completely different physical trees are possible, depending on whether a user passes in a flag. That means packages which have been developed expecting nest-on-conflict semantics of npm would break unexpectedly when the flag is used, and packages which have been developed assuming a flat structure will break if the flag is omitted. If you have both in the tree -- which is perfectly possible when you have 2500+ dependencies -- you're... basically SOL.
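
In other words, something like this hypothetical field, which stays consistent no matter what flags anyone passes:

{
  "name": "my-frontend-app",
  "flat": true
}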

The Idea behind assets/

I was hoping to learn more about the idea for assets/ - @justinfagnani

Ok! I was hoping my description above clarified the intention a bit, but let me summarize it and hopefully I'll fill in any missing bits from what I said before:

Deduplication Concerns, and Crashing on Conflict

Personally I'd love to see something where the installation fails in case of conflicts that can't be resolved. That's what pretty much every other major ecosystem from the past 20 years has chosen to do, even in cases where "the npm way" would have worked. I know that forcing a flat package list is not "the npm way", but I also don't want 10 different versions of left-pad in my app. - @Daniel15

Let's talk a bit about what "the npm way" means here, from my perspective (as one of the devs for it):

npm is one of the few package managers out there designed to prevent users from falling into something called "dependency hell". Dependency hell is when you have two dependencies, and those dependencies both require the same dependency, but at two incompatible versions. For example, let's say you have A, which requires jquery@1.8, and B, which requires jquery@2.0. The way npm resolves this is by making it so both 1.8 and 2.0 are installed, and it uses the node module resolution algorithm such that if A does require('jquery'), it will get 1.8, and if B does require('jquery'), it gets 2.0. This is a nice solution because it happens transparently, without the user having to do anything to have their tree work.
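
On disk, that example comes out looking something like this (a sketch; which of the two versions ends up hoisted depends on resolution order):

node_modules/
├── jquery/            <- 2.0, hoisted to the top level
├── A/
│   └── node_modules/
│       └── jquery/    <- 1.8, nested because it conflicts
└── B/                 <- resolves the top-level jquery@2.0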

npm is also one of a growing number of package managers that has placed a bet on semver as a standard to follow when it comes to version numbers. That means that, as far as npm is concerned, B should not ever use 1.8, because there would be no reasonable expectation that the way B uses jQuery would be remotely compatible with 1.8's API, since it declared 2.0 as the thing it expected.

Now, you mentioned "the past 20 years". I'd argue that you should keep in mind that package managers have historically not worked at anywhere near the scale npm dependencies tend to. A modern webapp will easily pull in between 1.5k and 2.5k packages, or more. The risk of conflicts at this scale is way higher than in ecosystems that evolved being forced to do this kind of conflict resolution on the regular. Those will usually lean towards much smaller trees with bigger individual dependencies, 'cause they're easier to manage conflicts for. I'd argue that it's node's module resolution algorithm that really enabled the JS ecosystem to grow as big as it did. You just don't have to worry about this most of the time!

How does this all apply to your question, though?

Flattened dependencies, in this case, basically say "There can only be one jquery, ever." That means that we are now faced with a difficult choice: either A or B will have to use a version of jQuery it was not designed to support. And I don't just mean this as a casual theoretical thing. Semver-major changes are such that there is, in fact, a pretty good chance that if B loads jquery@1.8, it will fail to work at all. That's the semver contract. As far as npm is concerned, the moment the jQuery team tagged 2.0, it declared that A can not use its latest release.

So, what can we do about this when we can't just do the nesting (because of the reasons outlined in the beginning of this post)? I mean, we can just fail entirely, but that would basically tell the user "idk", and give them nothing to do but to start fiddling around with versions for their direct dependencies until something works, or force them to get rid of either A or B (if these are direct dependencies -- if not, GOTO 0 ("idk")). I think we can do a lot better than this...

So far, I believe a combination of two things is the best way to go for this:

  1. Have the package manager do its very best to flatten that tree - including upgrading or downgrading dependencies within a semver-compatible range, until it finds a combination of dependencies that works. This is what the aforementioned Molinillo does, which is the resolution algorithm used by both CocoaPods and Bundler. The Ruby world has been dealing with the flattening problem for a very long time, and very smart people have done very smart things to solve this. To be clear: an optimal algorithm for a maximally-flat dependency tree is an NP-complete problem, so the best we can do here is heuristic the shit out of it, and maybe light a few candles in our shrine of choice and do a little prayer. I mean, aside from porting Molinillo and sharing that effort with the wonderful CocoaPods and Bundler devs ;). As a note, this approach differs significantly from the way npm does flattening, which is a much simpler algorithm (and, presumably, faster): Given a semver range for a dependency, npm picks the latest version of that dependency. If it runs into that dependency name again, it checks to see if the latest version it picked before is compatible with the range for this one, and if so, it flattens it. If not, it nests it. This is easy to code, but also less likely to give you a fully-flat tree. (A sketch of this simpler algorithm follows this list.)

  2. When push comes to shove, and Molinillo fails to find a flat tree with the given dependencies -- a case you'll run into if you have the basic A and B example above, where neither dep has an overlapping version compatible with the same jQuery version -- then put it on the user to decide whether they're Feeling Lucky™ and see if they want to try to pick one of the given versions (or a specific one), and hope that both A and B work "well enough" in their specific case. This is risky and puts a lot of responsibility on the user, and I don't particularly like it, but... the alternative is to give the user the finger, which I really would rather not do if I can help it.
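
To make the contrast with Molinillo concrete, here's roughly what the simpler npm-style placement from point 1 looks like. This is paraphrased pseudocode, not npm's actual source, and latestSatisfying is a made-up helper:

const semver = require('semver');

// Hoist-or-nest placement, paraphrased: hoist the first version seen,
// reuse it when compatible, nest a private copy when it isn't.
function place(root, parent, name, range) {
  const hoisted = root.children.get(name);
  if (!hoisted) {
    root.children.set(name, latestSatisfying(name, range));
  } else if (semver.satisfies(hoisted.version, range)) {
    // Compatible: the hoisted copy is shared (this is the flattening).
  } else {
    parent.children.set(name, latestSatisfying(name, range));
  }
}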

Now, there's another bit here that I think we might be able to add... @Daniel15 mentioned leaving it up to users to upgrade their dependencies on the regular. I think this is a bit optimistic when talking about an ecosystem with 550k packages and growing. Legacy stuff is just plain gonna be there, and sometimes you're gonna want it. Also, bleeding edge stuff is gonna happen, and an ecosystem that big will not move that quickly. But here's the thing: Molinillo already simulates this happening at a certain point in time! By trying to find a maximally-compatible flat tree, you're essentially time traveling into the past until you happen upon a package combination where everything would work.

I think there's one more point of user interaction that we might be able to add to prevent us from having to do #2 above, which I consider kind of a nightmare scenario: I wonder if Molinillo can be adapted to suggest to the user a different version range for A or B that would work, before it asks the user to pick the incompatible jquery version to use. Again, though, this is an NP-complete problem so I have no idea if this is actually practical, and I haven't ported Molinillo yet. Perhaps a ping to @segiddins in case he has any ideas on this front.

To clarify what I mean: Let's say your project depends on A@^1 and B@^2. You got those versions just because you did npa i A B, not necessarily because you want B@2 specifically -- at least that's usually the case for people!

What we can do in this case is tell the user "Your tree will work if you downgrade B to B@^1, because it depends on jquery@1.8. Would you like to downgrade this dependency?"

And then you'd have a valid tree.

And everything gets recorded into your lockfile, and tada, you're all done.

Why we shouldn't install transitive assets: {}

Is this to avoid dep hell? I would think if I were working on a plugin or component for a thing I'd want to declare what versions it's compatible with, and flat resolution as you describe with attempting to install a dependency's assets seems like it would do that. Otherwise, we may need a peerAssets or something like that to declare and enforce that? - @Snugug

Not quite: making it so recursive assets: {} entries don't install is to make sure we have one way of declaring dependencies, and dependencies: {} is what the ecosystem is already using. That is: all currently released versions of jquery, lodash, async, etc, are using dependencies for their own dependencies. So we should stick to that. The reason for having assets: {} is for allowing webapps, at the top level, to put their frontend dependencies in a separate space for serving. Libraries should not use assets: {} for things they need. They should just assume their dependencies will be siblings.
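
So a top-level webapp's package.json would look something like this (a sketch; names and versions are illustrative):

{
  "name": "my-webapp",
  "dependencies": {
    "express": "^4.15.0"
  },
  "assets": {
    "polymer": "^2.0.0",
    "lodash": "^4.17.0"
  }
}

The dependencies land in node_modules/ with the usual npm semantics; the assets land flat in assets/.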

@iarna and I talked a bit about making assets: {} install nested, which is something I wanted to do, but then Rebecca pointed out that this would defeat the whole purpose of having this in the first place. If you have a library, and that library requires lodash as an asset... now you've got a version of lodash that can never be flattened/deduplicated, because the contract for assets: {} is that they will always be installed to a directory called assets/ in the current project.

So we figured it's best to just ignore them when nested, for libraries.

Conclusion

To summarize some of the points made, related to questions:

  1. We're doing this because of path predictability and singleton issues, primarily, and bundle size concerns are a nice-to-have that falls out of this naturally (when talking about Polymer's needs, at least).
  2. It's fine for individual tools to integrate their own package managers, because we'll all be working off the same spec, and often the exact same underlying tools and libraries.
  3. --flat is a bad idea because it generates trees that are potentially incompatible with the existing contracts for 550k+ packages (due to conflict resolution).
  4. Dependency hell is hell and "Just Crash" is a bad user experience, so we should use a combination of a smarter resolution algorithm, and user decisions about their own fates, since they know their app. These decisions are best made when presented with useful information by the tooling.
  5. npm will never guarantee flat dependencies. Ever.
  6. The assets/ directory is special and only exists at the top level.
  7. flat: true in package.json is probably fine enough, but I think we have something better here.

mikesherov commented 7 years ago

Since I was pinged, I'll offer my opinion!

Abstract

Simply stated, bundle size and where things live on disk are really bundler/resolver concerns, not package manager concerns.

How did we get here?

The only reason we even think it's a pm concern is because both npm and yarn are built by default to know about Node's resolver. And they are only built this way because almost all packages on npm are node programs.

As a consequence, we have Browserify, Webpack, etc. And they have adopted the node resolver algo as the default way of bundling/resolving.

As stated before, this breaks fragile singletons (like jQuery etc). However, this is the ultimate minority case. Yes, jQuery use is widespread, but not specified as a dep in many packages. And in most cases, peerDeps solve this problem. But, we still need to solve this.

Solution: Resolution / Bundling is a bundler concern.

I personally have interconnected personal/company projects and have seen this problem. I have two DOM libs, both of which express deps on seemingly incompatible versions of jQuery that I then have as deps of my main application. So, what do I do?

Well, seeing as I know I need jQuery 1.x in my application, I express that exact version as a dependency of my application, which guarantees it'll be installed at node_modules/jquery. Then, considering I need that file to get into my bundle, I add the following to my Webpack config: alias: { jquery: 'node_modules/jquery' }, which guarantees only that version of jQuery is bundled. Now, because I know what I'm doing, and I know that jQuery will work despite incompatible deps being specified, I have resolved the issue in the "correct" spots: in my pm I have specified my dep, and in my bundler, I have specified how to use that dep.
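
Spelled out, the config is just (a sketch; the path is illustrative):

// webpack.config.js -- force every `require('jquery')` in the tree to
// resolve to the single copy installed at the application's top level.
const path = require('path');

module.exports = {
  resolve: {
    alias: {
      jquery: path.resolve(__dirname, 'node_modules/jquery')
    }
  }
};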

Now, this may seem less than ideal, having to configure both Webpack and package.json to achieve this, but the alternatives have footguns:

  1. Alternative 1: have a flat: true: This may seem really appealing, however, it violates the underlying principle that the JS ecosystem has grown accustomed to: you can have separate versions of a dependency. The only thing that comes close here is what bower did with resolutions. That is, it allowed the user to specify how to resolve a conflict for a specific dep by saying "no no no, this is the version of jQuery we need". But, IMO this didn't go far enough. We'd need to specify which versions of jQuery from which dependencies map to the version we specify as an override. This is because one of your deps could major bump jQuery and you'd not know there is a specific issue you'd need to address by overriding. It gets complicated. And guess what, Webpack and other bundlers already give you the tools you need here!

  2. Alternative 2: provide an only: true flag. That is, if jQuery needs to be by itself in a bundle, it should have a way to say that! The thought would be that jQuery would specify that it's a fragile singleton and then npm / Yarn / etc would know to warn when more than one copy would get installed. Problem here is this still doesn't solve what version should be used. Latest? Earliest? There needs to be a consumer level way to solve this. That is, this is an application concern, not a lib concern!

Conclusion

Let bundlers bundle, let pms install packages. Ok, now I'm going to stop talking before Sean Larkin's Webpack radar catches this convo.

sdboyer commented 7 years ago

Alternative 2: provide an only: true flag.

i wanna absorb and share more thoughts on this later, but i'll drop a quick note now to say that i think this is a promising approach

justinfagnani commented 7 years ago

I don't have time yet for a full response to the previous books :) but I did want to drop a quick comment on bundling being the solution to flat/resolving:

We strongly believe that bundling should be an optional optimization used for production, but not necessary during development. Polymer-only projects have always worked directly from an installation with no build step.

Which reminds me of another feature that I'd like to have out of a package manager (which I know already isn't going to go over well, but hey): name resolution on installation

The names of modules are specific to the package manager they're published to. To me it makes sense that a web package manager do name resolution on installation so that the installed package folder is loadable by browsers.

Also, it's a bit question-begging to claim that we can't have flat installs because we don't have flat installs. There are other JS package managers that install flat, like Bower, and while npm "won", I don't think we can say it's because of nested packages vs other network effects. Nested packages also have much less negative implications on node than on the web, and npm is still a node-focused package manager, so multiple versions being common on the web may just be a historical artifact of how web projects migrated to where there were a lot of packages.

mikesherov commented 7 years ago

We strongly believe that bundling should be an optional optimization used for production, but not necessary during development. Polymer-only projects have always worked directly from an installation with no build step.

Then Polymer should have a package manager specific to its resolution algorithm. You're saying there's no build step, but package install is a build step. If you want your package manager to also do name resolution, what it sounds like to me is that you mean you want a postinstall script to run that handles the specific "flat" resolution algorithm.

so multiple version being common on the web may just be an historical artifact of how web project migrated to where there were a lot of packages.

I don't personally view multiple versions to be an artifact. IMO, multiple versions is a strength of the javascript ecosystem that specifically avoids dependency hell, and I consider bundle size to be production optimization and "I need to be the only copy of jquery" to be the true artifact.

cdata commented 7 years ago

@mikesherov

Then Polymer should have a package manager specific to it's resolution algorithm.

It should be clarified that this is not Polymer's resolution algorithm. This is driven by how web browsers resolve paths almost universally.

You're saying there's no build step, but package install is a build step.

It's fair to say that installing a project and its dependencies counts as a build step. For the purposes of discussion, it's probably best to distinguish the broader concept of "build" step from source transformations. In typical cases, Polymer projects can be installed and run in dev without source transformations because they rely on the package resolving affordances that are readily available in a browser.

multiple versions is a strength of the javascript ecosystem that specifically avoids dependency hell, and I consider bundle size to be production optimization

Here we should distinguish two hells: dependency hell and version resolution hell. NPM has successfully avoided version resolution hell, but with a notable set of tradeoffs that magnify dependency hell. In many environments it is considered virtuous to require one canonical version of each package.

mikesherov commented 7 years ago

It should be clarified that this is not Polymer's resolution algorithm. This is driven by how web browsers resolve paths almost universally.

I thought this was about named module resolution (as opposed to path resolution), right? Excuse me if I am mistaken, but require('abc') and require('./abc') are different things, as are import('abc') and import('./abc'). The notion that import('abc') === import('abc') seems not universal. Am I mistaken?

justinfagnani commented 7 years ago

Yes, I was talking about npm doing node module resolution on installation, so that installed packages are ready to be consumed by the browser, which obviously doesn't do node module resolution.

I think this is within the realm of responsibility of a package manager, since the names themselves are specific to the package manager.

The transform would be from import 'abc'; to import '../abc/index.js', or wherever 'abc' resolves to.

rksm commented 7 years ago

If it's of any interest, I built flatn a while ago, especially for the purpose of using npm packages in the browser, for bundling in a sane way with electron, and in combination with SystemJS and lively.modules.

flatn can use one or multiple "package collection directories", those hold the dependency packages in a package-name/version structure. It then knows about "package directories" and "package development directories" that directly point to packages.

It indexes all of that and when resolving a name from one package to another, it figures out the right version and path and pulls the dependency in. This has the advantage that even though the dir structure is flat, multiple versions of the same package for different deps still work. The index can be updated at runtime if new packages should dynamically be added.

flatn itself is currently nodejs only; its output is understood, however, by lively.modules for in-browser usage. When using it to export a package index, general in-browser usage should be possible.

Currently flatn doesn't work on Windows and there are other things that can be improved. It's on my long-term plan to work on that, but in case there is wider interest in it I can focus on that more.

mikesherov commented 7 years ago

I think this is within the realm of responsibility of a package manager, since the names themselves are specific to the package manager.

The names are specific to the registry and the resolver, not the pm. Yarn and npm share names if they are both using the npm registry. Also, I can make names mean something different by tweaking my resolver (Webpack).

The transform would be from import 'abc'; to import '../abc/index.js', or wherever 'abc' resolves to.

"resolves to" according to the node algo specifically. This sounds exactly like what a bundler/resolver does. You just want it to run postinstall rather than perpetually. As soon as you're inspecting package.json for a "main" entry, youve gone away from the "package resolving affordances provided by the browser", no? I may be missing a key understanding about browsers ability to resolve names. Am I?

without source transformations

I'm unsure why directory structure transformations are preferred to source transformation.

With all that said, if we end up with a flattening directive in package.json, IMO, it should satisfy several requirements:

  1. allow packages to declare themselves as Highlanders (there can be only one), so that application consumers get failed installs if they don't resolve the underlying footgun even if they don't specify they want a flat install. Fragile singletons are fragile singletons whether you say so or not!

  2. If a consumer wants a flat tree, they should have to specify what ranges from what conflicts they want to cover. That is, if I have depA which requires jQuery 1.x and depB which requires jQuery 3.x, in my app's package.json, I should have:

    resolutions: {
      "jquery": {
        "version": "^3.0.0",
        "covers": {
          "depA": "^1.0.0"
        }
      }
    }

This way, if I introduce depC, which requires jQuery 2.x, I get a warning, because I'm unsure whether depC really can use jQuery 3.x. Also, if depA updates to jQuery 2.x, I don't know whether it also works with 3.x. So the resolutions field needs to know which packages it conflict resolved and for which version ranges.

  3. The flattening algo shouldn't assume anything when a conflict arises. You can have conveniences like --use-latest, but that'll just write the correct info from the second req above automatically. Point is, package.json should be the source of truth for conflict resolution.

justinfagnani commented 7 years ago

@mikesherov

I think this is within the realm of responsibility of a package manager, since the names themselves are specific to the package manager.

The names are specific to the registry and the resolver, not the pm.

We're having a terminology conflict, I guess. By package manager I mean the complete system. I'll try to use "registry" from now on.

The transform would be from import 'abc'; to import '../abc/index.js', or wherever 'abc' resolves to.

"resolves to" according to the node algo specifically.

Yes, because this is the resolution algorithm that names in npm packages assume right now.

This sounds exactly like what a bundler/resolver does. You just want it to run postinstall rather than perpetually. As soon as you're inspecting package.json for a "main" entry, you've gone away from the "package resolving affordances provided by the browser", no? I may be missing a key understanding about browsers' ability to resolve names. Am I?

The browser has no ability to resolve names - it only loads by path. This is why I want a package manager for the web to resolve on install, so that projects are loadable by browsers out-of-the-box after install.

I'm unsure why directory structure transformations are preferred to source transformation.

Not following. I don't think I suggested directory structure transformations, just rewriting import specifiers to be web-compatible by resolving names.

With all that said, if we end up with a flattening directive in package.json, IMO, it should satisfy several requirements:

Cool, thanks for thinking this through!

allow packages to declare themselves as Highlanders

We're not just concerned with "highlanders", but with packages that reference other packages by path. In that case you're not necessarily saying that the package needs to be a singleton, but that its dependencies need to be installed to known locations (siblings being the simplest known location). For instance, all of Polymer's packages published to npm now are web-compatible out-of-the-box, and import their dependencies with ../other-package/file.js (example).

If a consumer wants a flat tree, they should have to specify what ranges from what conflicts they want to cover.

This sounds great. It would be nice if Yarn's resolutions worked this way.

The flattening algo shouldn't assume anything when a conflict arises.

Agreed

So back to this:

if we end up with a flattening directive in package.json

Is there anything we can do to help this along, anything from more motivating use-cases, gathering other interested parties, to defining semantics, or contributing code? I would love to not have to be tied to Yarn, and maybe get some convergence for web-compatible installs.

mikesherov commented 7 years ago

Thanks for clarifying. However I'm still confused because it seems like different folks are asking for different things. Let me ask you specifically @justinfagnani, which of the two ways you want this to work. Ignoring the conceptual split between a bundler/resolver and a pm, you could have a program that either:

  1. Allows you to author and publish imports using names, import 'polymer';, and then on consumer install, have the source be transformed to import '../polymer/polymer.js' or e.g. import '../../../polymer/polymer.js' if it's nested. The important bit here being that the directory structure is irrelevant. The program uses the node resolution algo to find the path and inlines it. Now, you could say "no, the directory structure needs to be flat here" but I'm not sure why that's a requirement.

  2. Requires you to author import '../polymer/polymer.js'; even though you specify 'polymer' in package.json as your dep, you've manually resolved the dep. On consumer install, we force the dir structure to be flat because the handcrafted imports dictate that level of knowledge of where things are on disk.

IMO, the first choice is clearly superior, and you could always layer on aggressive flattening at the rewriter level... it has almost no connection to what's on disk nor to what actually gets shipped to the user.

Please let me know where the above falls apart for you so I can empathize better. Thanks!

iarna commented 7 years ago

This is my take:

Node.js modules and Web modules (for lack of a better name) are essentially two different languages. They have different resolution semantics, different language capabilities and should be installed in different locations.

Node.js modules are installed in node_modules with nesting in accordance to the Node.js module resolution algorithm. They can make assumptions about the language features that Node.js implements and the modules it provides. It has (currently) CJS module loading only. (But support for ESM won't change anything.)

Web modules are installed in assets (or some other name, this is arbitrary) and are installed in deterministic locations relative to assets and are required via relative paths (via ESM or CSS import or HTML script and link tags).

Web modules and Node.js modules are best thought of as different languages.

They can't interoperate without software assistance (Node.js modules would need webpack-type help), they have different mechanisms for loading dependencies, etc.

graynorton commented 7 years ago

@iarna, where does this leave the class of modules that are useful in both node and browser contexts, or the increasingly common isomorphic/universal JavaScript approach to rendering web app UI?

iarna commented 7 years ago

@graynorton The differences in package loader semantics mean that they can't actually be cross-loadable if they have dependencies, without a build step. If you have a build step you can pretty much do what you want. I mean, with build steps we have CJS in the browser today.

If the ESM loader spec is ever finished and implemented then that'll be the right solution, but without that I don't see a path forward. (The Node.js ecosystem's behavior is pretty well locked in and no one is likely to change it in the foreseeable future. 500,000 modules is a LOT of inertia.)

ChadKillingsworth commented 7 years ago

One thing to keep in mind: bower resolutions quickly became an unmaintainable mess on large projects. I believe lock files should massively alleviate some of that pain, but whatever solution is chosen should definitely consider the maintenance burden of resolutions.

mikesherov commented 7 years ago

@ChadKillingsworth, CLI args like --use-latest etc. should help alleviate some of the maintenance burden around resolutions. Lock files can't really help here, IIUC, because they are untouched by humans. That is, resolutions go in package.json because a human verifies the application works with the given resolutions, and the lock file is just a snapshot of how those resolutions play out.

Does that make sense, or did I miss something?

mikesherov commented 7 years ago

Or you can imagine npm resolve jquery@1.2.0 or npm install jQuery@1.2.0 --force and imagine npm writes the resolutions for you into package.json.

mikesherov commented 7 years ago

@justinfagnani, btw, I'm definitely interested in solving this problem in npm despite the fact that I push back hard. It helps define the problem for me.

ChadKillingsworth commented 7 years ago

@mikesherov Yes that makes sense - but doesn't alleviate the concern. My lock file comment was more a nod to storing the resolved version in the lock file instead of in the equivalent of bower.json.

With bower, I saw a lot of CI builds with the --force-latest flag just to bypass the resolution issue, but then I would see builds start breaking without local changes. That, and the local resolutions became pretty meaningless.

I too am extremely interested in solving the problem. Any valid solution to me would require:

mikesherov commented 7 years ago

In npm's case, --force-latest would write to resolutions in package.json, not ignore it, so it wouldn't be possible to do this in CI.

Also, I personally would not imagine them to be transitive, and to be similar to a lockfile in that regard, in that they are really only used for applications, not libs. Libs would use liberal semver ranges to express greater tolerance and flexibility, whereas resolutions are for "leave me alone, I know what I'm doing".

Unsure what you mean by maintainable resolutions. Can you elaborate?

mikesherov commented 7 years ago

BTW, it's worth clarifying that I'm not speaking on behalf of the npm team (despite saying things like "in npm's case").

ChadKillingsworth commented 7 years ago

In npm's case, --force-latest would write to resolutions in package.json, not ignore it, so it wouldn't be possible to do this in CI.

Yeah bower did that too - but I still saw CI builds use the --force-latest flag. Sometimes in a .bowerrc file. It was a pretty terrible thing to do and a huge footgun.

Libs would use liberal semver ranges to express greater tolerance and flexibility

In an ideal world, I agree. But I rarely see that level of care and concern to the library versions so in practice this didn't actually work out in my experiences.

mikesherov commented 7 years ago

So I understand...

Sometimes in a .bowerrc file.

This would leave the codebase in a dirty state, right?

in an ideal world, I agree. But I rarely see that level of care and concern to the library versions so in practice this didn't actually work out in my experiences.

Me too, I suppose that's why I like Node's version of nested dependencies ;-). I'm really not sure what we can do besides descriptive and prescriptive override logic to avoid this.

@chadkillingsworth, are you in agreement that resolutions ought to be application level, like package-lock, rather than transitive, like shrinkwrap? Yes, in practice, lib authors often aggressively version bump, and only express the latest version they work with, but not the earliest (e.g. ^2.0.0 vs. >=1.0.0 <3.0.0), but these same library authors will then not explicitly care to fill in resolutions, because they could just use less aggressive version ranges if they cared.

Also, starting without transitive resolutions allows us to add them later if necessary. I suspect it's YAGNI, but open to be proven wrong :-)

ChadKillingsworth commented 7 years ago

This would leave the codebase in a dirty state, right?

Yes it did.

why I like Node's version of nested dependencies

Which is horrible for the front-end world though - but perhaps not completely avoidable.

are you in agreement that resolutions ought be application level, like package-lock, rather than transitive, like shrinkwrap

Absolutely. I also wonder if a resolution should indicate "latest" vs locked to an older version. In the majority of cases of a conflict, you want the latest version (at least that's been my experience). If somehow that case could be optimized so it was easier to maintain, that might make all the difference.

justinfagnani commented 7 years ago

@justinfagnani, btw, I'm definitely interested in solving this problem in npm despite the fact that I push back hard. It helps define the problem for me.

@mikesherov no problem at all! I still feel like we're getting to a common understanding of the problem and terminology. We can continue to explain our POVs and use-cases.

@iarna:

Web modules are installed in assets (or some other name, this is arbitrary) and are installed in deterministic locations relative to assets and are required via relative paths (via ESM or CSS import or HTML script and link tags).

Sounds on the right track, and certainly how we'd use it, but we definitely can't forbid required build steps. I'm sure there are going to be web modules that require at least node module resolution on bare specifiers, and some amount of friction until paths vs names shakes out in browsers.

Libs would use liberal semver ranges to express greater tolerance and flexibility

In an ideal world, I agree. But I rarely see that level of care and concern to the library versions so in practice this didn't actually work out in my experiences.

There are some things a package manager can do to help here. In Dart's Pub package manager we added a "downgrade" command which ran the version constraint solver with the ranges inverted to prefer the oldest version of every dependency. This let you test your package against the lower bounds of your constraints and helped encourage not blindly bumping lower bounds when upgrading a dependency.
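
Something like this (the invocation is approximate):

$ pub downgrade   # resolve everything to the oldest versions the constraints allow
$ pub run test    # verify the package still works at its lower bounds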

Also, in the Polymer CLI we added a polymer install --variants command which reads extra stanzas in bower.json with different sets of dependencies so that you could develop and test against different concrete solutions to constraints. Variants are tested in CI as well. This helped us ensure that our elements worked with Polymer 1.x and 2.0.
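
The variants stanza looked roughly like this (from memory, so treat the exact field shapes as approximate):

{
  "name": "my-element",
  "dependencies": {
    "polymer": "Polymer/polymer#^2.0.0"
  },
  "variants": {
    "1.x": {
      "dependencies": {
        "polymer": "Polymer/polymer#^1.9.0"
      }
    }
  }
}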

mikesherov commented 7 years ago

Right, so one thing I want to get nailed down is the mechanics first, and then there is a ton of UX we can layer on top, whether it be downgrade or --use-latest or whatever commands eventually solve the constraints. I'd imagine a web version of greenkeeper to be similar to Polymer's constraint solver. But all of that is sugar (important sugar) on top of a solid base. I'd love some comments on the proposed mechanics:

  1. flat is specified in package.json, not in the CLI. That is, flatness is described for the app, not by the runner of the CLI.
  2. Resolutions go in package.json as described above in https://github.com/package-community/discussions/issues/2#issuecomment-329314103
  3. Resolutions are app level and not transitive.
  4. Resolutions only apply if flat is specified in package.json. That is, an app is either flat or it isn't. Resolutions aren't used for partial flatness or generating a smaller tree. (Although this could be layered on as a bonus, it is not assumed to work.)

If we can agree on the above, everything else seems like gravy.

iarna commented 7 years ago

I'm not actually very fond of flat as I think node_modules should have Node.js semantics (and flat does not), thus my suggestion of a new dependency type and installation area.

mikesherov commented 7 years ago

I see. Same structure as dependencies, but called "assets" (or whatever)?

iarna commented 7 years ago

Yes, substantively the same structure, with the addition that we need to have some facility for recording conflict resolutions. Here's an open question regarding that: Should those conflict resolutions go in the package.json (and be inherited by consumers of an asset) or should they go in package-lock.json (and only be used in the top level project). (I'm leaning toward the latter.)

ChadKillingsworth commented 7 years ago

I strongly believe resolutions should be in the package-lock.json and only be used in the top level project.

mikesherov commented 7 years ago

Resolutions should be only in the top level, but I firmly believe they belong in package.json, because they'll express semver ranges ("here's what I'll tolerate"), whereas package-lock should essentially remain an artifact with nothing to interpret ("here's what I tolerated").

mikesherov commented 7 years ago

Also have to remember that package-lock isn't about correctness, it's about reproducibility. If resolutions are required for assets, they really can't go in the lock file, imo.

jsilvermist commented 7 years ago

package.json would be where I would expect to find resolutions, not in package-lock.json. By that point shouldn't the specific versions already be decided anyways?

ChadKillingsworth commented 7 years ago

they belong in package.json, because they'll express semver ranges ("here's what I'll tolerate"), whereas package-lock should essentially remain an artifact with nothing to interpret ("here's what I tolerated")

In practice these concerns become one and the same. Every time you regenerate a lock file or update a dependency, you should be revisiting the resolution decisions.

Resolutions are used both to decide what to install now and to make that choice repeatable.

Another problem with bower was that there was a built-in rot to resolutions. Namely, as a main dependency version was updated, there was no way to know that a specific transitive resolution it needed also should be updated. So on large projects it became routine to just delete all the resolutions and walk through the cli choices again.

mikesherov commented 7 years ago

Right, so if it's all the same, I think it needs to be in package.json especially considering you could disable package-lock generation if you'd like.

Also, the resolutions scheme suggested above would 100% have to specify which subdeps (including transitive ones) it's overriding and for which range. For folks who are daring, it'd be:

resolutions: {
  jquery: {
    version: '2.0.0',
    covers: {
      '*': '*'
    }
  }
}

For those slightly less daring, but still very daring:

covers: {
  subdepA: '*',
  'subdepB/assets/subdepC': '*'
}

Slightly less daring still:

covers: {
  subdepA: '^1.0.0',
  'subdepB/assets/subdepC': '^3.0.0'
}

Least daring:

covers: {
  subdepA: '1.2.3',
  'subdepB/assets/subdepC': '3.0.0'
}

Does having this level of control in package.json over overrides allay that fear, @chadkillingsworth?

bmeck commented 7 years ago

Could use of symlinks work for all of this instead of making a whole new resolution algorithm/configuration?

I agree very much with @ChadKillingsworth that precomputing the resolution is problematic because it makes rot built-in. Making a dir structure like pnpm does, where the lookup can be rewritten instead of the source code, seems much safer and less prone to causing cache misses / rot over time.

justinfagnani commented 7 years ago

Symlinks:

  1. Aren't reliable on Windows
  2. Don't work unless you prescribe a package layout that separates dependencies from importable files in order to prevent cycles in the filesystem.

If you have two packages A & B that depend on each other, and you link A/web_modules/B -> B, and B/web_modules/A -> A, then you have a cycle which causes all kinds of problems, including infinite recursion with some symlink unaware tools. The only way to solve this is to require that importable files live in a subdirectory next to web_modules/, like lib/, then link from A/web_modules/B -> B/lib, and B/web_modules/A -> A/lib.
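
Concretely, the cycle-free layout has to look something like this (a sketch):

A/
├── lib/                  <- A's importable files
└── web_modules/
    └── B -> ../../B/lib  (symlink)
B/
├── lib/                  <- B's importable files
└── web_modules/
    └── A -> ../../A/lib  (symlink)

Because the links point into lib/, which contains no web_modules/ of its own, traversal can't loop.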

bmeck commented 7 years ago

Aren't reliable on Windows

You can use Junctions if you don't have perms. Symlinks do require elevated perms but are reliable if you have perms.

Don't work unless you prescribe a package layout that separates dependencies from importable files in order to prevent cycles in the filesystem.

Cycles are only problematic if you don't realpath when you resolve.

If you have two packages A & B that depend on each other, and you link A/web_modules/B -> B, and B/web_modules/A -> A, then you have a cycle which causes all kinds of problems, including infinite recursion with some symlink unaware tools. The only way to solve this is to require that importable files live in a subdirectory next to web_modules/, like lib/, then link from A/web_modules/B -> B/lib, and B/web_modules/A -> A/lib.

My assumption is that we are talking about tools for these resolution algorithms in particular and we can just say that they need to realpath instead of making that complex behavior required.

bmeck commented 7 years ago

I should note that HTTP redirects cause new Module records, so realpathing is definitely a thing you want if symlinks are supported at all.

justinfagnani commented 7 years ago

You can't control what tools run on the filesystem, and sending a redirect for most module requests is not going to perform well at all. If redirects are the solution, you don't need symlinks, but they're not the solution.

bmeck commented 7 years ago

@justinfagnani redirects make new module records, you almost never want redirects. You want the resolution to point directly to the realpath destination.

mikesherov commented 7 years ago

I'm not sure I understand why symlinks/redirects/etc are relevant to the convo? Whether it's a real directory structure or symlinks or redirects, we're still talking about a flat apparent structure in regards to "path predictability" and "one version only", right?

Daniel15 commented 7 years ago

Symlinks do require elevated perms but are reliable if you have perms.

Elevated permissions are no longer required, as long as "developer mode" is enabled and you're on the Windows 10 Creators Update or newer. https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/