nodejs / node-eps

Node.js Enhancement Proposals for discussion on future API additions/changes to Node core
442 stars 66 forks source link

Improving support for symlinks in the module system #46

Closed jasnell closed 6 years ago

jasnell commented 7 years ago

Continuation of the discussions https://github.com/nodejs/node/pull/10132 and https://github.com/nodejs/node/issues/10107 ... Please use this thread to discuss the high level architectural, merits, and long term impact of the proposed change so that the PR discussion can be focused on the actual code review.

isaacs commented 7 years ago

@phestermcs Please stop trying to change the process. I'm literally in the middle of writing a response to the issues you brought up in the PR right now. Let's do things the way that the Node project does things, respecting the established process, and try to follow the lead set out by the project maintainers.

ghost commented 7 years ago

I'm attempting to offer solutions for the following when using symlinks:

mikeal commented 7 years ago

Pinging @bmeck. We should consider the impending ESM support and any impact module system changes might have with that in mind.

isaacs commented 7 years ago

Re https://github.com/nodejs/node/pull/10132#issuecomment-265212357

@isaacs You're link to a proposed solution seems to be addressing peer dependencies, and is not clear as to how it would address the issue you state it would.

I see. So, to make sure we're on the same page, let me try to describe the problem, and y'all can tell me if I'm missing anything.

Suppose two apps with the same logical dependency requirements:

one
├── baz@1.2.5
└── foo@1.2.3
two
├── baz@1.2.5
└── foo@1.2.3

Let's further stipulate that foo depends on bar@1.x and baz depends on bar@9.x.

foo@1.2.3
└── bar@1.x
baz@1.2.5
└── bar@9.x

We install in one, and get this logical result:

one
├── baz@1.2.5
│   └── bar@9.0.0
└── foo@1.2.3
    └── bar@1.2.5

Some time later, after some updates have been published to bar, we install in two and get this result instead:

two
├── baz@1.2.5
│   └── bar@9.8.7
└── foo@1.2.3
    └── bar@1.2.5

If we use an approach like pnpm and ied, where every instance of foo@1.2.3 and baz@1.2.5 are symlinked into a shared store, and in that shared store, they each have their deps linked into their appropriate node_modules folders, then when we install two, it performs some unexpected action at a distance, updating one's meta-dependency because it updates baz@1.2.5 in the module store.

So this arrangement would thus be impossible:

one
├── baz@1.2.5
│   └── bar@9.0.0  <-- because this...
└── foo@1.2.3
    └── bar@1.2.5
two
├── baz@1.2.5
│   └── bar@9.8.7 <-- conflicts with this
└── foo@1.2.3
    └── bar@1.2.5

This is of course trivial to do without symlinking into a store, because you just have multiple copies of stuff. This is the "disk is cheap" approach. It works pretty well today, because disk is pretty cheap, but it does incur some less obvious costs like building packages, downloading multiple times, etc. There are some other ways to mitigate those costs, of course, and they should be explored separately.

However, in this challenge, we want to say that there's exactly one copy on disk of baz@1.2.5 package contents. The "disk is expensive" model.

Furthermore, in any satisfying solution, (A) there must be a logically correct dependency graph loaded at run time (ie, everything gets a copy of what it depends on, meaning that there are 2 copies of bar in the runtime no matter what, to resolve the dependency conflict), and (B) the two apps have different versions of bar@9.x served to their respective (linked) copies of baz. (If we just did the "resolve deps based on symlink destination as well as target" approach, we'd have a problem because we can't put both versions of baz in one/node_modules or two/node_modules.)

If I'm not understanding the nature of the logical puzzle, please let's stop and clarify. I don't want to rush to solutions until the problem is very well understood.

bmeck commented 7 years ago

No known problems with ESM on this front. If we were talking archive formats that serialize symlinks like ASAR / NODA / WebPackage we might have things to think about, but none are currently supported by Node.

isaacs commented 7 years ago

Now that we have established the technical challenge and agreed on the terms of it, I'd like to seek alignment on the heuristics to be used in evaluating potential solutions.

Here's a sketch of a list of what's important to me, personally, in ranked priority order. This is a question of values, and it's necessarily somewhat subjective, so I'd love to get some input from others in the nodejs core maintainer group about what the project's priorities are.

  1. It must be backwards compatible with current node, npm, ied, and pnpm use cases. If the only way to fix this problem is to break our current ecosystem, then we just have to live with the problem. We must not up-end our community, no matter what.
  2. It must satisfy the conditions of the challenge. (Of course, or else it's not a solution.)
  3. It must add a minimum of new surface area to the module loading architecture. (If there is a solution that requires no change to the node module loader, then that would be ideal here, but that might not be possible. Less change is better, in general.)
  4. It must not cause a significant slowdown in the Node.js startup time. (For example, searching 3 extra module locations is almost certainly fine. 300 would likely be a problem.)
  5. It should ask a minimum of additional complexity on the part of module architecture schemes seeking to solve this sort of problem. This is the lowest priority, because once we show that it is possible, the implementation is just a matter of typing.

Does this seem like a fair set of heuristics for judging solutions? I realize that these will be somewhat in conflict with one another, but that's why they call it a trade-off :)

If it seems that I'm being overly methodical and pedantic, it is because any change to the module system demands extreme diligence. Last time there was a disruption here, I promised that I would do everything in my power to help avoid it in the future.

isaacs commented 7 years ago

Or is it your intent that this exceptional change not be handled in this exceptional way?

Hah, my intent is that we make any change as un-exceptional as possible ;)

ghost commented 7 years ago

Regarding the condition of the challenge, I think it should also include a way out of symdir cycles when representing module dependency cycles.

isaacs commented 7 years ago

@phestermcs Can you elaborate on what you mean by "a way out of symdir cycles"?

isaacs commented 7 years ago

This is an example where the modules have a cycle, symlinks to those modules are still being used, but themselves dont create a directory cycle, which can cause infinite loops in some forms of tools.

So, is it a case like this?

A -> B B -> C C -> A

So, my app depends on A, A depends on B, B depends on C, and C depends on A.

You want to avoid situations where I can end up with a path like myapp/node_modules/A/node_modules/B/node_modules/C/node_modules/A/... because that makes some other file-viewing tool like a text editor or build script or whatever freak out.

Am I understanding it properly? If so, that seems like a fair restriction to me.

ghost commented 7 years ago

Correct. Fwiw, Sticking with subordinate /node_modules would prevent being able to meet this constraint.

isaacs commented 7 years ago

Yes, sticking with the somewhat naive "link everything into the store" approach, where things in the store have a node_modules folder with other links to things in the store, fails this approach and also fails the other requirements of a solution.

isaacs commented 7 years ago

@phestermcs I mean you have a folder structure like this:

├── app
│   └── 2.2.2
│       ├── index.js
│       └── node_modules
│           ├── baz -> ../../../baz/1.2.5
│           └── foo -> ../../../foo/1.2.3
├── bar
│   ├── 2.3.4
│   │   └── index.js
│   └── 9.8.7
│       └── index.js
├── baz
│   └── 1.2.5
│       ├── index.js
│       └── node_modules
│           └── bar -> ../../../bar/9.8.7
└── foo
    └── 1.2.3
        ├── index.js
        └── node_modules
            └── bar -> ../../../bar/2.3.4

That won't work because updating a second app updates the bar module under foo's node_modules folder in the shared store, and also if bar depends on foo, you get a symlink cycle.

This is the naive "use symlinks into a central store" approach, but it falls down on 2 points. (I'm using "naive" as a compliment here, meaning "do the simplest, most obvious, least clever thing".)

ghost commented 7 years ago

@isaacs Ok, I thought that's what you may have meant; just double checking.

But that does remind me of still another "meet the challenge" point. There also is the case of handling "bundled" node_modules as well. i.e. It should still be possible to have a module in a machine store that does have a '/node_modules' subfolder but all of it's content is 'locked' to that module (no symlinks to outside of module), most simply as copies. For example npm itself ships with all its dependencies.

isaacs commented 7 years ago

@phestermcs Please let's keep the scope of this problem and solution limited, or else we'll never be able to make progress in reasonable time. Saying that any solution is backwards compatible means that any existing solutions to those problems (for example, just copying files into folders) will continue to work just fine.

isaacs commented 7 years ago

Yes, backwards compatibility means that current apps work with the future node. Forwards compatibility, where future apps work with current node, is much harder to attain for any semantic change or bug fix. Presumably the entire point is that there's a problem we want to fix or something we want to do that is not currently possible.

isaacs commented 7 years ago

Here's another proposal possibility that might work.

  1. Track both the requested path (where the module was found) and the realpath (where the module actually is) on the module object.
  2. Caches remain as they currently are (using realpath as a key)
  3. node_module lookup path is the node_module directory set from the realpath and then the node_module directory set from the requested path.

I haven't tested it yet, but I believe that would solve the peer dependency issue addressed by --preserve-symlinks (similar to how adding the cwd-based node_module lookup paths would in the proposals in isaacs/node6-module-system-change), and also open the door for two module-store symlink based approaches that could satisfy the necessary criteria listed above.

The first module store symlink approach is to create actual nested directory hierarchies, but symlink the contents of the packages into their destinations. This results in a layout on disk that is more parallel to the naive "disk is cheap" approach, and that parallel is appealing.

However, it necessarily means that the package manager has to keep a record of files in each package, and it's WAY more actual symbolic links. If packages are linked from their development location into the module store, then the developer will have to refresh all of their module installations whenever a file is added to the package. (Maybe that happens rarely? Or can be somehow detected/automated?) So, it doesn't score very highly on the 5th heuristic (minimizing package manager complexity).

That would look something like this in practice (where index.js can be many files, of course)

app1
├── index.js
└── node_modules
    ├── baz
    │   ├── index.js -> ../../../.store/baz/1.2.5/index.js
    │   └── node_modules
    │       └── bar
    │           └── index.js -> ../../../../../.store/bar/2.3.4/index.js
    └── foo
        ├── index.js -> ../../../.store/foo/1.2.3/index.js
        └── node_modules
            └── bar
                └── index.js -> ../../../../../.store/bar/9.0.0/index.js

The second module store symlink approach is to lay files out on disk with a package.json file containing {"main":"content"} and a content folder that links into the store. This loses the parallelism with the naive copying approach, but it also means that package contents can be modified without a rebuild. It also adds an additional folder and some extra file system activity on startup, so the performance impact will have to be measured.

It looks like this:

app3
├── index.js
└── node_modules
    ├── baz
    │   ├── contents -> ../../../.store/baz/1.2.5
    │   ├── node_modules
    │   │   └── bar
    │   │       ├── contents -> ../../../../../.store/bar/2.3.4
    │   │       └── package.json
    │   └── package.json
    └── foo
        ├── contents -> ../../../.store/foo/1.2.3
        ├── node_modules
        │   └── bar
        │       ├── contents -> ../../../../../.store/bar/9.0.0
        │       └── package.json
        └── package.json

I'll try to find some time to make a patch for this minimal module system change so that we can investigate it further and try to shake out some other tradeoffs.

Once there are at least 2 or 3 options, we can figure out which one satisfies the criteria the best, and with a minimum of new footguns.

isaacs commented 7 years ago

@phestermcs Both module store approaches would depend on the same change to the module loader, described at the start of my post in 3 bullets. The existing module loader already knows what to do with the package.json file as described.

The only internal change to node would be that the module search paths include both the node_modules folders based on the realpath as well as the requested path.

zkochan commented 7 years ago

If the symlink-folder will be always named contents, why should there be a package.json?

This structure would be enough, wouldn't it? package.json is in contents anyway...

app3
├── index.js
└── node_modules
    ├── boo
    │   ├── contents -> ../../../.store/boo/1.0.0
    │   └── node_modules -> ../../../.store/boo/1.0.0/node_modules
    ├── baz
    │   ├── contents -> ../../../.store/baz/1.2.5
    │   └── node_modules
    │       └── bar
    │           └── contents -> ../../../../../.store/bar/2.3.4
    └── foo
        ├── contents -> ../../../.store/foo/1.2.3
        └── node_modules
            └── bar
                └── contents -> ../../../../../.store/bar/9.0.0
zkochan commented 7 years ago

This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like require('foo/bar'). If bar.js is not in the root of the package anymore, it won't be accessible the old way...

isaacs commented 7 years ago

I wrote up a patch for what I'm suggesting: https://github.com/isaacs/node/commit/d0f9a99e9fd2d3caa4e3894cd6efe1ca79820128

It isn't yet ready for a PR (not least because it needs tests, docs, and the like), and I'd like to drive this discussion towards a shared understanding of the problem and requirements of a solution. I only wrote this patch because sometimes it's better to communicate thoughts in code rather than prose :)

This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like require('foo/bar'). If bar.js is not in the root of the package anymore, it won't be accessible the old way...

Indeed! That's a shortcoming of the contents folder approach! Symlinking package contents would still work, though.

Another unaddressed concern: loading deps from modules loaded within the dependent module.

For example, if we slightly alter my foo package in the examples above to look like this:

// foo/index.js
console.error('in foo', module.paths)
require('./other-module.js')
// foo/other-module.js
console.error('in foo/other-module.js', module.paths)
var bar = require('bar')
console.log('foo using %s', bar)

Then we get this result:

in foo [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
in foo/other-module.js [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules' ]
module.js:498
    throw err;
    ^

Error: Cannot find module 'bar'
    at Function.Module._resolveFilename (module.js:496:15)
    at Function.Module._load (module.js:443:25)
    at Module.require (module.js:529:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/other-module.js:2:11)
    at Module._compile (module.js:602:32)
    at Object.Module._extensions..js (module.js:611:10)
    at Module.load (module.js:519:32)
    at tryModuleLoad (module.js:473:12)
    at Function.Module._load (module.js:465:3)

This can be fixed by also passing along the parent module paths to its children modules. I'll add that in soon.

@phestermcs I must admit I don't understand what you're talking about with respect to main.js in your most recent comment. Can you elaborate with a minimal example?

isaacs commented 7 years ago

Yeah, this fixes the issue: https://github.com/isaacs/node/commit/f727658c7e8a99e8c03120d7ab23761ec401166e

in foo [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
in foo/other-module.js [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
foo using bar 9.0.0
baz using bar 2.3.4

EDIT: updated commit link after rebasing to node master

ghost commented 7 years ago

@isaacs

node must "experience" the /node_modules rooted symbolic directory structure just like a human would when viewing through a file system explorer and expanding the directory hierarchy. Converting "main.js"'s path all the way down to its absolute realpath on program launch, and using that as it's __dirname prevents this, and breaks tooling.

Effectively, all modules derive there __dirname from the __dirname of the parent module that first required them, and all require resolutions are relative to __dirname. The main modules __dirname, a function of "main.js"'s path, is the root, or start, of this.

Example

In practice the structure and how accessed (tooling launching tooling, etc) is more complex and has different patterns, so this is a contrived simplification to highlight the fundamental problem inherent to all of them.

/store
  /peer
    /v1
      index.js
  /mod
    /v1
      index.js

/projectA
  /bin
    mod        -> ../node_modules/mod/index.js
  /node_modules
    /peer      -> /store/peer/v1
    /mod       -> /store/mod/v1
      index.js                        // var = require('peer')

$> node /projectA/bin/mod
Exception: cant find module 'peer'

With current behavior of converting "mod"'s path to its realpath, and using as the main modules __dirname, the require('peer') attempts to resolve relative to the "real path" /store/mod/v1, when it should resolve from the "symbolic path" /projectA/node_modules/mod.

In your proposed solution, using a realpath in the search list will also have the same effect, just lower down in the link tree.

isaacs commented 7 years ago

@phestermcs I'm trying to find solutions that do not add additional new API surface, including new folder paths that users might be confused by.

Adding module+node_modules as a folder to be searched at each level is an additional thing to document and explain. That is a cost. But some of these edge cases (not finding peer deps, resolving based on the real path only rather than linked path, etc.) do routinely come up as misunderstandings of how the current system works, so there's less cost in adding support for them.

And since it only adds support for something that didn't work at all before, and only at a lower priority than the current search paths, there's very little chance of someone's existing use case being disrupted.

mikeal commented 7 years ago

Adding lookup paths will have a performance impact as well. We've done a lot of work recently to make all this faster and if we go the .esm router we're already taking another perf hit from lookups. I'm a little worried about the performance regression this could cause.

isaacs commented 7 years ago

@phestermcs @mikeal Yes, I agree, that's a concern. Adding 2 or 3 additional folder lookup paths is probably fine. Adding 10 is less likely to be ok.

So, if we go from this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules' ]

to this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules' ]

then that feels a lot less risky to me than this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/1.2.3+node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/foo+node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/.store+node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/dev+node_modules',
  '/Users/isaacs/node_modules',
  '/Users/isaacs+node_modules',
  '/Users/node_modules',
  '/Users+node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo+node_modules',
  '/Users/isaacs/dev/app4/node_modules',
  '/Users/isaacs/dev/app4+node_modules' ]
ghost commented 7 years ago

@isaacs @mikeal I had this very same concern, but in normal use, most of the time only the first couple of paths actually get searched before somethings found, not all of them, and with the caching enhancements, the actual stat() cost only happens once

ghost commented 7 years ago

For now, don't think of my solution as 'the' solution, but as a fully working one.. a baseline.. a bar to meet. a way to actually experience any solution.

zkochan commented 7 years ago

I think it makes sense to lookup for the adjacent node_modules only one directory up. In that case it won't be +10 additional directory lookups but only +1.

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/1.2.3+node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules']
zkochan commented 7 years ago

although my example is not correct, because I looked up in the store

ghost commented 7 years ago

@zkochan when a root tree would be deployed using anm there are no more /node_modules other than the root, they're now all +node_modules. and mechanically speaking resolution works just like /node_modules

zkochan commented 7 years ago

I see, but they will always be one level up, right? There is no need to check in each parent directory till the root of the disk. Hence, it won't add more lookups. I think the lookup can be even stopped if there's an adjacent node_modules dir.

ghost commented 7 years ago

@zkochan whoever implemented the stat() caching did a great job, because stat()ing all the time is what would kill performance. But once a path has been stat()'d, it never is again, no matter how many more times requires happen against it and its parents.

isaacs commented 7 years ago

@phestermcs It is very difficult to keep up with this conversation and address everything you bring up when you post updates every 10 minutes. That is why I've been trying to block out a bit of time to identify some issue, thoroughly examine it, sleep on it for a few days, write an example implementation, and post a fully digested update.

When you say things like "Thats means, the anm solution is ONE LINE change" or "Does it even matter to anyone here that I ran citgm and results were identical? and that citgm itself was running with the altered version of node?", I interpret this to be somewhat demanding and frustrated. It's possible, since text is what it is, that I'm misinterpreting your intent, and so I am attempting to give you this feedback only as an expression of how your message is being received, in the hopes that you might benefit from this knowledge.

If I'm not misinterpreting your urgency or agitation, then I must again insist that you please show restraint and respect for the process. This will take a long time. We will examine every angle very thoroughly. Demanding that we skip this process this will not make it go faster. Broadcasting attachment to a specific outcome will not guarantee that outcome. The process is here for a reason.

Node is no one's pet project. There are millions of people depending on its stability. Please be at peace in this space.

isaacs commented 7 years ago

@phestermcs Just to be clear, you didn't piss me off, and I don't see anyone else suggesting that they're pissed off at you. I am merely pointing out that the urgency and frustration you are bringing to this issue is not helping drive us towards a resolution. Additionally, using superlatives about a specific solution also makes it harder to do this work. The task before us is to very clearly identify the problem to be solved, and explore the solution space.

I hate to keep repeating it, so this is the last time I'll say this: it is important to be excessively careful about any changes to node's module system. Frustration and urgency are the opposite of the ideal approach.

I'm not that smart, and I only have a limited slice of my attention to spend on this, but what I do offer is a wealth of experience and institutional memory about how the module system is designed, and why. I ask that you be patient with me. It's only been a few days since I started looking at this. We will all benefit if we can do this very carefully and explicitly, so that there are multiple options on the table, and the problem space is mapped out in excessive detail.

bmeck commented 7 years ago

@phestermcs Core contributors are very careful with things that affect the Module system. Increased support for symlinks would be great! However, part of the problem is the difficulty in knowing how the packages in the wild on the public registry rely upon certain features and/or constraints of the module system. I can speak from brutal experience; understanding the full surface of what would break by using ESM took me over 1 year. There will be some breakage I did not plan for even after over a year of investigation.

As such, I would not rush to give a set of tests that need to pass. Going out on the public registry, github, etc. and searching for breakage takes time. Understanding path expectations is even more confusing than how module.exports/require is used since path resolution is mostly transparent due to it being on the fs and not visible in source code.

This enhancement will most likely take a long time. People will incrementally rebuild your solution (or different ones) to understand the edge cases of the problem space themselves and present minor variations from their explorations. Watching this will take a lot of patience. They will get tripped up on things that may already be known to need workarounds/solutions, but they may discover new things.

I cannot give you a list of expectations on speed or clarity in specifying the requirements to land this. Large or ecosystem affecting PRs take long times; for example, the Workers PR is over a year old. We will get through this, slow and steady. I agree that we should setup a call in the near future, and I can try an email as a means to organize a hangout.

bmeck commented 7 years ago

CC: @groundwater , is there an Electron person who can read through this and see if symlink paths affect things at all (it may not)?

jasnell commented 7 years ago

@phestermcs ... I understand that this issue is important to you. I also understand that you have a strong desire to contribute meaningfully to both to this issue and to Node.js in general. Those contributions and your efforts are definitely appreciated. However, please do keep in mind that how thoughts and ideas are communicated can be disruptive to achieving your goal. Frequent long messages, calling out individual contributors with big bold text (that comes across as shouting on here), and just the overall general tone of your posts are contributing to what many participants and observers in this conversation are interpreting as counter-productive (at best) and harassing (at worst) behavior. I do want this issue to be resolved, but if I may suggest, please take a bit more care in how you are engaging here and please pause before posting to consider how your posts may be coming across to others. It will likely be far easier to make progress if you take a significantly less confrontational and urgent approach in your messages.

ghost commented 7 years ago

My apologies @isaacs. In humility ask you and other for fogiveness.

isaacs commented 7 years ago

@phestermcs Forgiven and forgotten. I am eager to move on.

So, I'm confused by something.

Let's say you have a module at /store/foo/index.js.

Then, that module does require('./lib/thing.js'), loading from /store/foo/lib/thing.js.

Then, in /store/foo/lib/thing.js, it does require('bar') to pull in foo's dependency, "bar".

I then symlink the foo module in to /app/node_modules/foo.

When lib/thing.js does require('bar') what paths get searched?

Today, it would be these:

/store/foo/lib/node_modules
/store/foo/node_modules
/store/node_modules
/node_modules

In the "add node_module folders for request path, as well as parent module" proposal I have on my ep-46 branch, it'd search in these:

/store/foo/lib/node_modules
/store/foo/node_modules
/store/node_modules
/node_modules
/app/node_modules

I thought I had wrapped my head around the search paths that would be used with anm, but your reaction to my comment above about adding 10 new search paths left me confused. What exactly are you suggesting we add to the search path set in this case?

Because it can't just be __dirname + '+node_modules' or else the thing.js will look in /store/foo/lib+node_modules first, which doesn't seem very worthwhile. And if it's only based on the main module, then that means that it's going to search in /app+node_modules, which also doesn't seem right.

What are "adjacent node_modules" adjacent to, exactly?

ghost commented 7 years ago

Thanks @isaacs for your graciousness; it's warmly received.

First, allow me to just clarify the basic idea behind adjacent+node_modules. It's fundamentally identical to /node_modules which I call subordinate/node_modules as in that case the node_modules directory is "underneath" it's module. "adjacent" just means the node_modules directory for the module is "next to" its module's directory rather than underneath it. Its really just the exact some thing, but using the + instead of the /, which then decouples it from the module directory itself. Once it 'clicks' for you I think you'll appreciate its simplicity, and how its just as elegant as, and in the spirit of /node_modules with respect to handling version resolution, because it's pretty much the same thing

/app
  /node_modules
    /depA/node_modules
    /depA+node_modules
    /depB/node_modules
    /depB+node_modules

Now, for how things should ideally resolve in your example, the first thing is that, and this is related to the "main.js" fix, /store/foo would not be the __dirname used to resolve anything, assuming it had been symlinked. From your example, I'm guessing this is how the layout is:

/store
  /foo
    index.js
    /lib
      thing.js

/app
  index.js
  /node_modules
    /foo          -> /store/foo

launching with: node /app

the main module's __dirname is set to /app

then:

app/index.js
    require('foo')

the search paths would be:

/app/node_modules
/app+node_modules
/node_modules

when foo is loaded, its __dirname is set to /app/node_modules/foo, so for:

foo/index.js
  require('./lib/thing.js')

the search path would be relative to foo's __dirname (because of the leading ./):

/app/node_modules/foo

so then thing's __dirname is set to /app/node_modules/foo/lib

so

thing.js
  require('bar')

search paths:

/app/node_modules/foo/lib/node_modules
/app/node_modules/foo/lib+node_modules
/app/node_modules/foo/node_modules
/app/node_modules/foo+node_modules
/app/node_modules
/app+node_modules
/node_modules

Does that make sense?

ghost commented 7 years ago

I should mention, I used your example, which symlinked under /node_modules, so the detail I showed you is pretty much describing backwards compatibility; i.e. +node_modules are in the search list, but they don't exist, so nothing changes

isaacs commented 7 years ago

Ok. So what happens if we symlink /app/bin.js to /usr/local/bin/app? The main purpose of resolving symlinks to their realpath for module resolution is that linked files typically need to find their modules based on their actual location on disk.

ghost commented 7 years ago

/use/local/bin/app is a file symlink and /app/bin.js is its target. And that is exactly why when "main.js" is a file-symlink, we simply use it's target path to set the main modules __dirname (that's very, very, very different than converting the file symlink all the way down to it's realpath), but in all other cases we just preserve the entire path.

Keep in mind the target of the link can also contain symdirs in its path, and we want to preserve those.

If the target is a relative path, than we want to resolve it relative to the file symlinks path.dirname(), which may also have symdirs in it's path, which will also want to preserve.

This sounds a little more involved than it actually is, but it always keeps everything correctly rooted in a symbolic path space, so all descendant requires resolve intuitively.

ghost commented 7 years ago

I believe because "main.js" is currently always converted to its realpath, it's the fundamental reason so much tooling broke. I would consider this a bug.

isaacs commented 7 years ago

So, if I'm understanding your proposal correctly, then:

  1. require.main is not resolved to a full realpath, but only the first level if it is a symbolic link, and only once. (For example, if /usr/bin/app -> /app/bin.js and /app -> /home/dev/app, or some other chain of symbolic links, it won't get fully resolved.)
  2. Modules are not realpath'ed (or symlink-resolved at all) from there on.
  3. When a module is found, its node_modules hierarchy includes <path>/node_modules as well as <path>+node_modules for all the elements in its path, starting with the path.dirname() of where it is found.

It seems to me then that this breaks the current behavior in at least the following cases (which may or may not be resolvable):

  1. A file symlink that is not the main module. For example, if /store/foo/lib/blerg.js is a symlink to /store/bar/blerg.js, then it won't find modules in /store/bar/node_modules. (This happens more often in the wild than you might expect. For example, scripts that do require(which.sync('some-module-exe')) to load the module's executable.)
  2. A file symlink that is a link to a symlink, even as the main module. For example, /usr/local/bin/app -> /store/app/bin and /store/app/bin -> /store/app/command-line/index.js. In this case, /store/app/command-line/index.js will not find modules in /store/app/command-line/node_modules.
  3. Since anm only looks one symlink deep for the main module, and doesn't realpath the whole way, it doesn't include any node_modules folders that are found relative to the realpath but not relative to the first-level link path. For example, /projects/js links to /home/isaacs/projects/js. /bin/app links to /projects/js/app/bin.js. /home/isaacs/node_modules won't be in the module search path.

Please let me know if these deviations from the current contract represent a misunderstanding on my part. Thank you.

ghost commented 7 years ago
  1. require.main is not resolved to a full realpath, but only the first level if it is a symbolic link, and only once. (For example, if /usr/bin/app -> /app/bin.js and /app -> /home/dev/app, or some other chain of symbolic links, it won't get fully resolved.)

..and only if /usr/bin/app were a file-symlink. /symdir/modir/ and the like are left alone.

  1. Modules are not realpath'ed (or symlink-resolved at all) from there on.

..which is exactly how --preserve-symlinks works today.

  1. When a module is found, its node_modules hierarchy includes <path>/node_modules as well as <path>+node_modules for all the elements in its path, starting with the path.dirname() of where it is found.

Correct. However, I consider this the anm enhancement, and an entirely separate thing from 1 & 2, which are just about working with symlinks generally. anm has no idea if symlinks are involved or not, it just makes using them to machine stores possible. They really should be seen is two discreet things.

Re breaking current behavior

  1. A file symlink that is not the main module. For example, if /store/foo/lib/blerg.js is a symlink to /store/bar/blerg.js, then it won't find modules in /store/bar/node_modules. (This happens more often in the wild than you might expect. For example, scripts that do require(which.sync('some-module-exe')) to load the module's executable.)

How require(which.sync('some-module-exe')) would behave is exactly as it does today with --preserve-symlinks. The handling of a file-symlink is only for "main.js", not for other require() calls. As an aside, in your example the path passed to require would be an absolute path I believe.

  1. A file symlink that is a link to a symlink, even as the main module. For example, /usr/local/bin/app -> /store/app/bin and /store/app/bin -> /store/app/command-line/index.js. In this case, /store/app/command-line/index.js will not find modules in /store/app/command-line/node_modules.

For handling "main.js", that is correct. I have not encountered that pattern, but I believe file-symlinks to file-symlinks, however many levels, could be simply handled by just looping the "main.js" logic until a non file-symlink is hit, which would still be preserving any symdirs along the way.

  1. Since anm only looks one symlink deep for the main module, and doesn't realpath the whole way, it doesn't include any node_modules folders that are found relative to the realpath but not relative to the first-level link path. For example, /projects/js links to /home/isaacs/projects/js. /bin/app links to /projects/js/app/bin.js. /home/isaacs/node_modules won't be in the module search path.

anm has nothing directly to do with symlinks, it just makes using them to machine stores possible.

Your example with additions to highlight use case as to why that might not be preferred.

/home
  /isaacs
    /node_modules
      /bar               // @ 3.0.0
    /project
      /js
        /app
          bin.js         // @ 1.0.0

/projectA
  /node_modules
    /bar                 // @ 1.0.0
  /js                    -> /home/isaacs/projects/js
  yarn.lock
    "app" @ 1.0.0
    "bar" @ 1.0.0

/projectB
  /node_modules
    /bar                 // @ 2.0.0
  /js                    -> /home/isaacs/projects/js
  yarn.lock
    "app" @ 1.0.0
    "bar" @ 2.0.0

/bin/appA                -> /projectA/js/app/bin.js
/bin/appB                -> /projectB/js/app/bin.js

You are correct. However, this "feels" similar to the scenario of concurrent development of interdependent modules on the same machine, and in that case it might be preferred ancestor dependency version resolutions stay with the project. But yes, you are correct.

ghost commented 7 years ago

Fwiw, Just on principle, always preserving the symbolic space will allow the user (even if via tooling) to fundamentally always be in full control of exactly how they want to organize their directory structure. When converting to realpaths, that control is taken away.

anaisbetts commented 7 years ago

No known problems with ESM on this front. If we were talking archive formats that serialize symlinks like ASAR / NODA / WebPackage we might have things to think about, but none are currently supported by Node.

:wave:, Electron person here, trying to understand the book worth of text that's written about this - fwiw, ASAR doesn't support symlinks at all, so we're good on that front. From our perspective, this probably won't affect anything (i.e. we're pretty much 100% compatible with node in this regard).

Thanks for CC'ing someone from Electron into the thread!

ghost commented 7 years ago

@isaacs You described certain symlinking techniques which I'm sure might exist. I'm just curious as to the use cases they are supporting, and just generally the frequency of them, and how the disparity between nix and win regarding use of file-symlinking impacts them? With respect to preserving symlinks allowing the user to remain in control of their directory structure (and by consequence how dependencies resolve), I reason those use cases could still be supported by some minor organizational changes, should the change in behavior break them.

It seemed some of your concerns revolved around those situations leveraging the fact node searches for /node_modules directories all the way to the root, as an approach to sharing common modules between applications, or "main" entry points. It appears the trend, as evidenced by npm's shrink-wrapping, yarn's lock files, and ied's CAS model, is to fully isolate version resolution to "main" entry points so behavior, with respect to the precise versions of modules being used, can always be deterministic (i.e. all resolutions resolve from "mains" companion /node_modules directory). Relying on a common ancestor /node_modules to share dependencies between "main" entry points opens a hole in the determinism when applications run on different machines, because control is lost in guaranteeing version resolution from machine to machine. Being able to symlink to machine stores is another way modules can be shared between applications, but still allow them to fully dictate the precise versions they require. Just curious what your thinking is in that regard?

One area we've yet to cover, is how the memory-bloat and add-on crashing problems created when using symlinks get fixed. Interestingly the solution to both is the same, and also enables using symlinks to easily address the issue of peer dependencies, without having to be concerned with the physical directory structure and how many places the peer dependency is depended upon, because all of them come down to only having a single instance of a module@version in memory. This is fundamentally solved by always caching module instances by their realpath (and this is a case where we do want to go all the way down to the real path), yet setting the __dirname to the requestPath of the first require() that originally loaded the module into memory. It does rely on package managers laying down physical dependency trees that honor the logical tree; i.e. within an execution context, the versions of a module's dependencies never change just because the module itself may be multiply depended upon. Also curious as to your thinking on this?

(I'm really trying to use less words and comment less often... please bear with me as I try to get better at that in this forum)