VanCoding closed this issue 8 years ago.
This would break npm link, wouldn't it?
If you are requiring dep1, and it is a symlink, and it doesn't have dep2 in its path, then something is off, is it not? Should the module not include its own node_modules folder?
That being said, flat dependencies in npm@3 do introduce a weird edge case here. But it is worth bringing up that I am pretty sure they are deprecating peer dependencies.
@Trott why would it? In theory, everything that works with the real path should also work fine with the virtual path. The only real problem that I see is when a module gets required multiple times through different symlinks: it would not be cached by npm and would get loaded multiple times. But if that really happens, something is badly designed, I think.
@TheAlphaNerd No, because it's a peer dependency. I don't think npm is deprecating peer dependencies in general, but rather changing the behavior to just warn when a peer dependency is not fulfilled. Maybe they're even going to drop the peerDependencies property, but that does not prevent people from peer-depending on things. Devs will then just need to manage them on their own.
I don't think the need for peer dependencies will go away in the future.
I've also asked for a new npm command that would only work if we fix the behavior of require as I suggested here: https://github.com/npm/npm/issues/10000#issuecomment-148737934
@VanCoding I was thinking of certain edge cases but I wasn't thinking very deeply about it, so yeah, that particular comment may be a non-issue.
The real point I was sorta kinda trying to gently make was better put by @othiym23 (emphasis added):
I do think you're right that require()'s behavior is unhelpful in this case, but I also think that given how important having that behavior locked down is to the stability of the whole Node ecosystem, it's going to be very tough to change now.
Any semver-major change to require() could result in all sorts of (potentially hard-to-predict) ecosystem breakage. So the bar is very, very high.
I know, but I think the current behavior maybe even counts as a bug, because there seems to be no reason to behave like this.
Moreover, the Modules API (which require is a part of) is Locked which means:
Only fixes related to security, performance, or bug fixes will be accepted. Please do not suggest API changes in this area; they will be refused.
The litmus test is this: is there a non-zero chance the proposed change is going to break someone's existing application? If the answer is 'yes' (and I think it is), then it should be rejected.
Well, how are we going to calculate that chance? And what does a zero chance mean? Zero as in not even one module is going to break, or not 1% or more of the packages? If it's the latter, then I really doubt that 1% of the modules would be affected by this change. If there really are modules that break with this change, then they're making use of undocumented (in my opinion buggy) behavior, and therefore it's okay if they break.
Let's look at it like this: one can always get the real path from the virtual path, but if one gets the real path, the virtual path is lost.
So even if we break some modules, isn't the right decision to fix this? I really think it would help fix issues for npm as well.
But of course it is your decision.
@VanCoding, maybe hard links would help. I haven't tried it.
@dlongley You can't hardlink a directory I think?
I am having the exact same issue as @VanCoding and I'm also a bit confused by the current behaviour...?
I would expect a linked package to resolve dependencies based on the location it was linked to. It should of course check its own node_modules first, but when going back "up the tree" I would expect it to check the folder structure it was linked to, not where it was linked from.
@asbjornenge,
You can't hardlink a directory I think?
No, you can't, but maybe you could write a script to create a mock directory and hard link any js files. I think that would be all it would take for simple modules. It's not a perfect solution, but maybe it would help in some cases. Clearly there needs to be a better way to link peer dependencies together during development.
I would expect a linked package to resolve dependencies based on the location it was linked to.
Me too, but I believe this has been discussed before -- and there may be projects out there that are depending on the current behavior. We'll just need to come up with a way to specify that the other behavior is desired.
and there may be projects out there that are depending on the current behavior
As I said before:
And as it seems...
We'll just need to come up with a way to specify that the other behavior is desired.
@dlongley, I don't like it. I'd rather 'keep it simple ..' in the long-term.
@VanCoding makes a compelling argument.
Perhaps we should make this minor 'breaking change', and just add an opt-out capability. The opt-out should be removed in a future version..
I agree with @VanCoding's argument.
Perhaps we should make this minor 'breaking change', and just add an opt-out capability. The opt-out should be removed in a future version.
:+1: I'm in favor of this if it can get consensus.
Perhaps we should make this minor 'breaking change', and just add an opt-out capability. The opt-out should be removed in a future version.
That is an addition, so semver-minor. I will warn you that we are very conservative about doing anything with the module system.
If someone wants to put together a PR for this (with the understanding that, again, the bar for this is very, very high and so even if it's a Great Idea and an Excellent Implementation, it's still very likely that this Will Not Happen), it might be interesting to use https://github.com/nodejs/citgm (if that tool is ready for general use?!) to see if anything Too Big To Fail in the ecosystem chokes on the change.
(To head off the obvious and explain the bold comment above: Yes, it's easy to fix any modules that break. That's not good enough though. Here's the deal: Let's say this breaks the async module. Fixing it in a new version of async is theoretically no problem. But then getting the thousands and thousands of modules that depend on async to update to the fixed version is a huge problem. And getting everyone using those thousands and thousands of modules to update those modules to fixed versions... This is why a stable and imperfect module API is vastly preferable to a dynamic module API.)
Of course, if there was a way to confirm that no modules on npm depend on the current behavior, that would go a long way (in my opinion, anyway) towards making the change palatable. I have no idea how one would go about demonstrating that, though. But you can use citgm to at least confirm that it won't break any of a handful of the most important ones.
@Fishrock123, I tend to think my suggestion should wait until the next semver-major, since I suggested a breaking change with 'opt-out'?
@Trott, These are good points. The last thing sysadmins want is trouble when upgrading a datacenter to a new version of Node.js.
We should also consider any arguments for the current behavior besides compatibility. Are there cases where the current behavior is more intuitive than the proposed change? If not, then I think the proposed change should be implemented sooner or later.
I know that Modules are at Stability 3 - Locked.
Only fixes related to security, performance, or bug fixes will be accepted. Please do not suggest API changes in this area; they will be refused.
There are good ways to introduce a breaking change over a period of time. Statistics would indicate the best course of action - even if it means extra work on the front end before this change can be implemented.
This is causing me pain right now. I'm working on an enhancement to a library that automagically loads plugins by doing require(pluginName). But since I've used npm link to install the library, the require fails because pluginName can't be found in the tree.
I think this behavior is unexpected, especially given the interaction with npm link. I think a symlinked package should behave identically to an installed package in the same location.
From the modules docs:
If it is not found there, then it moves to the parent directory, and so on, until the root of the file system is reached.
I suppose this is ambiguous, but I would argue that the "parent directory" should be relative to the path used to load the module. If I do require('/home/me/project/node_modules/foo/index.js'), and index.js does require('something-else'), I think it is incorrect to look up something-else relative to anything other than the full path in the first require.
@Trott it might be worth taking a look at https://github.com/brson/taskcluster-crater to see how they test the rust ecosystem against potentially dangerous changes.
@fenduru :+1:
Looking at the examples from the documentation, it is, IMO, unexpected behavior for a file that happens to be a symlink to behave differently. I would expect it to behave the same as any other file, which is as @fenduru described.
Furthermore, the documentation gives one good reason for the way module searching is performed:
This allows programs to localize their dependencies, so that they do not clash.
The fact that searching does not work this way with symlinked modules makes it very difficult to make use of localized dependencies (this is the chief complaint). As pointed out above, it also causes different (and unexpected) behavior with npm link. Perhaps this issue is better classified as an "unexpected behavior" bug rather than a request for change.
Just a note here that a fix to the unexpected behavior shouldn't change the existing behavior for module identifiers that are native or that begin with '/', '../', or './'. In those latter three cases it does make sense to use realpath; changing that would be incorrect and I imagine would wreak havoc.
I also found this require/symlink behavior to be unexpected and undesirable. It goes against common UNIX practice.
Out of curiosity - does node on Windows also have this require/symlink behavior? (At one time Windows did not support symlinks.) If node behavior on UNIX and Windows differs in this regard, that would strengthen the case against symlink resolving in require() on UNIX, for consistency's sake.
At the very least an optional node command line flag could be introduced to disable resolving symlinks when require()ing.
@kzc,
Out of curiosity - does node on Windows also have this require/symlink behavior?
Assuming that fs.realpathSync dereferences symlinks on Windows, then it seems so (but I only checked quickly): https://github.com/nodejs/node/blob/master/lib/module.js#L150
It looks like the problem is that no distinction is made between loading a module with an identifier that starts with '/', '../', or './' and one that doesn't -- all are given the realpath treatment (and it doesn't matter which OS you're using). If, instead, realpath were only used in the former cases, then symlinks would be treated like other files (I think) and you'd get the expected behavior and the ability to localize dependencies for symlinked modules.
Regardless of whether require()d modules have relative or absolute paths they still could be converted to canonicalized absolute paths without resolving symlinks. I see the need for the module cache to use absolute paths so that it knows when it's dealing with the same module referenced from various different relative paths. But if someone explicitly introduces a symlink in the path of a module then I'd argue that most would expect it to be treated differently than the symlink resolved version according to its non-resolved directory position.
@kzc,
Regardless of whether require()d modules have relative or absolute paths they still could be converted to canonicalized absolute paths without resolving symlinks.
This is problematic for the case shown here: http://grokbase.com/t/gg/nodejs/12cbaygh7a/require-resolving-symlink
This is required because of the way that executables are typically
linked. On Unix, you have something like:
/usr/local/node_modules/foo/bin/foo.js
which does `require("../lib/foo-internals.js")`.
That script is linked to
/usr/local/bin/foo
If we didn't realpath, then it'd be looking for
/usr/local/lib/foo-internals.js instead of
/usr/local/node_modules/foo/lib/foo-internals.js.
The node start script is a special case that could be symlink resolved if the file has a shebang or a non .js suffix.
I'm having the same problem as @fenduru.
- When the plugins are installed via npm install, they end up in the correct directory (the user project directory).
- require.main.paths[0] is 100% correct and contains the user project directory.
- Yet require and require.resolve both throw.
This is unexpected. It only happens when symlinked / npm link'd. In a normal installation everything works as expected.
I don't want to npm publish every single change - how am I gonna continue developing the lib?
Since the directory is actually included in require.main.paths, this issue is undoubtedly a bug if you ask me. At the very least it's worth fixing.
:+1:
+1 this is a pain and makes npm link almost useless to me (see https://github.com/npm/npm/issues/5080).
Resolving symlinks this way is just plain wrong and node should not be doing it.
In order to fix this issue, it seems node would need to be able to load multiple instances of the same module file (same realpath location). This means storing symlinked module files in the require.cache using their symlink names instead of their realpath names. There would need to be a new instance in the cache for each module that is loaded via a different path (different symlink) in order to ensure that the proper search paths are used for any require() calls made from within that module. Once this is done, the search path for finding another module could first walk up the symlink path and then, if no match is found, it could walk up the realpath. This might help with backwards compatibility.
However, this seems to be a significant departure from how the require cache currently works. How existing projects would be affected by this is unclear. Though, it may really only affect projects that use peer dependencies -- which are the projects that need this fix. Is a change to fix this unexpected behavior palatable? What do you think, @jasnell?
Is there anyone interested in this issue that feels comfortable enough to create a PR to try and resolve it in a sensible way?
To summarize the core problem here:
If you want to work on a dependency in the context of a dependent package, you add a node_modules sub-directory to the dependent package and link to its dependency there. Whatever changes you make to the dependency will be reflected in the dependent package.
If you want to work on one of N peer dependencies in the context of a dependent package, you have no such remedy. There is no simple mechanism by which to link to a single peer dependency in the context of the dependent package. This makes developing systems that use peer dependencies unnecessarily difficult.
If we change the module search path behavior to walk up looking for <parent>/node_modules using the symlink path first and then the real path, this problem would be eliminated. Developers of projects that do not use peer dependencies but want to ensure the proper dependencies are loaded may continue to use the current behavior of adding a node_modules sub-directory. This is how dependency conflicts were already resolved in existing projects.
The non-peer dependency use case that would change would be the following:
A dependent package required a non-peer dependency that was symlinked. That dependency did not have all of its sub-dependencies in a local node_modules sub-directory. Therefore, its realpath parent directories were walked looking for the sub-dependency in <parent>/node_modules until it was found. With this new behavior, its symlink path parent directories would be walked first, prior to walking its realpath parent directories. This could potentially find a different sub-dependency. The original behavior can be easily restored by adding a local node_modules sub-directory.
The only question is, were packages previously depending on this behavior?
This behavior should not affect any npm installed packages unless they have custom install scripts that create symlinks in this manner (and it seems questionable why this would be done at all). There is a potential for this to affect globally installed and npm link'ed packages, but only if npm install -g does not ensure all required sub-non-peer-dependencies are installed in a node_modules sub-directory. If it doesn't have this assurance, it could be added in future versions to ensure correct behavior with new versions of node.
Here's an alternative proposal to resolve this issue that should not impact any packages unless they are already using peer dependencies, and therefore, may not need to be considered a potentially breaking change.
Node could be made aware of "peerDependencies" by checking for this value in package.json:
Suppose we have a package top which depends on two packages, framework and plugin, where plugin is symlinked to another directory containing a development version of plugin:
|--top
|--node_modules
|--framework
|--plugin (symlink to /dev/plugin)
|--dev
|--plugin
|--package.json ({"peerDependencies": {"framework": "^1.0.0"}})
|--framework
In this scenario, we want top to load plugin from the dev directory, and we want plugin to load top/framework, not dev/framework.
The plugin package includes a peerDependencies entry in its package.json that has framework as a key. When the top script calls require('plugin'), node finds plugin via its normal mechanism. However, when loading plugin, node sees its package.json file and that it has a peerDependencies entry. This causes plugin to be identified via its absolute path (which may be a symlink) instead of its realpath. From an implementation perspective, an entry in require.cache uses the absolute path as the key and a new module instance as the value (unless previously cached from the absolute path).
Now, when the plugin script calls require('framework'), since framework is in the peerDependencies entry in package.json, it will be searched for by first walking plugin's module.parent (which will be top's path) before walking its realpath. Any other require target (not present in peerDependencies) would be searched for using the module's realpath as usual.
I believe this approach would handle the use case at hand (developing packages that use peer dependencies and using npm link with peer dependencies) without affecting any other packages.
Does anyone here have feedback on this proposal? Would people prefer this one over the previous one -- or neither? General thoughts?
I really don't think node should be looking in package.json and doing special things with peerDependencies. It should work as expected for any symlinked package. There is no need to special-case peerDependencies if node ceases to mess around with resolving symlinks.
I don't see the need to walk up the realpath at all and I would suggest that (excepting any case for backwards compatibility, possibly temporarily) this should be avoided if possible.
Consider:
A
A/node_modules/B (symlink to B)
A/node_modules/C
B
- require('C') from A should use A/node_modules/C
- require('C') from B should use B/node_modules/C if available, and fall back to A/node_modules/C (this does not happen at the moment)
Why would you _ever_ want to start searching in A/B's parent?
Even if you think about linking sub-dependencies:
A
A/node_modules/B/node_modules/C (symlink to C)
C
Searching the realpath still makes absolutely no sense - that would mean searching A/C's parent.
The only possible scenario I can think of where searching the realpath would be necessary would be to link to a dependency which is halfway down someone else's tree:
A
A/node_modules/D (symlink to B/node_modules/D???)
B/node_modules/C
B/node_modules/D
...which makes no sense whatsoever. Why would you do this!?!?
So, never resolve symlinks ever, always use the virtual path and don't do anything special.
I'd be interested if someone could make a patch for this (even if imperfect) so we could go about testing it. I also think we could get someone to help whoever is interested, if necessary. :)
Related issue:
[rfc] module: change load order #4595
I agree 100% with @SystemParadox; I see no reason whatsoever to resolve the symlink.
Links are supposed to help development by simulating a classic node environment while still making it easy to apply changes.
If changing this affects anyone, it is most likely only in a local environment using crazy hacks because of this behavior.
You can't write a module assuming your parent or dependencies are at a different "real" location, since you don't have any control over the file structure when your module is installed on another machine. Plus, you most likely can't write any code that creates symlinks, since you can't assume you are even allowed to (creating symlinks is disabled for users on Windows by default).
Plus, I still fail to see a use case where the current behavior is desirable or even used anywhere at the moment. The only thing I see is people trying very hard to circumvent this bad behavior in node to be able to work with symlinks. Sometimes I had to resort to copying my module to the final destination and developing directly there so I could test it.
I created an issue a long time ago at nodejs/node-v0.x-archive/issues/18203 and I think I will repost the issue here since this is where the discussion is.
I have been working for a while with link to develop multiple modules or simply reuse my local versions. However, every time I link a module that uses peerDependencies, the link just becomes very painful.
For instance, if I have a structure like
pkg1@1.0.0 D:\temp\pkg1
├── lodash@3.7.0
└── pkg2@1.0.0 -> D:\temp\pkg2
And if, for whatever reason, pkg2 tries to require lodash, then I get the error: Cannot find module 'lodash'.
However, without the link, it works perfectly.
An even greater problem: if I happen to have lodash installed globally, it works with the link, but not with the intended version, because I could have version 2.8.0 installed globally.
This is worse, because you don't get an error; everything runs, and at some point it might break, and you will have to search everywhere to finally realise you were not using the right version because of your link.
Now I wanted to try to find the source of the problem and maybe fix it.
I realized that when you require pkg2 in my example, its path will be D:\temp\pkg2\index.js instead of D:\temp\pkg1\node_modules\pkg2\index.js, which results in the resolve paths being:
paths: Array[3]
0: "D:\temp\pkg2\node_modules"
1: "D:\temp\node_modules"
2: "D:\node_modules"
Instead of
paths: Array[4]
0: "D:\temp\pkg1\node_modules\pkg2\node_modules"
1: "D:\temp\pkg1\node_modules"
2: "D:\temp\node_modules"
3: "D:\node_modules"
I tried to change the function tryFile in module.js as follows:
// check if the file exists and is not a directory
function tryFile(requestPath) {
  var fs = NativeModule.require('fs');
  var stats = statPath(requestPath);
  if (stats && !stats.isDirectory()) {
    return requestPath; // <= fix: return the symlink path directly
    // return fs.realpathSync(requestPath, Module._realpathCache);
  }
  return false;
}
And this was the extent of the change required to fix the issue.
Instead of using fs.realpathSync() for module require resolution, it ought to be normalized with path.resolve(), which returns an absolute canonical path without resolving symlinks.
Conceptually it would be better to have something like this patch against node-v5.5.0:
--- lib/module.js.orig 2016-01-24 12:25:44.000000000 -0500
+++ lib/module.js 2016-01-24 12:41:44.000000000 -0500
@@ -123,7 +123,7 @@
}
function toRealPath(requestPath) {
- return fs.realpathSync(requestPath, Module._realpathCache);
+ return path.resolve(requestPath);
}
// given a path check a the file exists with any of the set extensions
@@ -287,6 +287,7 @@
debug('Module._load REQUEST %s parent: %s', request, parent.id);
}
+ if (isMain) request = fs.realpathSync(request);
var filename = Module._resolveFilename(request, parent);
var cachedModule = Module._cache[filename];
The sole place where resolving symlinks is necessary is if the node start script happens to be a symlink - see isMain above.
Of course, lstat data would have to be cached through some other means, since Module._realpathCache would no longer be used. The patch above does not address that implementation detail.
@VanCoding
The only real problem that i see is when a module gets required multiple times through different symlinks it would not be cached by npm and get loaded multiple times.
Yes, caching is a real problem. Currently, you could just symlink all deps to a global store because they share the same cache. npm doesn't do that, but theoretically someone could create a package manager that uses such a method. See: https://nodejs.org/dist/latest-v5.x/docs/api/modules.html#modules_addenda_package_manager_tips
I created a PR for this issue here: https://github.com/nodejs/node/pull/5950
Passes all of Canary in the Gold Mine on Linux and Windows, so things are looking pretty good as far as stability goes. W/R/T caching, modules still get cached, just according to their canonical symbolic path -- I'd argue that modules should never get cached according to their realpath, because then we're just back to fighting the symbolic path abstractions that the filesystem gives us.
I think this issue can be closed out now.
Done!
Unfortunately, just as my previous comment mentioned, it may break the cache behavior of alternative package managers. I tested ied, pnpm, and npminstall under Node v6 today. All those alternative package installers rely on the original symlink behavior, and now they all lose the cache in Node v6! That means if you install babel6 with these tools, you will suffer from the same performance issue as babel6 under npm2. 😭
@jasnell I suggest reverting this PR until we find a solution.
cc the authors of ied @alexanderGugel , pnpm @rstacruz, npminstall @fengmk2 and @dead-horse
https://github.com/nodejs/node/pull/5950/files#diff-d1234a869b3d648ebfcdce5a76747d71R119 It breaks all the faster node package installers. I thought the module API was locked long ago - why do we need to change this behavior?
@hax @fengmk2 Well, it's a new major version, so it's allowed (and expected) to introduce breaking changes. If you're relying on the old behavior, there are still the old versions, including LTS.
Also please do not forget that the previous behavior was not documented in any way, so packages that are now broken made use of undocumented behavior, which you always have to expect to break.
I think others and I made very clear why this change was needed.
The good thing is: There should be ways to fix the now broken packages to work with Node 6.x again, since you can always get the real path from the virtual path. It was not possible the other way around.
It needs changing because resolving symlinks like this is wrong and broken.
Can someone please explain to me why package managers are having issues with this? It just sounds to me like they're using the same fragile behaviour of resolving symlinks when they shouldn't. If they were doing it right already I don't think they would have any issues... regardless of which way node is doing it.
Also please do not forget, that the previous behavior was not documented in any way, so packages that now are broken made use of undocumented behavior, which you always have to expect to break.
It is actually documented.
Since Node.js looks up the realpath of any modules it loads (that is, resolves symlinks), and then looks for their dependencies in the node_modules folders as described here, this situation is very simple to resolve with the following architecture:
Source: Addenda: Package Manager Tips
We're heavily relying on symlinks in ied. So do a lot of other projects. This change will result in hard-to-debug bugs, and the benefit is IMO very limited. The previous behaviour is still well-documented; it's even described under Addenda: Package Manager Tips. If not on that, what can we rely on when writing a package manager?
cc @TheAlphaNerd
And the module API is Stability: 3 - Locked (https://nodejs.org/api/modules.html#modules_modules); it should be kept very stable.
If we lose the module cache, there is no way to make npm install fast and efficient.
@alexanderGugel wow, you're right. I didn't know this existed.
But still, it's not documented clearly. Actually, it only states that node resolves symlinks when searching for a module. It does not state that dirname and filename in the loaded module will also have the resolved path. It also does not state that requires from such modules will originate from the resolved path.
So, basically, we could keep the documented behavior and still break your code ;)
@fengmk2 only documented behavior is locked, I think. Changes are still allowed, as long as they follow the spec.
But still, it's not documented clearly.
Since Node.js looks up the realpath of any modules it loads (that is, resolves symlinks), and then looks for their dependencies in the node_modules folders as described here, this situation is very simple to resolve with the following architecture:
I would argue it is extremely clear and breaks a locked API.
This change will result in hard-to-debug bugs and the benefit is IMO very limited.
In addition to being poorly documented, the previous behavior of symlinked modules was counter-intuitive, ran counter to the common practice of UNIX symlinks, and prevented developers from easily testing their modules without copying and installing.
@kzc That's a different question, and the original reasoning behind the change. It does not, however, justify breaking a locked API that was mentioned under an addenda for package managers.
@alexanderGugel This new behavior is consistent with my interpretation of the documented module spec.
What breaks in these other package managers exactly? Perhaps we can help find a fix.
Currently, Node.js resolves symlinks when requiring and then uses the real location of the package/file as its filename and dirname instead of the symlinked one.
This is a problem because symlinked modules don't act the same as locally copied modules.
For example:
works, but
does not, because dep1 does not act like it's located in the node_modules directory of module, and thus cannot find dep2.
This is especially a problem when one wants to symlink modules that have peer-dependencies.
Therefore, I suggest changing the behavior to no longer resolve symlinks. What do you think?