nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.6k stars 29.07k forks source link

Entry points specified as absolute URLs fail to load #46009

Open GeoffreyBooth opened 1 year ago

GeoffreyBooth commented 1 year ago

Version

19.3.0

Platform

Darwin Geoffreys-MacBook-Pro.local 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64

Subsystem

module, esm, process, cli

What steps will reproduce the bug?

With a checkout of the node repo at ~/Sites/node:

$ pwd
/Users/geoffrey
$ node file:///Users/geoffrey/Sites/node/test/fixtures/es-modules/mjs-file.mjs
node:internal/modules/cjs/loader:1042
  throw err;
  ^

Error: Cannot find module '/Users/geoffrey/file:/Users/geoffrey/Sites/node/test/fixtures/es-modules/mjs-file.mjs'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1039:15)
    at Module._load (node:internal/modules/cjs/loader:885:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:82:12)
    at node:internal/main/run_main_module:23:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

Node.js v19.3.0

Note the /Users/geoffrey/file:/Users/geoffrey/Sites/node/test/fixtures/es-modules/mjs-file.mjs constructed specifier. This doesn’t make sense.

How often does it reproduce? Is there a required condition?

Every time.

What is the expected behavior?

Absolute file URLs should be allowable as program entry points.

What do you see instead?

Error: Cannot find module '/Users/geoffrey/file:/Users/geoffrey/Sites/node/test/fixtures/es-modules/mjs-file.mjs'

Additional information

I understand per https://nodejs.org/api/cli.html#program-entry-point the entry point is parsed by the CommonJS resolver, even when it passes the criteria for being loaded by the ESM one; but it doesn’t make sense that either resolver would be constructing an invalid path.

It’s also counterintuitive that file:///Users/geoffrey/Sites/node/test/fixtures/es-modules/mjs-file.mjs should be acceptable input for --loader and --import but not as the main entry point. For example, all of these are valid:

cc @aduh95 @JakobJingleheimer @nodejs/modules @nodejs/loaders

This came up because I was writing a test like this:

    const { code, signal, stdout, stderr } = await spawnPromisified(execPath, [
      '--loader',
      fixtures.fileURL('/es-module-loaders/loader.mjs'),
      fixtures.fileURL('/es-modules/mjs-file.mjs'),
    ]);

The solution was to change the second fixtures.fileURL to fixtures.path; but this feels wrong.

ljharb commented 1 year ago

If I'm reading this right, it seems like there's two distinct (altho related) items here:

  1. a bug, that the pwd is prepended to a URL, rather than the CJS resolver simply rejecting it
  2. a new feature, that the entry point be allowed to be a protocol'd URL?
aduh95 commented 1 year ago

The issue is that file:///Users/geoffrey/Sites/node/test/fixtures/es-modules/mjs-file.mjs can be parsed either as an absolute file: URL or a relative path. Since CLI application more often deal with path than URLs, it doesn’t strike me as bad design to prefer dealing with paths only (but maybe we should display a warning when the path doesn’t exist and can be parsed as a valid URL).

  1. a bug, that the pwd is prepended to a URL, rather than the CJS resolver simply rejecting it

It's not bug, it's a feature :) ./ is always prepended when something that's not an absolute path is provided so it never references a node_modules package.

$ mkdir -p 'node_modules/file:/dev'
$ echo 'console.log("works")' > 'node_modules/file:/dev/null.js'
$ node file:///dev/null
node:internal/modules/cjs/loader:1042
  throw err;
  ^

Error: Cannot find module '…/file:/dev/null'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1039:15)
    at Module._load (node:internal/modules/cjs/loader:885:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:82:12)
    at node:internal/main/run_main_module:23:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

Node.js v19.3.0
$ node -e 'require("file:///dev/null")'
works
$ node --require file:///dev/null -e ';'
works
$ mv 'node_modules/file:' .
$ node file:///dev/null
works
$ node --require ./file:///dev/null -e ';'
works

2. a new feature, that the entry point be allowed to be a protocol'd URL?

Do we need a new feature or are we happy suggesting folks to do one of the following?

JakobJingleheimer commented 1 year ago

Hmm, yeah, I would expect anything provided via CLI to be a path, not a URL. I think we should handle it in SOME way, maybe even throw if the underlying utils can't handle a URL.

arcanis commented 1 year ago

Imo it should however be consistent with how --require / --import / --loader work (but more in the sense "these options should all accept absolute paths", not the other way around).

devsnek commented 1 year ago

the resolved path isn't invalid, you just don't happen to have a folder named file: in your cwd (which you could if you really wanted to)

aduh95 commented 1 year ago

Imo it should however be consistent with how --require / --import / --loader work (but more in the sense "these options should all accept absolute paths", not the other way around).

I disagree, it seems much more logical that --import behaves the same as import – i.e. no concept of paths, only relative and absolute URLs. Luckily UNIX paths can serve as relative URLs as long as all components of the path are composed of alphanumeric ASCII chars. Similarly, --require should support the same as require() – i.e. paths rather than URLs.

bmeck commented 1 year ago

It isn't about :// , // is only added to relative based urls (not blob / data / many protocols). : is perfectly valid in most file systems, and empty filenames are supported in various ones, mostly through flat file systems or remote file systems. : is more generally useful as it allows simple string based injection without needing disk access for those flags.

On Thu, Dec 29, 2022 at 12:16 PM Jordan Harband @.***> wrote:

@aduh95 https://github.com/aduh95 how can a relative path include "://"?

— Reply to this email directly, view it on GitHub https://github.com/nodejs/node/issues/46009#issuecomment-1367504608, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZJIZUTPQUU6TIQGS7J4TWPXIRVANCNFSM6AAAAAATLZ5M3U . You are receiving this because you are on a team that was mentioned.Message ID: @.***>

GeoffreyBooth commented 1 year ago

the resolved path isn’t invalid, you just don’t happen to have a folder named file: in your cwd (which you could if you really wanted to)

Sure, but I couldn’t have a folder named file:// or file:///, could I? It collapses the /// into /, which I guess it’s always done, but does it have to? If not, that could be one way to disambiguate.

There’s a distinction between allowable potential filename/path and existing potential filename/path. The resolution algorithm tries a bunch of ways to interpret the input, returning on the first option that succeeds or erroring if they all fail. We could add an additional method or two to the list of ways that it tries, to try to load the entry point if the input is interpreted as a URL that maps to an existing file:

  1. After exhausting all current methods of trying to resolve the entry point, if the entry point begins with a known protocol like file://, data://, http://, or https://, parse it via fileURLToPath (for file:) or via the ESM resolution algorithm (for other supported protocols). Alternatively we could check for these protocols before trying to resolve via the CommonJS loader, which would technically be a breaking change for anyone with folders named file:, data:, http:, or https:.

  2. After exhausting all current methods of trying to resolve the entry point, run the input through the ESM resolution algorithm (like what’s used for --import) and see if any of its attempts find an existing file.

bnoordhuis commented 1 year ago

What you're proposing is, depending on one's POV, either DWIM or second-guessing the user.

I lean towards simplicity. Less room for surprise, confusion, and bugs - potentially of the security impacting kind.

GeoffreyBooth commented 1 year ago

@Hardik645 please ask on https://github.com/nodejs/help