nodejs / node-eps

Node.js Enhancement Proposals for discussion on future API additions/changes to Node core

002 - web compat of file extension/index searching #51

Closed bmeck closed 6 years ago

bmeck commented 7 years ago

In talks with @domenic, a concern was raised about using the error behavior of a URL derived from step 2 of https://html.spec.whatwg.org/multipage/webappapis.html#resolve-a-module-specifier . In particular, the current plan has been to search for file extensions such that:

import './foo';

Would search multiple file extensions .mjs, .js, .json, and .node.

Since ./, ../, and / do not immediately error upon resolution (they would error upon failed fetch) the question is if path searching could be limited to the error behavior for so called "bare" URLs within step 2.

domenic commented 7 years ago

In particular, I'm concerned about the gratuitous incompatibility that the file-extension and index.js searching introduces with browsers.

Stated another way: browsers do not have a solution for prefix-less imports yet (e.g. import "foo"). But, we will almost certainly add something in the future. The hope of doing so is that then browsers and Node.js can have 100% compatible semantics. But with the current proposal, that is impossible, because of the departure from browsers in prefix-ful imports (e.g. import "./foo").

If a Node.js author writes a file

import "foo";
import "./foo";

then the first line is not browser-compatible for now, but the second line will never be browser-compatible, because browsers will always interpret that as a literal request for ./foo, and will never issue 8 HTTP requests (4 for ./foo.extension, and 4 for ./foo/index.extension).
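To make that request fan-out concrete, the search described above can be sketched as a candidate list (the helper name and exact candidate order here are illustrative, not from any spec text):

```javascript
// Illustrative expansion of the extension/index search for one specifier;
// the extension set follows the proposal discussed in this thread.
const EXTENSIONS = ['.mjs', '.js', '.json', '.node'];

function searchCandidates(specifier) {
  return [
    ...EXTENSIONS.map((ext) => specifier + ext),            // ./foo.mjs, ./foo.js, ...
    ...EXTENSIONS.map((ext) => `${specifier}/index${ext}`), // ./foo/index.mjs, ...
  ];
}

console.log(searchCandidates('./foo').length); // 8 probes for a single import
```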

If the proposal instead restricted the path search magic to only prefix-less imports, then it will be much easier to guarantee compatibility across the ecosystems. Indeed, it means that any Node.js package which does not have external dependencies would work right out of the box in browsers on day 1.

Avoiding path-searching also has the minor benefit for Node.js authors of settling on a single convention (some of my projects are a mix of require("./foo") and require("./foo.js"), which is annoying). And it has the major benefit of avoiding several rounds of file I/O during program startup.

bmeck commented 7 years ago

My concern is that while it does settle on one convention it limits refactoring within a single project, but this is contained fairly well since very few (if any?) packages don't control the directory structure or names of their own files. I think it is acceptable as the main benefits of searching come from inter-package dependencies which still are preserved.

This does have the slightly annoying effect that vendored modules outside of node_modules would no longer work the same, but people are already encouraged to use bundledDependencies for this purpose.

ljharb commented 7 years ago

Being able to avoid specifying the extension is critical - both in browsers and node - to ensuring easier refactoring. It's a common practice, for example, to use server rewrites to avoid including a ".php" or similar extension in all browser-side URLs, so as to not couple the browser code to the server's choice of PHP. Similarly, in node, omitting the extension means that I'm not coupling my import to my choice of .js, .mjs, .jsx, etc, which allows much more free refactoring.

The same is true with a browser requesting the URL /foo vs /foo/index.html - /foo allows for the common practice of a server rewrite to ensure that this hits foo.html or foo/index.html, and the server searches to find these.
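That try-a-few-candidates behavior can be sketched as a hypothetical helper (not any particular server's configuration syntax):

```javascript
// Hypothetical sketch of the rewrite/search a server might perform for an
// extension-less request like GET /foo.
function serverCandidates(urlPath) {
  return [urlPath, `${urlPath}.html`, `${urlPath}/index.html`];
}

function rewrite(urlPath, exists) {
  // `exists` stands in for a filesystem check; the first hit wins.
  return serverCandidates(urlPath).find(exists) ?? null;
}
```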

In other words, "browser-compatible" in practice includes server configuration too, and "rewrite rules/searching" is something most servers already do.

Just to clarify, we're talking about whether the searching behavior that applies to relative paths (starting with . or /) should apply?

If we're talking about non-relative paths, these "prefixless imports" are the most common import/require path in node altogether, because they refer to npm-installed modules, so I'm not sure how you could write anything meaningful in node without the searching behavior.

If we're talking about relative paths, then we're severely limiting the ability to transparently move a relative directory in a project and extracting it out into an npm-installed module, because the way that directory is required/imported will drastically change upon moving it. This will have the effect of limiting people's ability to extract code into separate modules.

domenic commented 7 years ago

"rewrite rules/searching" is something most servers already do.

[citation needed]; this is not true in my experience (e.g. it is impossible on GitHub pages and other static hosting sites).

If we're talking about relative paths, then we're severely limiting the ability to transparently move a relative directory in a project and extracting it out into an npm-installed module, because the way that directory is required/imported will drastically change upon moving it.

It will literally require adding either ".mjs" or "/index.mjs". That is hardly a "drastic change" that is "critical" to ensure easier refactoring. It is certainly not worth such browser-incompatibility.

unional commented 7 years ago

Servers like Apache and IIS all provide rewrite rules. It is the responsibility of the server to handle this.

Mandating extensions does have a significant impact on development and tooling. Consider other languages that compile to JS: TypeScript, CoffeeScript, Dart, etc.

Using TypeScript as an example, this could force the compiler to modify the reference from ./foo.ts to ./foo.js, or ./foo.tsx to ./foo.jsx/./foo.js

Then consider running TypeScript code directly using ts-node or systemjs/jspm. The compiler would need to disable such rewriting in those scenarios.

It is a mixing of concerns. It does seem like leaving the rewriting to the server is a better approach.

ljharb commented 7 years ago

Fair point, I have no evidence for "most" (although I've not used a framework that lacks it), and static hosting sites obviously lack this ability - but for static hosting sites, it's pretty common to have a build step beforehand (Github pages has this for the automatic generator, for example).

That is hardly a "drastic change"

It tightly couples consumers to the format you choose to use. Tight coupling is drastic, even if the fix it necessitates is minimal.

domenic commented 7 years ago

It is a mix of concern. It does seem like leaving the rewriting to the server is a better approach.

The issue is that the vast majority of people do not control their server configs. The story for creating modules that work across Node and the browser should not be: (a) install the npm package; (b) go talk to your server admin and ask them to add these n rewrite rules which the package has hopefully documented in its readme, but more likely you'll have to figure out by hand. (Also, never try to use the package on GitHub pages or other static hosting sites.)

It should just be (a).

domenic commented 7 years ago

but for static hosting sites, it's pretty common to have a build step beforehand (Github pages has this for the automatic generator, for example).

The whole point of ES modules in the browser, unlike CommonJS, is that there is no build step.

domenic commented 7 years ago

It tightly couples consumers to the format you choose to use. Tight coupling is drastic, even if the fix it necessitates is minimal.

I hope you recognize this is an opinion. We could just as easily say that tightly coupling yourself to Node's search-the-filesystem algorithm is "drastic"ally bad.

ljharb commented 7 years ago

"no build step" does not mesh with the way it's going to be used in practice. "minification" is a build step too, and I'd be surprised if there were many popular sites - static or otherwise - that did not minify their code.

I hope you recognize this is an opinion.

Indeed, it's an opinion - we're all stating opinions here, including that "no build step" is desirable, and that "node and the browser doing slightly different things" is problematic, and that such incompatibility is "gratuitous".

joshgav commented 7 years ago

Since ./, ../, and / do not immediately error upon resolution (they would error upon failed fetch)

They would error at resolution time if they can't be parsed as a relative URL to the module base path, same as absolute URLs. They would error on fetch if the script isn't actually found at the resolved path, also same as absolute URLs. Right?

If that's correct, it seems the question and decision of whether to perform additional path searches should apply the same to both absolute (algorithm step 1) and relative (algorithm step 3) URLs.

@bmeck what's our plan for non-URLs, like require('express')? The spec you referenced would make those invalid. Will Node continue to allow them? Perhaps we could treat them as though they were prepended with ./ and try base URLs from NODE_PATH, as in step 3 in the Node modules doc.

domenic commented 7 years ago

They would error at resolution time if they can't be parsed as a relative URL to the module base path, same as absolute URLs. They would error on fetch if the script isn't actually found at the resolved path, also same as absolute URLs. Right?

I can't think of any string that starts with those characters that will fail to parse as a relative URL.

If that's correct, it seems the question and decision of whether to perform additional path searches should apply the same to both absolute (algorithm step 1) and relative (algorithm step 3) URLs.

Agreed. And it seems bad IMO, if you do

import "file:///C:/foo";

to also search for file:///C:/foo.js, file:///C:/foo.mjs, file:///C:/foo.json, etc. I.e. when given a URL, we should not construct a bunch of other URLs from it.

Similarly, specifying

import "./foo";

should also not construct a bunch of other URLs from it and search them.

bmeck commented 7 years ago

They would error at resolution time if they can't be parsed as a relative URL to the module base path, same as absolute URLs. They would error on fetch if the script isn't actually found at the resolved path, also same as absolute URLs. Right?

Correct, the behavioral change is when the error occurs. Without searching, at resolution time the fetch destination is set so initial fetch produces the error. With searching, the resolution cannot be finished until fetch has occurred.
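A minimal sketch of the two error timings described here, with `exists` standing in for the fetch/existence check (names and the exact suffix list are illustrative):

```javascript
// Without searching: resolution is pure and never touches the filesystem;
// the subsequent fetch fails if the file is absent.
function resolveNoSearch(specifier, base) {
  return new URL(specifier, base);
}

// With searching: resolution itself must probe candidates, so it cannot
// finish until at least one fetch/existence check has happened.
function resolveWithSearch(specifier, base, exists) {
  for (const suffix of ['', '.mjs', '.js', '.json', '.node']) {
    const candidate = new URL(specifier + suffix, base);
    if (exists(candidate)) return candidate;
  }
  throw new Error(`module not found: ${specifier}`);
}
```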

If that's correct, it seems the question and decision of whether to perform additional path searches should apply the same to both absolute (algorithm step 1) and relative (algorithm step 3) URLs.

Unclear where these numbers are coming from.

@bmeck what's our plan for non-URLs, like require('express')? The spec you referenced would make those invalid. Will Node continue to allow them? Perhaps we could treat them as though they were prepended with ./ and try base URLs from NODE_PATH, as in step 3 in the Node modules doc.

We support those, this issue does not attempt to change behavior of "bare" URLs. We are only talking about step 2 of https://html.spec.whatwg.org/multipage/webappapis.html#resolve-a-module-specifier .

Our goal is to have an algorithm where both browsers and Node can act the same. The approach is to only differ from the browser spec in ways that would produce errors in the browser (either an invalid resolution, or an error during fetch).

bmeck commented 7 years ago

@domenic

I agree that the behavior differing between a specifier that parses as a URL on its own and a relative/absolute URL created from a specifier and base URL is bad. The current implementation however does not perform searching if given a fully specified URL that parses immediately. This follows the ongoing rewrite of the spec for Node's implementation of ESM.

domenic commented 7 years ago

Sure, all I'm saying is that (as per the OP of this issue) it should also not perform searching if given a relative URL. It's just a different argument than the OP, about consistency instead of browser-compat or performance. :)

bmeck commented 7 years ago

So this gets a bit interesting, as things that are fully specified URLs don't work with existing require arguments: require("file:///...") will always fail while require("./") won't fail. The old logic here was that it seemed fine not to add the behavior to a new kind of specifier that never worked previously. Certainly more open to looking at this issue again, and at whether we should add searching to fully specified URLs or remove searching except for "bare" URLs.

joshgav commented 7 years ago

@bmeck

Current implementation however does not perform searching if given a fully specified URL that parses immediately

Got it, thanks! What would be the reason to treat absolute and relative paths differently then? Is it because using absolute URLs isn't common in Node today so we're less concerned about breaking expectations?

IMO consistency between relative and absolute paths seems like it would be important in the long run.

these numbers are coming from

the steps in https://html.spec.whatwg.org/multipage/webappapis.html#resolve-a-module-specifier.

bmeck commented 7 years ago

@joshgav It seems there is some confusion on absolute paths vs fully specified URL specifiers. We are talking about 2 different things:

  1. fully specified URL specifiers ala file:///C:/windows
  2. absolute, or relative specifiers ala C:/windows, ./windows, ../windows

If you look at how URL parsing works in the whatwg spec you will note that absolute paths don't parse the same as fully specified URLs with protocols.
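This difference can be checked directly with the WHATWG URL parser (results per the current URL Standard; as the transcripts later in this thread show, implementations at the time varied):

```javascript
// Per the WHATWG URL Standard: a drive-letter specifier parses as a URL
// with a one-letter scheme, while a leading slash makes it path-absolute.
const base = 'file:///c:/app/server.js';

const full = new URL('file:///c:/windows'); // fully specified, base unused
const scheme = new URL('c:/windows', base); // "c:" is taken as a scheme, base ignored
const pathAbs = new URL('/c:/windows', base); // resolved against the base

console.log(full.href);    // file:///c:/windows
console.log(scheme.href);  // c:/windows
console.log(pathAbs.href); // file:///c:/windows
```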

Under current text:

jkrems commented 7 years ago

Is there value in switching to file://-URLs for ES6 modules only? Or would we be concerned about confusing Windows users (because slashes etc.)?

bmeck commented 7 years ago

@jkrems not sure what you mean. The ESM import plan is to always use URLs for importing. Don't really have a plan for require() supporting URLs, as I am actively avoiding changing any behavior of require(). Might be a separate issue to add support for that, just like how fs got URL support (but only for URL objects, not strings).

jkrems commented 7 years ago

@bmeck Nevermind me, then. Must've confused proposals or something. :)

Don't really have a plan for require() supporting URLs as I am actively avoiding changing any behavior of require().

👍 Definitely didn't mean to imply that.

joshgav commented 7 years ago

@bmeck

We don't search for fully specified URLs. We always search for absolute paths, and relative paths.

Exactly, there's an argument that those should be consistent.

Q: How would an absolute path like 'C:/windows' work since it doesn't parse as a URL and doesn't begin with /|./|../? Do we prepend file:///?

An idea: Distinguish between file and http schemes? That is, for file scheme do the searches, for http and other network protocols don't.

bmeck commented 7 years ago

Exactly, there's an argument that those should be consistent.

Sounds fine, always searching or never searching both seem ok.

Q: How would an absolute path like 'C:/windows' work since it doesn't parse as a URL and doesn't begin with /|./|../? Do we prepend file:///?

I think the topic of drives as URLs is discussed in https://blogs.msdn.microsoft.com/ie/2006/09/13/createurlmoniker-considered-harmful/

The algorithm in place does resolution by resolving against the import location's URL via the WHATWG URL resolution mechanics. An example is:

const import_specifier = '/d:/foo'; // edited: forgot the prefixing /
const import_location = new URL('file:///c:/app/server.js');
console.log(new URL(import_specifier, import_location));
// file:///d:/foo

Feel free to test this out in a browser, the result is the starting point if we want to do searching, or the absolute path if we don't.

I am opposed to differing behavior in protocol/scheme if browsers do not. It would just add more things to learn. Current plan is to only support file: at least until security of networking is looked into.

jkrems commented 7 years ago

My understanding from the above is that absolute windows paths would have to have a leading slash?

> new URL('d:/zapp', 'file:///c:/foo/bar').toString() // not supported
< "d:/zapp"
> new URL('/d:/zapp', 'file:///c:/foo/bar').toString() // supported
< "file:///d:/zapp"
bmeck commented 7 years ago

@jkrems yes, just like in browsers

bmeck commented 7 years ago

@jkrems ah, to note / always means relative to current drive letter of process working dir, so often you wouldn't need to specify the drive.

domenic commented 7 years ago

That may be a spec bug actually... Chrome and Edge translate new URL("D:/zap") to a file URL (not a URL with scheme d). I'll file a spec bug.

jkrems commented 7 years ago

@domenic The above was in Safari for what it's worth. But I just tried the same in Chrome 57 and got the same results. Didn't seem to treat drive letters differently. I'd hope that's not magic only enabled on Windows (tested on Sierra).

@bmeck Neither Safari nor Chrome seem to treat drive letters any different than "normal" path segments. Is that expected?

> new URL('/zapp', 'file:///c:/foo/bar').toString()
< "file:///zapp"

EDIT: Firefox on Mac acts like Chrome/Safari on Mac.

domenic commented 7 years ago

Unfortunately behavior varies between platforms in Chrome. That is a Chrome bug.

Safari Tech Preview is more spec-compliant and probably treats drive letters correctly.

bmeck commented 7 years ago

@jkrems expected. Without a drive letter, it resolves to just the preceding / (the root), with no drive segment.

jkrems commented 7 years ago

Sorry, when I said "safari" I meant "latest safari technology preview".

domenic commented 7 years ago

Oh. Well, per spec, and in Chrome Windows and Edge, the result is file:///C:/zapp. This should also be the case in Node.js's require("url").URL parser.

jkrems commented 7 years ago
> nvm run 8
Running node v8.0.0-nightly20170310b806e18dd7 (npm v4.1.2)
> const {URL} = require('url')
undefined
# Ah, yes. Resolves relative to drive letter.
> new URL('/zapp', 'file:///c:/foo/bar').toString()
'file:///c:/zapp'
# Nope, drops the protocol
> new URL('d:/zapp', 'file:///c:/foo/bar').toString()
'd:/zapp'
# Yep, doesn't work anymore (compared to my browser tests):
> new URL('/d:/zapp', 'file:///c:/foo/bar').toString()
'file:///c:/d:/zapp'
# double-slash for protocol relative absolute url
> new URL('//d:/zapp', 'file:///c:/foo/bar').toString()
'file:///d:/zapp'
# Also works with triple slash which I guess is technically a bit more correct
> new URL('///d:/zapp', 'file:///c:/foo/bar').toString()
'file:///d:/zapp'

Got it.

bmeck commented 7 years ago
> # Nope, drops the protocol
> new URL('d:/zapp', 'file:///c:/foo/bar').toString()
> 'd:/zapp'

d: is the protocol

But it does sound like we need to fix some URL impl stuff

jkrems commented 7 years ago

Yeah, according to @domenic above it (maybe?) should parse as file:///d:/zapp in node as well. Depending on whether this is deemed a spec- or a chrome/firefox/edge-on-windows bug.

# More direct example, without relative resolution:
> new URL('d:/zapp').toString()
'd:/zapp'
Trott commented 7 years ago

@bmeck Is it OK to remove the ctc-agenda label from this and we can put it back when the CTC needs to discuss or make a decision on something? (Or do you expect that a need for discussion or decision is likely this upcoming week?)

bmeck commented 7 years ago

@Trott we can remove it for now

annevk commented 7 years ago

So judging from https://github.com/whatwg/url/issues/271 I'm inclined to not change the URL Standard for this issue since it would prevent single-code-point URL schemes. I'd rather have Windows Chrome and Edge change.

Jamesernator commented 7 years ago

I'd like to point out something important that I don't think has been mentioned. Assuming .mjs happens, then importing from an extension-less file is not at all equivalent to sending out a bunch of HTTP requests, because only .mjs is actually considered an ES6 module; everything else is CommonJS.

The consequence of this is that if browsers were to implement extension-less imports, then they'd have to also implement the CommonJS algorithm to import any .js/.node/etc. files to have the same semantics as Node.js anyway, so sending a bunch of requests wouldn't even solve the issue.

Just an example of what I was saying:

// foo.js
// Need not care if bar is a CommonJS module or an ES6 module
// but because of that it'll only work in Node.js
import bar from "./bar"

// bar could be one of these:
// bar.mjs
export default "Banana"
// or
// bar.js, not web compatible regardless of whether or not you
// send another HTTP request for it as it's not even an ES6 module
module.exports = "Banana"

Now this isn't to say I think import should allow the extension-less form, but at the very least require will need to support .mjs extension searching.
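A minimal sketch of that asymmetry, assuming the only-.mjs-is-ESM rule described above (the function name is illustrative):

```javascript
// Under the .mjs proposal, the module format is decided by the extension
// the search lands on, so the same extension-less specifier could resolve
// into two different module systems.
function formatFor(resolvedPath) {
  return resolvedPath.endsWith('.mjs') ? 'esm' : 'cjs';
}

console.log(formatFor('./bar.mjs')); // 'esm'
console.log(formatFor('./bar.js'));  // 'cjs'
```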

I do see there being an issue with module authors distributing Node.js modules as extensionless (even when they didn't need Node.js to resolve whether a file was CommonJS or a module), as that immediately means all consumers of the module have to adopt a build process to use it in browsers (I'm not counting extensionless imports in JSX/TypeScript/etc., as those should be rewriting imports when they generate output files anyway).

If Node.js does support extensionless imports, what'd be really nice is a field in the package.json that declares which files are browser compatible (recursively, so if index.js is browser compatible all subfiles must be too), and that prevents publishing if those files could not be imported by a browser.

As an example of the package.json thing:

// package.json
{
    // obviously "browser" wouldn't be used as it's already used
    // by other tools, but something similar
    "browser": "my-cool-package.mjs" // string or list of strings
}

// my-cool-package.mjs
import foo from "./foo" // Not browser compatible
import bar from "banana" // Not browser compatible (for now)
import fizzbuzz from "fizzbuzz/foo" // Not browser compatible

// When running npm publish an error would be thrown
Can't publish my-cool-package as files declared browser compatible aren't
browser compatible

// Probably an additional option to npm install to ensure a package will be
// browser compatible which only succeeds if it has a browser field
// that successfully published
npm install --browser my-cool-package
bmeck commented 7 years ago

The topic of compatibility is not about equivalence, but about problems for the two systems. I do not think browsers have any desire to care about extensions (you can serve extensions like "yolo.php"). Can you clarify what problems Node causes for the plans of browsers, without aiming towards equivalence?


Jamesernator commented 7 years ago

@bmeck I'm referring to domenic's point.

What I was saying is that I don't think it's particularly an issue from a design standpoint for Node.js to do searching in import specifiers, assuming .mjs is going to be the format of modules in Node.js (it definitely is better for refactoring).

However I also don't want to see extension-less specifiers being used when all the code is just ES6 modules, as that might encourage (or at least enable) package authors to accidentally lock their packages to some specific build tool (e.g. imagine the scenario where a package author always uses build tool X and publishes with extensionless specifiers; they might not even be aware that their package can't be consumed by itself).

This in my opinion is very bad, as it means I as a module consumer (on npm) might need to specifically add a build tool just to convert the package author's import foo from "./foo" into import foo from "./foo.mjs". This is why I suggested that instead of disallowing extensionless specifiers, we instead give package authors a way to guarantee that their package is web-compatible, via a field which says: these files must (recursively) only contain web-compatible import specifiers.

Of course instead of having package-authors having to manually add such fields it could even be the case that npmjs.com displays whether or not a given package is also web-compatible, which would be really really nice when manually finding a package for some purpose.

Obviously Node.js could just do nothing, allow extensionless module specifiers, and leave it up to developers to ensure their packages are web-compatible, but I think it'd be nice for Node.js to at least encourage package authors to write full specifiers if they intend their package to be used on the web.

ljharb commented 7 years ago

How could a package author provide a full specifier, given that a full specifier is a very app-specific URL?

You always need a build tool - and any module bundler and transpiler combo would be able to handle extensions for you, and deliver the desired format to your supported browsers.

Jamesernator commented 7 years ago

I didn't mean full as in absolute, just as in including extension. A package consumer will always be able to do something like: import lodash from "./node_modules/lodash-es/index.js" (not that lodash using .js will be supported in Node.js anyway) so it's not a concern of package authors, just of consumers.

And I definitely don't think I should require a build tool; if I just want to make a small application with a couple of libraries, I should not have to have a build tool. And what about CDNs? Do I need a build-tool service worker to compile every random unspecified format that isn't just plain browser modules?

Frankly I think if modules are going to require build tools just to use them then I suspect module adoption will be slowed. I certainly would be much less likely to use a module that required a build tool to simply use than one that just worked out of the box in both Node.js and the browser.

Also note I'm not suggesting that libraries shouldn't have a build tool to generate those files, but I think it would be best for consumers if it were the standard that library files that were going to be consumed contain the extensions. Now whether or not Node.js needs to play a role in that or if it'll happen on its own I don't really know.

Omitting extensions really reminds me a lot of depending on require.extensions to load modules:

// someFile.js
const fs = require('fs')
const coffee = require('coffee-script')

// The actual (long-deprecated) hook takes the module and filename rather
// than the source text, and the key includes the leading dot.
require.extensions['.coffee'] = function (module, filename) {
    const source = fs.readFileSync(filename, 'utf8')
    module._compile(coffee.compile(source), filename)
}

module.exports = require('./mainFile.coffee')

Which frankly is not the sort of thing I want to see forced onto consumers of modules.

ljharb commented 6 years ago

If it's not an absolute or relative URL, you'll always require a build tool since that's the only thing browsers understand - adding an extension or not is irrelevant.

Modern web dev requires build tools; including most of npm - it hasn't slowed adoption.

require.extensions['.mjs'] is going to exist anyways in node; that's how you'll be able to require an .mjs module - and every module tool in the ecosystem will have to add that support regardless, so it'll appear for free for any module consumers who are already using a module tool (which is basically all of them).