nodejs / node-eps

Node.js Enhancement Proposals for discussion on future API additions/changes to Node core
442 stars 66 forks source link

WRT PR #3 - ES module detection #13

Closed bmeck closed 7 years ago

bmeck commented 8 years ago

This is the place to discuss specifics of how node will determine the mode a given a source (file / source string). Source string examples include stdin or -e, they do not include JS Function or eval calls. The proposal itself can be seen in PR #3.

Discussions here should regard:

Note: CJS will be the default target of node for the foreseeable future, "normal" here means what developers write daily. This is a constraint that can be easily worked around for a single source via a cli flag to note that the source is an ES module. This flag should not be seen as relevant to the discussion in this thread.

Note: We are only discussing choices that do not require parsing of the source file.

It should not discuss import/export conversion, evaluation ordering, or module path resolution.

zenparsing commented 8 years ago

my earlier suggestion means .jsm can be used without deprecating the .js extension

Who wants to type in .jsm.js or even .m.js for all of their files? I certainly don't, and it amounts to roughly as much of a developer burden as a per-file pragma.

balupton commented 8 years ago

Who wants to type in .jsm.js or even .m.js for all of their files? I certainly don't, and it amounts to roughly as much of a developer burden as a per-file pragma.

Then just use file.jsm instead of file.jsm.js, that too is supported by the .jsm decorator proposal.

However, if the developer's environment, tooling, server, whatever doesn't yet support file.jsm directly, then typing file.jsm.js would be the less costly solution for them, the one they want to do.

It would be the one I want to do, as it would work with all my existing tooling without any changes to them.

zenparsing commented 8 years ago

If you support both .jsm.js and .js, then in addition to the (rather large) problems that each faces individually, you also add the burden of useless choice. The "either-or" in this case is therefore actually worse.

It would be the one I want to do, as it would work with all my existing tooling without any changes to them.

I'll make the stronger claim that only a minuscule amount developers will want to use such a long extension.

balupton commented 8 years ago

If you support both .jsm.js and .js

Do you mean "If you support both .jsm.js and .jsm" ?

I'll make the stronger claim that only a minuscule amount developers will want to use such a long extension.

Aesthetics choices are only possible if they are available. If their situation doesn't allow file.jsm they have no choice but to file.jsm.js.

ljharb commented 8 years ago

@balupton using .jsm.js would not work as cleanly with require.extensions, nor with Rails' Sprockets, both of which only look at the last extension.

domenic commented 8 years ago

.m.js seems better than .jsm.js.

dead-claudia commented 8 years ago

@domenic I was thinking the same, but I was a little hesitant to mention anything with multiple extensions because there seemed to be some practical objections with that in the first place. (Not that tools couldn't be fixed, but that's another deal.)

balupton commented 8 years ago

@balupton using .jsm.js would not work as cleanly with require.extensions, nor with Rails' Sprockets, both of which only look at the last extension.

These sound like much easier solvable problems than the alternatives.

bmeck commented 8 years ago

@balupton @isiahmeadows @domenic my fear is that this means any tool that only parses the last .[^.]+ characters to grab the file extension will need patching, I am unsure how many servers are currently doing this and if this would affect any of the major ones. I am ok with patching node to match this, but this is not a common case for how file extensions are declared and also may have heavy ecosystem impact.

dead-claudia commented 8 years ago

@bmeck I get that. To be perfectly honest, I'm more of a bystander in this discussion, anyways. As long as I'm not having to add a "use this new, horribly-complicated-to-implement module syntax"; at the top of every single file using ES6 modules, I'm good.

(By the way, I do like ES6 modules. Don't get me wrong. :wink:)

ChALkeR commented 8 years ago

@bmeck So what about the exact extension? What alternatives do we have atm? I remember someone also mentioned .es, for example.

benmosher commented 8 years ago

I'd like to offer a "why not both?" solution (see jokeyrhyme's last comment).

Analogously to the existing package.json main key (which defaults to index.js when a folder is require'd but has no package.json, or existing package has no defined main key), I propose that whatever glob key would be defined would have a default value that is essentially **/*.[some module extension].

This means that a package author may define a glob, if needed or desired, but may also use modules without defining a custom glob by following the default glob, which is essentially some chosen file extension.

Having read through this thread, I hope I didn't miss any major technical shortcomings or disqualifications. Here goes, as JS pseudocode, for clarity and specificity:

const path = require('path')

// assuming these exist, are performant, cached as appropriate, etc.
import { findPackageForPath, loadPackage } from 'core'
import { satisfiesGlob } from 'glob-utils'

// not critical what these are ultimately defined to be
import { MODULE_GLOB_KEY, DEFAULT_MODULE_EXT } from 'bikeshed'

const DEFAULT_GLOB = `**/*.${DEFAULT_MODULE_EXT}`

/**
 * Determine if a file at a given path is an ES module, and should
 * be parsed as such, without inspecting the file's contents.
 * @param  {string}  absolutePath - full resolved path to file (i.e. /my/app/path/cool-module/index.js)
 * @return {Boolean} - true iff file at path should be parsed + treated as module
 */
export function isPathModule(absolutePath) {
  let packagePath = findPackageForPath(absolutePath)
    , packageObject

  if (packagePath != null) {
    packageObject = loadPackage(packagePath)
  } else {
    // implicit package at folder (a la index.js)
    packagePath = path.dirname(absolutePath)
  }

  let glob

  if (packageObject != null && MODULE_GLOB_KEY in packageObject) {
    // use package-defined glob, when available
    glob = packageObject[MODULE_GLOB_KEY]
  } else {
    // use default
    glob = DEFAULT_GLOB
  }

  return satisfiesGlob(absolutePath, glob, {
    relativeTo: packagePath,
  })
}

In cases where infrastructure cannot parse JSON, application authors simply need to respect the chosen default convention. In cases where a new file extension would cause immediate burden, a custom glob may be defined.

I think this covers the concerns raised in this thread without being too complicated or potentially expensive at runtime, assuming the package locations/objects and glob definitions can be cached per-directory during process startup and module load.

Again, apologies if I missed anything while reading through the thread that would disqualify any or all of this. :sweat_smile:

Fishrock123 commented 8 years ago

In case it didn't propagate to here, at https://github.com/nodejs/node/issues/5866 most of the CTC preferred a module extension (discussed was .jsm).

ChALkeR commented 8 years ago

@Fishrock123 I don't think that we settled on the specific extension, though.

bmeck commented 8 years ago

correct, tc39 met last week and I was holding any discussion until I get the meeting notes.

Fishrock123 commented 8 years ago

Correct, we didn't settle on an extension, but we did discuss the .m.js thing two or so weeks ago and preferred .jsm to that iirc.

ChALkeR commented 8 years ago

*.jsm has the downside that it's already used for JS modules that are not ES modules: https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules.

Those are also present in npm packages.

On the other hand, *.es is audio/echospeech, but it's very old, unregistered, and is probably not present anywhere (or is it?).

bmeck commented 8 years ago

@ChALkeR looks like most of the .jsm are coming from source-map or specific dist for firefox addons.

$ grep source-map < jsm.txt | xargs basename | sort | uniq
SourceMap.jsm
Utils.jsm
prefix-source-map.jsm
prefix-utils.jsm
suffix-source-map.jsm
suffix-utils.jsm
$ grep -v source-map < jsm.txt | wc -l
      83

So looks like ~89 are in npm, and those source-map files are not in the current source-map repo

dead-claudia commented 8 years ago

@ChALkeR .es appears already registered with IANA as ECMAScript (application/ecmascript), as of April 2006.

ChALkeR commented 8 years ago

@isiahmeadows Wow. Thanks, I missed that!

jokeyrhyme commented 8 years ago

Alternatively, .esm for ECMAScript Module, if there are backwards-compatibility concerns with other suggestions. /shrug I doubt we'd have much confusion or compatibility issues within our own community regarding Elder Scrolls files that also use the .esm extension. :P

ljharb commented 8 years ago

.m.js is a nonstarter imo for reasons already discussed, but .es, .esm, .jsm, are all fine.

dead-claudia commented 8 years ago

@ChALkeR Welcome :smile:

rubennorte commented 8 years ago

I think we should go for the .jsm extension. It's very simple and clear and I don't see the problem about the existing tooling detecting .js extension files. If they won't support the .jsm extension they probably won't parse the file correctly either way.

bmeck commented 8 years ago

.jsm apparently can have access to sensitive information within Firefox if loaded via a file:// url, taking it out of the running / .mjs can be proposed instead and keeps js.

Due to the prolific use of JS in project names keeping js leans me away from .es culturally: DailyJS, Node.js, Cylon.js, Three.js, DynJS, VorlonJS, etc. That said a new trend of referring to ES, ESNextNews, ES6, ES2015 makes this less of a concern for me.

RegExp and Glob lean slightly towards .es: \.[ej]s works both with basic globbing and regular expressions as a single pattern; .mjs cannot be easily represented in globbing alone since there is no optional operator for the m.

I do have concerns as well if .es is chosen as people will snatch up .es TLDs. Which may have some impact, but I don't know if that should even be considered when talking about this.

Updating EPS to be .mjs and re-submitting that it become a draft.

jmm commented 8 years ago

For globbing *.{m,}js seems to work for me in Bash. Not sure how universal it is though.

caridy commented 8 years ago

wow, we are getting down the rabbit hole with the extension proposal.

btw, our timeline to share the counter proposal should be ready early next week.

bmeck commented 8 years ago

@caridy rabbit hole?

bmeck commented 8 years ago

In the event of a counter proposal, we will still need to verify any assumptions / impact it may have even outside of node itself. Such scrutiny will be applied to any direction taken. Support for other tooling / environmental implications / developer experience / potential exploitation are not to be considered a rabbit hole.

Fishrock123 commented 8 years ago

I don't really see why something locally in a web browser should effect us, where will that confirm-ably overlap?

dead-claudia commented 8 years ago

Once <script type="module"> takes off, people may start using Node idioms on both ends to differentiate traditional scripts from newer ES6 modules, which could become relevant for servers.

On Fri, Apr 15, 2016, 13:58 Jeremiah Senkpiel notifications@github.com wrote:

I don't really see why something locally in a web browser should effect us, where will that confirm-ably overlap?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/nodejs/node-eps/issues/13#issuecomment-210565507

bmeck commented 8 years ago

@Fishrock123 what @isiahmeadows said and also the fact that some test runners do in-browser testing generally over the file:// protocol: https://mochajs.org/#running-mocha-in-the-browser

zenparsing commented 8 years ago

+1 to renaming the language MichaelJacksonScript.

dead-claudia commented 8 years ago

@bmeck I'm not sure how relevant file:// vs http(s):// is. IIRC the protocol used wouldn't make any difference in how it's interpreted. Could you elaborate?

bmeck commented 8 years ago

@isiahmeadows please see the note in the EP ( https://github.com/bmeck/node-eps/blob/es6-module/002-es6-modules.md#reason-for-decision ), Firefox loads .jsm files differently when you double click them, open them via cli, etc.

Jeff-Lewis commented 8 years ago

So for applications & packages written in ES+Babel now with .js extension, what is their expected upgrade path (dev & CI environments) to use native ES6 modules in node?

Also, for applications & packages using CJS modules in node 4+ with ES classes and where majority of the code is already using existing ES features in node 4+, what does their upgrade path look like?

jokeyrhyme commented 8 years ago

@Jeff-Lewis

So for applications & packages written in ES+Babel now with .js extension, what is their expected upgrade path (dev & CI environments) to use native ES6 modules in node?

  1. rename all module files from ".js" to ".mjs"
  2. change all relative import paths so that they specify extension e.g. import './blah.mjs';
  3. change all package import paths so that they are package name only
    • import 'package'; is allowed
    • import 'package/relative/path.mjs'; is not allowed
    • this may mean requesting upstream package maintainers to properly export inner values of use, or to publish separate packages
  4. remove Babel and cross fingers that everything still works
  5. you may leave any existing require(); statements, as they'll continue to work as before

Also, for applications & packages using CJS modules in node 4+ with ES classes and where majority of the code is already using existing ES features in node 4+, what does their upgrade path look like?

CJS modules that work in Node 4 and use require() will keep on executing just fine, just like they have all through Node 5. There is no forced migration for existing CJS code. An optional migration path might be similar to the above steps with the addition of:

  1. replace module.exports = { /* ... */ }; with export default { /* ... */ };
  2. replace require('blah'); with import 'blah';
  3. replace const blah = require('./blah'); with import blah from './blah.mjs'
dead-claudia commented 8 years ago

@jokeyrhyme

1) rename all module files from ".js" to ".mjs" 2) change all relative import paths so that they specify extension e.g. import './blah.mjs';

Only if an explicit .js extension was already provided.

3) change all package import paths so that they are package name only

  • import 'package'; is allowed
  • import 'package/relative/path.mjs'; is not allowed

That is allowed. package/relative/path will resolve regardless to package/relative/path.js, etc. now, and changing that would break a lot more than even the left-pad debacle. I take advantage of that myself.

... There is no forced migration for existing CJS code. ...

It'll happen, though. ES6 modules are far superior, IMHO.

... 3) replace const blah = require('./blah'); with import blah from './blah.mjs'

You can still use import blah from './blah'. Although, you might have to use import * as blah from './blah' instead, if the exports are named, and not default.

jokeyrhyme commented 8 years ago

@isiahmeadows @Jeff-Lewis Oops? I was under the assumption that the module resolution folks were voting in favour of making Node.js strictly match browser behaviour, with no resolution magic

Probably best following up those comments in the other thread, as it might be beyond the scope of "module detection" to discuss that any further here.

dead-claudia commented 8 years ago

@jokeyrhyme I'll take that back, but only partially. That's still being discussed, and one of the biggest concerns about requiring explicit extensions and paths is that people are already depending on Node's algorithm with Babel when targeting either platform.

mikeal commented 8 years ago

@bmeck heya, you should submit a talk about all this for Interactive :) https://www.conferenceabstracts.com/cfp2/login.asp?EventKey=VSIHQOEU

jokeyrhyme commented 8 years ago

FYI: https://github.com/dherman/defense-of-dot-js

rektide commented 8 years ago

This thread seems entirely devoted to how Node wants to carve a special case people need to opt in to to use ES6 Modules. Using an extension other than .js is hideous and not JavaScripty, and Node should bend over backwards to make itself maximally JavaScripty and not a special case demanding X Y or Z. As a developer in 2018, I will be writing JavaScript with JavaScript modules and those modules- used by front and and back end- WILL be .js files. If Node wants to make it hard to run actual JavaScript on Node, it should proceed with the current idea of requiring a special extension (.jsm): Node can go off and do that and then anyone who wants to use normal, regular code will have to "transpile" (rename the file) my actual JavaScript to Node JavaScript. But I suggest you avoid this obvious terrible horrific looking jank of carving out your own extension.

My suggested heuristic, in loose form:

  1. assume es6 m and parse/load/w/e. provide a global require() but no module. if there are no exports or execution errors, proceed to 2.
  2. assume commonjs and parse/load/w/e.

This favors a happy path, and falls back to the current set of assumption. Node should try to just do right. There are very simple heuristics to determine when the attempt to use actual modules (es6m) has failed, and then fallback and retry.

Using an extension other than .js seems like it will break, look jank, and be hideous, in the dual-sided world we're in. I haven't seen any solid arguments as to why explicit coaching is required from node. We should milk a source detect solution as best as we can, and only fallback to explicit coaching of node. And if node is going to be so crippled as to need explicit help, it should at least kindly read it out of the meta-data payload node already relies on, package.json, rather than stomp the normal signalling people use to indicate they're writing js.

as a developer, i would way rather write that looks good and normal than mutilate normal looking filenames for node's sake. i'd rather write packages broken for node that look normative and let consumers babel transpile my .js to your funky weird .jsm than distribute bizarre files. if node wants to not be JavaScript, it invites in the need to build chain JavaScript to run and consume npm packages.

I've mined the thread for what I can find about source-detect, and this is what I've rounded up:

I understand that there are a lot of objections to the idea of trying to inspect the source to detect the format, to the point that in the original thread people keep saying it's a dead issue, off the table. @jmm http://github.com/nodejs/node-eps/issues/13#issuecomment-194984818

Do we need to know which mode we are in prior to parsing? Which mode-sensitive use cases do not ultimately involve parsing? @jokeyrhyme https://github.com/nodejs/node-eps/issues/13#issuecomment-195085103

bmeck commented 8 years ago

@rektide please refer to notes in https://github.com/nodejs/node/wiki/ES6-Module-Detection-in-Node , and other blog posts around the web like https://www.nczonline.net/blog/2016/04/es6-module-loading-more-complicated-than-you-think/ on why knowledge prior to parse/evaluation is required.

rektide commented 8 years ago

@bmeck the FAQ addresses content detection, so thanks for the pointer to that. It proposes cons, but I'm not sure why I should care about any of them, given it's pro, which is "js gets to stay together": Least visible impact on ecosystem, does not affect package.json or scripts using *.js. Here's the con's that I'd like to see defended into strong believability, because they sound hard but worth doing to me:

  1. Hard to determine file mode for large files if they barely use a feature that causes a mode switch. (cause of rejection)
  2. Implementers unlikely to implement a parsing API that auto-detects the mode (this is a very strong reason)
  3. Implementing this is non-trivial
  4. Likely hurt the performance of importing
  5. Ambiguity causes potential implicit semantic changes (cause of rejection)

None of these seem like things that will effect most consumers anywhere near bad enough to justify breaking up JS.

caridy commented 8 years ago

@rektide we went from "sloppy mode" to "opt-in strict mode" in ES5, then into "implicit strict mode" in ES6 modules. The main driving force was to eliminate some of the "bad parts", where some of the "worst parts" were not even trivial for consumers of the language, but a major hurtle for JIT implementers. I don't do JIT for living, but those who do, they seem to be reluctant to the idea of a unified "smart" parser. Feel free to read the TC39 notes about this subjects :).

rektide commented 8 years ago

The key "against" in More complicated than you think is having to reparse if things go bad, but so what? You're using old legacy modules in that case. Letting the entire .JS ship sink because the legacy fallback case is going to be slow seems odd.

The edge case mentioned, that a module might not have exports, is also handled by doing the ES6M attempt first: it works.

@caridy my optimistic heuristic doesn't need a smart parser. It tries the narrower better future-facing (strict/modules) case first, which will also the clear case to detect failure against, since it's stricter. If future microoptimizations can get us "smart" parsing so be, sweet, we can coupe some.

But the main reconciliation, to get yourself back to being fast, is to just be ES6m. Which is easy enough to do via babel, if you do have massive honking huge module files that for some reason you really care about the load time for (edgiest of edge cases). If all the input is ES6m, no performance is lost.

caridy commented 8 years ago

@rektide about your heuristic proposal, that's not the way parsers work I'm afraid :), remember that node doesn't have a parser, it uses v8 parser, two of them to be precise. Anyhow, we have discussed this problem extensibly in #3, and via other channels. Feel free to reach out to implementers, and get the ball rolling if you feel that you can get any traction with them to provide a unified parser, then find a champion and bring the proposal to TC39 :)

The two proposals that we are considering at the moment are based on what we have today (two parsers: Script Parser and Module Parser).

As for the penalization process that you mention abode as "not a bit deal", it is a big deal, nobody will ever take a bullet that will make their env slower than it was before just for the sake of simplicity. If you have to assume that all files are ES6 and fallback (somehow "magically"), then it means that on day 0 you will have to fallback for all files (penalizing everyone) for a long period of time, until ES6 modules become a majority, which could be 10 years from now :(

rektide commented 8 years ago

@caridy that certainly feels to me like a "good luck kid" response and not in good spirits. you haven't done anything to further anyone's understanding beyond the useful posts i've already countered. it's unclear to me why what you're saying- that there are two parsers- flies in the face of my suggestion to try one parser, then try the other parser. and i believe i've addressed performance concerns sufficiently- people who have real load time concerns can in 99% of cases ahead of time transpile.

caridy commented 8 years ago

@rektide don't get me wrong, this not a new subject. I have spent countless hours explaining "the why" to many folks, and I don't regret it. It took us years at TC39 to arrive to the conclusion that 2 parsers are the way to go for JS, and modules just happen to be the best chance we've ever had, and we took it. To back pedal for that, there have to be a very strong reason, and a very heated discussion that I don't want to be part of.

Again, we are trying to solve this particular problem for node with what we have at hand, which are the two parsers, then your alternative solution about parsing as module, if it fails, parse as script, just don't work! 1) node will not take that bullet because of perf and ambiguity 2) if you inverse that equation, we will have even more ambiguity, that will make this to break very often. Think of a file like this:

foo = 1;
doSomething();

what's the expectation for the author? is it a module? is it a script? should it throw?