Open jonaskello opened 3 years ago
I’m guessing the
--experimental-specifier-resolution=node
flag somehow augments the internal loader to check for.js
andindex.js
(I have not looked deep into the source code though). The point being I think it would be nice if a loader could be allowed to somehow do the same augmentation. Specifically allowing imports without extension to be resolved to.ts
files without having to re-implement the full resolution algorithm. Or is there some other way to achieve this?
So to paraphrase, basically what you’re asking for is a hook within Node’s resolution algorithm, to inject custom logic to add .ts
to the list of inferred suffixes, which under --experimental-specifier-resolution=node
is .js
, .json
, .node
, /index.js
, /index.json
, /index.node
. (In the explicit/ESM-default version of the resolution algorithm, this inference doesn’t happen.) Is this more or less correct?
I understand why you’d want this, because the full resolution algorithm is daunting to reimplement. But I think that’s exactly what popular tools like ts-node
are already doing, often depending on a library resolve
that provides exactly that.
In the near term, I think that’s the path you’ll need to take; we have enough on our plate at the moment trying to complete the loaders roadmap, and providing utility functions for things that can be (and already are) implemented in the ecosystem should naturally be the lowest priority. That’s not to say that such utilities aren’t worth providing in Node core; we already provide require.resolve
, and an ESM equivalent (or multiple equivalents, to perhaps break up the parts of the resolution algorithm) might be a request you can get the core collaborators to agree to support. But it wouldn’t happen soon, unless you want to take the initiative and make the PRs yourself. If that’s something that interests you, we can use this repo to discuss what the various functions should be and their APIs, so that you have an approved design before you start coding anything.
Here are some references to catch you up on related discussions that have happened so far (deep linking to my own comments to spare you from reading the entirety of these long threads, and because I think I summarized some consensuses in those comments):
Also, have you tried simply checking for existence of `${specifier}.ts`
? What edge cases would be missed by a loader like the following?
// This code block is untested
import { access, constants } from 'node:fs/promises';
export async function resolve(specifier, context, defaultResolve) {
// For specifier './file', see if './file.ts' exists
try {
const inferredSpecifier = new URL(`${specifier}.ts`, context.parentURL);
await access(inferredSpecifier.pathname, constants.R_OK);
return { url: inferredSpecifier.href };
} catch {
return defaultResolve(specifier, context, defaultResolve);
}
}
But I think that’s exactly what popular tools like ts-node are already doing, often depending on a library resolve that provides exactly that.
I want to clarify for anyone else reading this: ts-node is not technically reimplementing node's resolver.
We ship a copy of node's source code, with a minimal patch applied to essentially expose the API being described here. Ideally node would provide this API so we don't need to ship a copy of logic that is already within node's binary.
This is not to say I disagree about priorities or time constraints, but there's also the risk that this use-case becomes harder than it needs to be because it's not considered in earlier loader design work.
What edge cases would be missed by a loader like the following?
I think it'll miss stuff like when package.json "main" points to ./index
which should get ./index.ts
Ideally node would provide this API so we don't need to ship a copy of logic that is already within node's binary.
To support this with another example, Yarn PnP as well doesn't want to replace the full resolver. We only want to inject into a very limited part of the resolution (package name to directory), and it's the way Node currently works that requires us to copy a (much) larger part of the Node logic.
This isn't helped with Node's internals sometimes relying on hidden bindings that not even loaders have officially access to (cf https://github.com/nodejs/node/pull/39513).
@GeoffreyBooth Thanks for the reply! :-). Yes you are correct that is what I'm looking for. I currently can see two main ways it could be implemented. Either as options to the default loader or as a separate API. Your example would work for relative specifiers but not for bare specifiers? For bare specifiers currently you need to re-implement the full ESM algorithm?
To give an example of what I mean as options to default loader it could be something like this (which probably is not a good idea):
export async function resolve(specifier, context, createDefaultResolve) {
const defaultResolve = createDefaultResolve({ extensions: [".ts", ".js"]});
return defaultResolve(specifier, context, createDefaultResolve);
}
EDIT: To clarify, the above is intended as a programmatic and more flexible version of --experimental-specifier-resolution
.
However, I think an API would be more useful so you can compose your own resolve from that without having to fully re-implement all of it. I did not fully follow if the discussions you linked are about such an API or are they about something else? I guess I need to read the full threads after all :-).
providing utility functions for things that can be (and already are) implemented in the ecosystem should naturally be the lowest priority.
I totally understand about the priorities. It is sure possible to re-implement the algorithm so perhaps I will do that but just wanted to check if there was an easier way. I'm not aware of the ESM algorithm being implemented in the ecosystem yet? There is an issue on the resolve repo but that seems to not have gotten far yet:
https://github.com/browserify/resolve/issues/222
As mentioned above ts-node is maintaining a full copy of the whole ESM resolver from node's code with some tweaks (to solve what I mentioned above).
For what it's worth, the relevant files are in raw
and dist-raw
. The former are copy-pasted from node, kept there merely for convenience when using the diff
command. The latter are patched and distributed.
https://github.com/TypeStrong/ts-node/tree/main/raw https://github.com/TypeStrong/ts-node/tree/main/dist-raw
For example, you can diff ./raw/node-esm-resolve-implementation-v15.3.0.js ./dist-raw/node-esm-resolve-implementation.js
to see what we changed.
At some point recently, we did discuss exposing defaultGetFormat
and defaultGetSource
in the load
hook's arguments (instead of defaultLoad
).
Or do you mean you'd want it exposed from some builtin package, like Node's module builtin?
My first instinct is to create publicly accessible versions of many of the functions in https://github.com/nodejs/node/blob/master/lib/internal/modules/esm/resolve.js, which I guess is more or less what https://github.com/TypeStrong/ts-node/tree/main/raw?rgh-link-date=2021-09-12T22%3A14%3A08Z is doing. This would correspond with the steps in the algorithm, listed at https://nodejs.org/api/esm.html#esm_resolution_algorithm. So then within your resolve
hook, you could call the same functions that Node’s internal defaultResolve
calls, but choose to replace one of them (or inject extra logic between some of them).
Yes this is exactly what I meant. Having a fine grained API with functions that does a little part each so you can compose your own loader from that and interject whatever logic you want in between, or skip/replace steps, without having to re-implement the full algorithm.
Yes this is exactly what I meant. Having a fine grained API with functions that does a little part each so you can compose your own loader from that and interject whatever logic you want in between, or skip/replace steps, without having to re-implement the full algorithm.
Okay, perhaps we should rename the issue (and reword the initial post)? And this can be a feature request. Once you can compose your own resolve
and override only the pieces you want, there won’t be a need for --experimental-specifier-resolution
anymore since it would be trivial to make a custom loader that does what --experimental-specifier-resolution
does. (Or does what it does, plus an extra inferred extension, in your case.)
I think there might be 2 separate issues here, the second being:
import { foo } from "./foo.ts"
And then just implement the loader hook as the resolve hook will not throw on the unknown .ts extension. However most typescript projects are setup to import without extensions, like this:
import { foo } from "./foo"
You'll need a custom resolve hook to return a url
because Node's defaultResolve
will throw. But that's trivial to write (there's an example for CoffeeScript in the landed and soon-to-be-released update to the ESM docs).
(I realise, Andrew, you are quite familiar with TypeScript; including extra detail for others)
In both examples you will suffer the same problem forced by the TypeScript team. The former is the correct usage; however, the latter was the lesser-of-two-evils (as well as a popular corner-cut) at the time as a workaround to avoid the TS team's decision to disallow TypeScript file extensions in import specifiers.
For the first example, that will work just fine because it's factually accurate (the file being read literally does contain the extension .ts
in its name); however, if that itself appears in a typescript file, you'd need to replace the .ts
file extension after the file/source is read but before it is passed to tsc
to be transformed (otherwise, tsc
will self-destruct). This is likely the easiest avenue because it leverages the most built-in behaviour of the various tools involved. I think this might be enough to get both sides to do what they're supposed to do.
I think relying on --experimental-specifier-resolution
is not the way to go (for a number of reasons, most obvious that it is "experimental").
My first instinct is to create publicly accessible versions of many of the [resolve utility] functions
I think this is a promising avenue. We've talked about doing that for at least one of them already; @cspotcode @jonaskello do you need all of them or just a subset?
Another couple cases that I need to support:
foo.ts
when the only file on disk is foo.ts
foo.ts
when both foo.js
and foo.ts
exist(In this example, the user has followed my recommendation and their code is import { foo } from './foo.js';
, and our loader has resolved it to /foo.ts
)
When we need to getFormat
any .ts
file, we can replace the extension with .js
and ask defaultGetFormat
how it will classify a .js file in the same location. Effectively, .ts follows the same rules as .js. In the example above, we call defaultGetFormat('/foo.js')
without risking a filesystem error or needlessly loading foo.js
from disk.
Ideally, I imagine we could do something like this pseudocode:
import {createGetFormat} from 'node:esm-resolver-helpers';
const defaultGetFormat = createGetFormat({extensionsWhichObeyPackageJsonType: ['.js', '.ts', '.tsx']});
defaultGetFormat('/foo.ts'); // reads /package.json and classifies based on "type"
We've talked about doing that for at least one of them already; @cspotcode @jonaskello do you need all of them or just a subset?
I will have to think about this.
We need defaultGetFormat
, either with pluggable extensions as described above, or we can get away with using today's extension-replacement hack.
We also need resolve
, but it will not be useful to us if we cannot customize its behavior. We can make a diff of our changes to highlight which behaviors need to be customized. One change: we added .js
-> .ts
resolution, which is a bit different than --experimental-specifier-resolution
.
We need
defaultGetFormat
, either with pluggable extensions as described above, or we can get away with using today’s extension-replacement hack.
See the updated CoffeeScript loader example: https://github.com/nodejs/node/blob/master/doc/api/esm.md#transpiler-loader. We added a new helper function that’s probably a better solution in general than the extension-replacement hack. I think there’s a good argument for this helper being part of core, since if it’s in core it could reference the cached package.json
files rather than needing to crawl the filesystem for files that the Node process has already loaded. (This is also much more generic and flexible than something particular to file extensions.)
Currently our duplicated resolve
loader also needs to read package.json
since it needs access to other fields like main
and exports
. It might be worthwhile exposing the package.json
cache for more than just getting package types.
I forget, will an http loader composed with a TS loader be able to respect package.json
type if that package.json
lives on the http server? Since node could theoretically load package.json
files via loaders as 'json' modules?
Exposing that cache would be incredibly useful in general; it'd be great to get something like that shipped and backported to node 12 if possible!
Currently our duplicated
resolve
loader also needs to readpackage.json
since it needs access to other fields likemain
andexports
. It might be worthwhile exposing thepackage.json
cache for more than just getting package types.
That’s what I was getting at with the links above in my first comment, which discuss various needs for this data and ideas for APIs for exposing it. Providing easy access to the metadata around a file or a package is useful in general, for a variety of use cases. It’s a little tricky in practice, as there are potentially multiple package.json
files you might want—think of a package that has a “root” package.json
, and then say a dist/package.json
with "type": "commonjs"
. For the “what’s the controlling type
of this file” use case, you want the latter package.json
; but for cases like “what’s the browser export of this package” you want the former. So the function or set of functions will need to be flexible. But yes, in general, I think we should provide a straightforward, non-hacky way to retrieve this data that spares user code from needing to hit the filesystem.
I forget, will an http loader composed with a TS loader be able to respect
package.json
type if thatpackage.json
lives on the http server? Since node could theoretically loadpackage.json
files via loaders as ‘json’ modules?
This is all hypothetical, since Node itself doesn’t load any modules over HTTP; that would be up to however you design your HTTP loader. The example one in the docs just assumes all JavaScript via HTTPS is ESM. I would think that if you want to get the “type” of JavaScript loaded over HTTP, the standard way to do so (along the lines of how browsers do things) is to look at the MIME type via Content-Type
header. ESM would be text/javascript
per spec and CommonJS would be application/node
. But it’s a custom loader so you could do whatever you want: if you want to make successive network calls looking for package.json
“files” at ever-shorter URL path segments, you could do so.
@cspotcode @jonaskello @arcanis please see (and reply to) https://github.com/nodejs/loaders/issues/27#issuecomment-918619779, I’m trying to reschedule the meeting where we might discuss this.
@GeoffreyBooth seems like an API that took a path, and returned the "closest" package.json (perhaps, with an options bag where you could indicate you want a "package" or not), would be super useful?
I noticed PR 44501 in the typescript repo where @weswigham is working on node12 and nodenext module resolution for the typescript compiler. I have not looked deep into that code but I would suspect it is re-implementing a large part of the the resolve algorithm. Perhaps parts of the API discussed here could be helpful for future versions of the typescript compiler too?
This is all hypothetical, since Node itself doesn’t load any modules over HTTP; that would be up to however you design your HTTP loader.
Specifically I'm thinking about two loaders playing together: how might an http loader integrate with a TS loader. Supposing the HTTP loader is interested in allowing package.json to reside on a remote server, and the TS loader is interested in respecting fields from within the package.json to affect module resolution and classification. The TS loader wants a generic way to get package.json but doesn't care where it comes from. The HTTP loader wants to provide package.json from http sources but doesn't care how it's used.
Using the TS API as an example, it has reusable resolver implementations that can do tasks involving filesystem traversal, but the underlying filesystem is pluggable: you provide a Host
implementation which it uses to read directories and files. Should defaultGetFormat
, or whatever utilities are exposed by node, accept a DI-able filesystem implementation?
I think this'll come up when designing loader composition but maybe it's good to think about now, too.
I noticed PR 44501 in the typescript repo where @weswigham is working on node12 and nodenext module resolution for the typescript compiler. I have not looked deep into that code but I would suspect it is re-implementing a large part of the the resolve algorithm. Perhaps parts of the API discussed here could be helpful for future versions of the typescript compiler too?
TS needs to run in non-node environments, so they unfortunately cannot rely on any node APIs. They have an implementation of ts.sys
that delegates to node's fs and other low-level APIs, but something like ascending a filesystem hierarchy has to all be implemented in the compiler so it can run in a browser and other environments.
Yeah, @weswigham has clued us in about the TypeScript filesystem resolution algorithm in the past. Feel free to read up more on that at https://github.com/nodejs/help/issues/2642#issuecomment-616819273. We may need to implement pieces of that algorithm in core as well since we probably still have unintelligible error messages for failed resolutions on our supported host OSes. I do like the idea about being able to provide Host
— @cspotcode, would you mind linking the source to that?
Are you referring to one of the *Host
interface declarations as an example? Or an implementation of TypeScript's System
, which declares every bit of functionality that the compiler attempts to use from the runtime / OS, such as filesystem access, timers, native implementations of hashing algorithms, etc? I mentioned it as an example of dependency injection, though I imagine node's reusable resolver utilities would accept implementations of their own Host
interfaces, requiring only the subset of filesystem functionality they require.
@guybedford, if I can loop you in, what do you think of https://github.com/nodejs/loaders/issues/26#issuecomment-918413522? Since you wrote many of those functions and designed the overall ESM resolution algorithm.
Regarding which functions I would need in the API.
My goal is to be able to resolve typescript files, not just in simple cases for absolute/relative files, but in more complicated cases where you have tsconfig.references
(project references) in a yarn workspace and also tsconfig.paths
specified. The way typescript works it requires you to import from the output that it will create. This means that you can for example do this:
import { foo } from "@myapp/pkg-a/lib/foo.js";
Now the source for the file being imported above lives in packages/pkg-a/src/foo.ts
and packages/pkg-a/lib/foo.js
does not have to exist on disk at the time we want to resolve becuase we have not run the compiler yet. However we still need to be able to resolve it and load the corresponding source file. In the above example we have a sub-path for the package specifier but the same is true if there was no sub-path. For example you can have this:
import { foo } from "@myapp/pkg-a";
And in @myapp/pkg-a/package.json
the main
field is set to lib/index.js
but the source for that lives in src/index.ts
and the file lib/index.js
may not exist if the compiler has not run. However we need to be able to resolve it back to src/index.ts
and load that instead.
I've done some initial experiments with the code in resolve.js
. I found two main places where it hits the filesystem and both of them are in the packageResolve
function. One is to ascend the filesystem looking for package.json
and the other is when legacyMainResolve
is called becuase that function will probe for files in the file system according to the legacy rules.
If there was an API of helper functions for a loader, I think ideally they should be able to be called without hitting the filesystem. One reason is for typescript as I stated above about the files not being compiled yet so if the resolution tries to probe for them there will be an exception. Another reason may be if you don't want to lookup package.json
in the filesystem but get it from somewhere else (I think this is what @arcanis wanted above but I might have misunderstood).
Currently I'm experimenting with some refactorings to the resolve.js
file. In packageExportsResolve
and packageImportsResolve
, I've added a parameter for packageResolve
so you can inject your own impl of that. This means you can avoid filesystem access in packageExportsResolve
and packageImportsResolve
. I've also exported these functions that I use in my loader:
emitLegacyIndexDeprecation,
getPackageConfig,
getPackageScopeConfig,
shouldBeTreatedAsRelativeOrAbsolutePath,
packageImportsResolve,
packageExportsResolve,
parsePackageName,
getConditionsSet,
finalizeResolution,
ERR_MODULE_NOT_FOUND
Probably I don't need all of them in the end. The problem I encountered now is that legacyMainResolve
hits the filesystem. My current thinking is that instead of probing for files, it could return an array of possible files. This means that packageResolve
would return an array of possible files. This would leave the filesystem probing to the loader so it can decide which file is the actual file to use. For example if legacyMainResolve
comes up with lib/index.js
as an alternative and that does not exist, I can still translate it to src/foo.ts
without hitting an exception in between.
I'll continue to experiment a bit and see where it takes me.
I'm wondering if tsconfig.paths
and tsconfig.references
need support as it seems they'll be obsolete with typescript supporting workspaces and subpaths / module resolution (currently in beta).
Yes, I also think subpath imports could replace tsconfig.paths
. I'm kind of hoping the typescript team makes tsconfig.paths
incompatible with the node12 modules setting so we don't have to support tsconfig.paths
in loader hooks.
For tsconfig.references
I'm not sure there is an overlap with subpath imports. Could you elaborate about that?
Ah, sorry, I meant
tsconfig.paths
↔ packageJson.imports
(subpath)tsconfig.references
↔ packageJson.workspaces
I have not used tsconfig.references
, but they look exactly like packageJson.workspaces
and a quick read of the TS project references doc sounds like workspaces (npm and yarn).
tsconfig.references
tells the TS compiler to do outDir
-> rootDir
mappings for additional tsconfigs.
I think, but I'm not sure, that outDir
-> rootDir
resolution is unnecessary within a single-tsconfig project, because all relative imports happen within the same rootDir
. Once you get into composite projects, then code in one of the projects is attempting to import from the outDir
of another. For the language service, and for our purposes, this means resolving from non-existent outDir
s to their corresponding rootDir
s.
I don't think workspaces describe that outDir
->rootDir
mapping.
I didn't mention rootDirs
above, but that would also need to be considered.
Caveat that I haven't read the entire thread above, so maybe I'm retreading previous conversation.
Yes, packageJson.workspaces
and tsconfig.references
work in tandem. Using workspaces you get symlinks for your packages in node_modules
. Using tsconfig.references
each package can have it's own tsconfig.json
for separate root/outDir and other compiler options.
Additionally tsconfig.references
is also used to describe the dependencies between the packages. Eg. packageA
can have a tsconfig.references
to packageB
and then tsc
knows it has to compile PackageB
before PackageA
. Granted tsc
could infer this from package.json.dependencies
but this is not how the typescript team choose to do it.
Jumping in to this conversation pretty late, but I did implement a TS Loader without the need for implementing module resolution. You can see the specific lines that did it here: https://github.com/giltayar/babel-register-esm/blob/eeb9a79c7646c7ee38e0cbb64de85b0548057076/src/babel-register-esm.js#L27.
(It's been working pretty well at our company for the past half year or so. No gotchas found.)
The idea is that if the .js
file does not exist (because transpilation didn't occur), it will use the defaultResolve
to find the equivalent .ts
file, and if it exists, use that.
In essence, defaultResolve
is the "utility" I use to implement module resolution.
Am I missing something? Is this a bad idea?
@giltayar Yes that works for simple cases. For example esbuild-node-loader also takes a simple approach. I don't think there is something bad about this if that is all you need.
The problem is that it does not work in more advanced cases. The two specific cases that I know of are:
src
and lib
). I describe this problem in more detail here. Also you will probably get worse performance if you first scan for js
files using the default resolve, and then a second time for ts
files with your own logic, but this is probably not a big issue if you use the loader only in development.
@giltayar Yes that works for simple cases. For example esbuild-node-loader also takes a simple approach. I don't think there is something bad about this if that is all you need.
The problem is that it does not work in more advanced cases. The two specific cases that I know of are:
- I have a monorepo with multiple typescript packages. When you traverse a package border, just changing the file extension does not work anymore if you keep source and compiled code in different folders (usually
src
andlib
). I describe this problem in more detail here.- You have configured paths in tsconfig.
Also you will probably get worse performance if you first scan for
js
files using the default resolve, and then a second time forts
files with your own logic, but this is probably not a big issue if you use the loader only in development.
I had a similar problem, I needed to point directly to the source code instead of the bundle dist when using a local bundle in a monorepo to avoid a lot of initial builds at the beginning, and also to debug the code
Update, I solved my problem with createRequire()
which allows me to import another file from a specified file, e.g. when importing the a
module, it is automatically redirected to a/src/
by the plugin
Today it is possible to alter the resolve algorithm to some extent using
--experimental-specifier-resolution
. However that switch only supports a couple of fixed use-cases and is not available programmatically within a loader implementation. When implementing a custom loader it would be nice to have access to a fine grained API so we can re-use parts of the default resolve algorithm but still be able to interject whatever logic you want in between steps, or skip/replace steps, without having to re-implement the full algorithm. That way a custom loader can be composed from the API rather than have to re-implement the whole resolve algorithm.The old node resolve algorithm was quite easy to re-implement so for that we did not really need a fine grained API. The new algorithm supports more features and is therefore more work to re-implement. So IMHO it would make sense to expose an API for the steps in this more complex algorithm rather than having custom loaders re-implement or copy it in full.
The use-cases for this would be for example implementing a loader for
.ts
files in ts-node where the algorithm is the same but only differ in file extension. Also, Yarn PnP could use this for its resolve. See below for those requests.ORIGINAL POST (before the issue was re-formed as a feature request):
I'm sorry if this is not the right forum to ask this, if not pls let me know where it is appropriate to ask.
I'm curious about how the loader specification and
--experimental-specifier-resolution
interact. Specifically I'm looking into loading ES modules directly from typescript.ts
files using any specifier (bare, relative). With the PR "esm: consolidate ESM Loader methods #37468" merged, this seems to have gotten easier. From what I understand it would now be possible to have something like this:And then just implement the loader hook as the resolve hook will not throw on the unknown
.ts
extension. This seems like a nice improvement as we don't have to re-implement the full resolve anymore. However most typescript projects are setup to import without extensions, like this:I believe having this resolved is related to
--experimental-specifier-resolution
but AFAIK this only support.js
. So in order to resolve the above tofoo.ts
the only way I see today is to re-implement the whole module resolution algorithm, which with the new spec is a rather big task. I think that is what ts-node is doing here, basically copying the impl from the internal nodejs loader.I'm guessing the
--experimental-specifier-resolution=node
flag somehow augments the internal loader to check for.js
andindex.js
(I have not looked deep into the source code though). The point being I think it would be nice if a loader could be allowed to somehow do the same augmentation. Specifically allowing imports without extension to be resolved to.ts
files without having to re-implement the full resolution algorithm. Or is there some other way to achieve this?