isaacs / tshy

Other
890 stars 18 forks source link

package #imports #2

Closed isaacs closed 1 year ago

isaacs commented 1 year ago

The readme says this:


Package #imports

Using the imports field in package.json is not currently supported, because this looks at the nearest package.json to get local imports, and the package.json files placed in dist/{commonjs,esm} can't have local imports outside of their folders.

There's a way it could theoretically be done, but it's a bit complicated. A future version may support this.


The challenge is that #imports are only resolved from the closest package.json file. But tshy puts a package.json file in src when doing a build, to tell TSC what the dialect should be, and another one in dist/{commonjs,esm} to tell node what dialect the folder should be at run-time. But the imports are relative to the project root, and defined there.

This would be trivial to work around if #imports could be a relative path that walks up from the package.json location. However, for obvious security reasons, that's not allowed. (Eg, imagine "imports": {"#pwn": "../../../../../etc/passwd"} or something.) But, if that was allowed, then tshy could write its dialect package.json files with an imports where every path is relativized to the project root (ie, .. when in src, and ../.. in the build target).

Another approach (since that obviously won't work) is to symlink the imports targets into paths relative to the dialect target. This would work for ./src just fine, because node and tsc both resolve symlinks to their realpath when loading modules.

However, for the dist targets, we can't just symlink all the import targets to ../../${target}, because symlinks are never included in npm packages. The only solution I can think of right now would be to use a preinstall script or something to create the symlinks. Tshy could place a script at dist/.tshy-imports.js or something, which creates all the required symlinks.

Another alternative that doesn't rely on symlinks would be module shims. However, this might not work, because you can have the same target referenced as both import and require (since you can import commonjs, even though it's weird), so we can't know whether we're supposed to write it as export * from ${target} or module.exports = require(${target}).

I don't love either of these solutions, but the "symlinks plus hydration" feels less brittle, even though it's a bit more brittle and convoluted than I'd really like.

isaacs commented 1 year ago

Challenge 1: *

This gets tricky when the import has a *. In that case, it's not immediately obvious what files would need to be symlinked.

The logic exists in resolve-import's resolveAllLocalImports method. However, that method is hard-coded to resolve against node and default. It wouldn't be too hard to have some kind of getAllConditions, which walks the object creating a set of all the conditions that it's aware of. Then that list of conditions could be used to then resolveAllPackageImports for each condition, to build up the set of potential targets that we need to symlink into place.

Challenge 2: dep-name

A local #import can also be a package name. That is actually pretty simple, though, we have to just not create a symlink at all, since any node_modules resolution that works from ./ will work equally well from ./src or ./dist/{dialect}.

Challenge 3: this-package/export

If the import starts with the name of the root package, then we create a symlink at ${target}/node_modules/${pkgname} pointing to the root of the project. This only has to be done once for any imports that resolve in this way.


I think with these three things handled, the "copy imports, symlink targets" might actually work.

isaacs commented 1 year ago

Ok, I was wrong about node being hard-coded in the list of conditions used by requireImport. And, version 1.4 of require-import should have all the parts needed to make this work.

So, the approach is:

  1. copy the imports object from the root package.json into the dialect package.json files. If not present, nothing to do.
  2. fast path - get the set of unresolved local imports by calling getAllConditionalValues(imports). If none them contain *, then that's the set of things that need to be symlinked into place.
  3. slower path - If any of them do contain *, then get the set of unique condition sets by calling getUniqueConditionSets(imports), and then for each condition set, call resolveAllLocalImports(pkg, { conditions }), and get the set that way.
  4. Once we have the set of things, omit any that do not start with ./, since those are either invalid, null, or a dependency.
  5. For all that remain, add ../ (if in src) or ../.. (if in dist) to the path, and symlink to that target.
  6. When setting the dialect back to undefined (ie, deleting the generated package.json), also unlink all the generated symlinks. At this point, we have everything we need for setting up TSC for a successful build.
  7. For run-time, it's a bit different, because TSC won't generate these for us, and sync-contents wouldn't copy them anyway. So in that case:
    1. after calling syncContentsSync, link all the imports in each dist/{dialect} folder
    2. write out a ./dist/.tshy-link-imports.mjs program that will create those same symlinks, and then delete itself
    3. set the pkg.scripts['postinstall'] script to node ./dist/.tshy-link-imports.mjs

If any of the imports point to something in ./src, throw an error, because that is definitely going to cause problems. (Tshy owns ./src, it's the only way for all of this to work.)

I think that might work, actually.

isaacs commented 1 year ago

Yeah, this works great. Writing tests for it now. This will be in the next release.

andrewbranch commented 1 year ago

This would be trivial to work around if #imports could be a relative path that walks up from the package.json location. However, for obvious security reasons, that's not allowed.

I feel like in 99% of cases, the dialect package.json "imports" wouldn’t need to have relative paths reaching outside the dialect directory. Consider a package.json like

{
  "imports": {
    "#internal/*": {
      "import": "./dist/esm/internal/*.js",
      "require": "./dist/cjs/internal/*.js"
    }
  }
}

That could be tshyfied into

{
  "tshy": {
    "imports": {
      "#internal/*": "./src/internal/*.ts"
    }
  }
}

Which would become in each dialect package.json:

{
  "imports": {
    "#internal/*": "./internal/*.js"
  }
}

With the caveat that the CJS output will not be able to reach the ESM output via await import("#internal/foo") and the ESM output will not be able to reach the CJS output via createRequire(import.meta.url)("#internal/foo"), but who wants that anyway?

Curious if you considered this.

isaacs commented 1 year ago

@andrewbranch That would work for internal deps where users just want to avoid all the .. in the path, yes. But, a use case I've seen in the wild (for example, in chalk) is vendoring deps that aren't built as part of the project, either to hybridize them or to claim "zero deps" lol, or to have them as an internal dep built in a different way (especially for prebuilt binary deps).

In that sort of case, you do need to get out of the dialect folder and back up to the "real" project.

isaacs commented 1 year ago

I was thinking about this a bit more, there might be an opportunity to make this a little more efficient/simple in the cases where you have a #import that is in the dist folder, though. Like, it could detect if the target is in the dialect folder, and then instead of making a symlink, just adjust the path.

We'd still need symlinks in the "hybridized vendored dep" case, of course, and if you have something like this:

{
  "imports": {
    "#foo": {
      "import": "./dist/esm/foo.js",
      "require": "./dist/commonjs/foo.js"
    }
  }
}

then we still need a link to the commonjs version from the esm version and vice versa, because you might do await import('#foo') in commonjs and module.createRequire(import.meta.url)('#foo') in esm.

So it doesn't make the problem fully go away, and adding that optimization would mean that the esm and commonjs package.json dialect indicators would differ in more than just the type field, which would might make things a little more confusing.

andrewbranch commented 1 year ago

because you might do await import('#foo') in commonjs and module.createRequire(import.meta.url)('#foo') in esm.

You know, the great thing about tshy’s approach is that if you don’t do the symlink, and a user tries to do this, they’ll get a TS resolution error. Which makes me think, maybe you could implement resolveModuleNameLierals in the compiler host to detect that’s happening and make the symlink on the fly if needed. Though I maintain that literally nobody wants to dual emit an input file and then import the opposite format of it.

I was thinking you could let tshy.imports be declared and point to input source files. Then, rather than becoming conditional exports in the root package.json, it just gets duplicated into the dialect package.json. (I recently recommended tshy to someone working on transitioning Azure SDKs to ESM, and they were confused/disappointed that imports didn’t work like this.)

isaacs commented 1 year ago

Though I maintain that literally nobody wants to dual emit an input file and then import the opposite format of it.

Hah. I have two cases where I need to do exactly that, because the ESM and CommonJS variants need to access the same module-local variable. The way I've approached it is to have the <thing>-cjs.cts actually create the shared object, and then <thing>.ts is a one-liner import thing from '../commonjs/thing.cjs' that's //@ts-ignore'ed. I could have stashed it on the global somewhere, but then you have to worry about multiple copies of the package being in the tree, and even though that's unlikely in this case, it's a rule that I always end up regretting when I break.

I was thinking you could let tshy.imports be declared and point to input source files. Then, rather than becoming conditional exports in the root package.json, it just gets duplicated into the dialect package.json.

That is a really good idea and would be super easy to do.

isaacs commented 1 year ago

Done on 1.5

andrewbranch commented 1 year ago

Why are compilerOptions.paths needed? TS supports imports, and paths don’t generally work the same way. For example, there’s no paths equivalent to "imports": { "#foo": "foo" } since paths always resolves like a relative import, and therefore won’t take exports in node_modules/foo/package.json into account.

mpodwysocki commented 1 year ago

@isaacs Would the imports used here be used conditionally? For example, if we were to target the browser, Deno, etc, could we do that with the following so that at build time, we can switch our implementations. We already have issues with this within the Azure SDK where we swap out for Browser Implementations: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/core/core-util/src. We would like to expand to Deno, etc as well.

"tshy": {
  "imports": {
    "#nativeFunction": {
      "browser": "./src/utils/nativeFunction.browser.ts",
      "deno": "./src/utils/nativeFunction.deno.ts",
      "node": "./src/utils/nativeFunction.node.ts",
      "default": "./src/utils/nativeFunction.ts" 
    }
  }
}
isaacs commented 1 year ago

Why are compilerOptions.paths needed? TS supports imports, and paths don’t generally work the same way. For example, there’s no paths equivalent to "imports": { "#foo": "foo" } since paths always resolves like a relative import, and therefore won’t take exports in node_modules/foo/package.json into account.

TS supports imports, but doesn't support tshy.imports. And tshy.imports must be a built file found in ./src, and cannot collide with top-level #imports, so (a) there's no way to have TS's support for imports give you the right answer, and (b) it's limited to cases where tsconfig's paths are supported in the same way (albeit, wrapped in [] to make it an array).

Would the imports used here be used conditionally? For example, if we were to target the browser, Deno, etc, could we do that with the following so that at build time, we can switch our implementations.

There's no need for it to be conditional in tshy.imports, and it's not supported. The imports in tshy.imports are each built within their specific dist/{dialect} folders, so using tshy, you can do it this way:

package.json

{
  "tshy": {
    "imports": {
      "#nativeFunction": "./src/utils/nativeFunction.ts"
    },
    "esmDialects": ["deno", "browser"]
  }
}

files (all of these are optional, but assuming they all have to be different)

./src/utils/nativeFunction.ts          <-- default ESM/CommonJS version
./src/utils/nativeFunction-browser.mts <-- browser version override
./src/utils/nativeFunction-deno.mts    <-- deno version override
./src/utils/nativeFunction-cjs.cts     <-- CommonJS version override

This will build:

dist/esm/utils/nativeFunction.js
dist/commonjs/utils/nativeFunction.js <-- moved into place from nativeFunction-cjs.cjs
dist/deno/utils/nativeFunction.js     <-- moved into place from nativeFunction-deno.mjs
dist/browser/utils/nativeFunction.js  <-- moved into place from nativeFunction-browser.mjs

And in all 4 folders, you'll have dist/dialect/package.json containing "imports": { "#nativeFunction": "./utils/nativeFunction.js" } so they'll get the appropriate file built for that dialect.

Of course, that means you can also just have import { nativeFunction } from './utils/nativeFunction.js' in all 4 cases, but you might still want a local import to avoid stuff like import { nativeFunction } from '../../../some/nested/thing/nativeFunction.js

andrewbranch commented 1 year ago

TS supports imports, but doesn't support tshy.imports. And tshy.imports must be a built file found in ./src, and cannot collide with top-level #imports

I think if you were to have users put everything in tshy.imports, whether the targets are in ./src or not, you could have

It feels like that would make imports and exports handling feel more consistent with each other, and wouldn’t require tsconfig paths.

isaacs commented 1 year ago

It feels like that would make imports and exports handling feel more consistent with each other

Hm, that's true. I'll play with it, might be a way to loosen the constraint and still work with anyone who already has it set up today.

and wouldn’t require tsconfig paths.

No, I think it still would, because you can't build the thing in dist if it depends on the thing in dist being there so that tsc can find it. Or are you saying the generated top level imports in the top package.json would point to the ts files in src?

andrewbranch commented 1 year ago

I am saying the top-level imports could point to ts files in src, because in fact, if they point to JS/.d.ts files in compilerOptions.outDir, TypeScript will transparently translate that to the TS file in compilerOptions.rootDir. So depending on what the user has set as outDir/rootDir, pointing the top-level imports to either the outputs or inputs would work the same for TS resolution.

isaacs commented 1 year ago

Oh, interesting. And yeah, if you run the .ts files with a transpiler, it's just going to resolve to the local ts file, and pull in the right thing.

If I add that feature, it does mean something that's been touched by tshy 1.6 won't be able to be built again with tshy 1.5, but that's probably fine.

The only weird moment would be before the first build. Eg, you have this:

{
  "tshy": {
    "imports": { "#foo": "./src/foo.ts" }
  }
}

And then you write import '#foo' in ./src/bar.ts, then tsserver will whine about it. But as long as tshy updates package.json before running tsc, that's fine, and really no different from self-dep in exports.

isaacs commented 1 year ago

Going to make a new issue for this, we've gone well into "different thing" territory 😅