vercel / pkg

Package your Node.js project into an executable
https://npmjs.com/pkg
MIT License
24.31k stars 1.01k forks

ES modules not supported #1291

Closed LinusU closed 9 months ago

LinusU commented 3 years ago

I'm getting the following error as soon as the compiled app boots:

node:internal/modules/cjs/loader:930
  throw err;
  ^

Error: Cannot find module '/snapshot/dhjaks/index.js'
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:927:15)
    at Function._resolveFilename (pkg/prelude/bootstrap.js:1776:46)
    at Function.Module._load (node:internal/modules/cjs/loader:772:27)
    at Function.runMain (pkg/prelude/bootstrap.js:1804:12)
    at node:internal/main/run_main_module:17:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

Here is a minimal reproducible example:

package.json

{ "type": "module" }

index.js

import os from 'os'

console.log(os.arch())

Build command:

pkg index.js

CleyFaye commented 1 year ago

Well, there are discussions on github now.

About your issue: you can use import statements and, as far as I know, pretty much anything except top-level await, as long as your bundler (I use webpack, but others work too) can produce CommonJS. If you need some extra imports that are not bundled directly, you can put them in the virtual filesystem using the pkg config.

Anything that produces a single JS file that works as a CommonJS module will do. Unfortunately, CommonJS modules don't support top-level await.
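
A minimal sketch of that constraint, assuming the bundler emits a single CommonJS file: whatever would have been a top-level await in the ESM source is moved into an async function that is called once at startup. The names `main` and `started` are only illustrative, and the awaited value is a stand-in for real asynchronous work.

```javascript
// Sketch of a CommonJS entry point; `main` and `started` are hypothetical names.
const os = require('os');

// In ESM this could be a top-level `await`; in CommonJS it has to live
// inside an async function.
async function main() {
  // Stand-in for real asynchronous startup work.
  const arch = await Promise.resolve(os.arch());
  return `arch: ${arch}`;
}

// Start the program once and report failures through the exit code.
const started = main().then(
  (message) => {
    console.log(message);
    return message;
  },
  (error) => {
    console.error(error);
    process.exitCode = 1;
  },
);
```

This only helps for code you control; a dependency that itself uses top-level await still cannot be loaded this way.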

Inrixia commented 1 year ago

If you wanna see a project that uses pkg with esm you can look at https://github.com/Inrixia/Floatplane-Downloader

But you cannot use top-level await (or have dependencies that use it), as it's not possible to transpile that functionality.

ForbiddenEra commented 1 year ago

So.. I was playing around - not with pkg but with bytecode compiling in general.. I've managed to build a framework that works with CJS, ESM without transpilation, compiling into bytecode w/out any issues.

I haven't dug into pkg enough to figure out where the hold up is, but I just want to point out it's definitely not impossible. I am using experimental module loader for my toy, though I'm not sure if that's a requirement to get it working (sorry, been over a month since I was digging into that)

If you wanna see a project that uses pkg with esm you can look at https://github.com/Inrixia/Floatplane-Downloader

Compiling software that steals from Luke's site? Interesting.. perhaps you should share since they share, but it's not my place to judge or really care besides this snarky comment ;) (especially since I decided to keep my current job instead of going to work for him which was honestly one of the toughest choices of my life)..

Inrixia commented 1 year ago

Compiling software that steals from Luke's site? Interesting..

@ForbiddenEra Just to clarify FPD requires a Floatplane account and utilizes the download functionality provided by Floatplane so no stealing here :) though what would I share??

I've worked with AJ on things surrounding it, so they are well aware of its existence too.

Anyway, so as not to get too far off topic: looking at what you posted about bytecode compilation, that's exactly what I expect tbh. I don't see a reason it shouldn't be possible; in fact, I think there was a working PR submitted for pkg (or one that was WIP), but it's been blocked for some time if I'm remembering correctly.

piranna commented 1 year ago

So.. I was playing around - not with pkg but with bytecode compiling in general.. I've managed to build a framework that works with CJS, ESM without transpilation, compiling into bytecode w/out any issues.

I haven't dug into pkg enough to figure out where the hold up is, but I just want to point out it's definitely not impossible. I am using experimental module loader for my toy, though I'm not sure if that's a requirement to get it working (sorry, been over a month since I was digging into that)

Can you share it? I have been working on a bytenode wrapper and it would make things easier to work directly with ESM instead of needing to do a webpack pre-step to convert code to a CommonJS bundle first.

ForbiddenEra commented 1 year ago

So.. I was playing around - not with pkg but with bytecode compiling in general.. I've managed to build a framework that works with CJS, ESM without transpilation, compiling into bytecode w/out any issues. I haven't dug into pkg enough to figure out where the hold up is, but I just want to point out it's definitely not impossible. I am using experimental module loader for my toy, though I'm not sure if that's a requirement to get it working (sorry, been over a month since I was digging into that)

Can you share it? I have been working on a bytenode wrapper and it would make things easier to work directly with ESM instead of needing to do a webpack pre-step to convert code to a CommonJS bundle first.

I'll consider it; I can't make any promises. It's not finished, and it's been built onto the newest version of my web platform, which has always been a commercial product, though I've been considering open sourcing it even if it's at minimum a dual-license kind of thing. And even if I don't open it up, if I can find some time, perhaps I can pull out a few gists or something on how it works. I was looking at it tonight (as I can only work on this in my spare time currently) and was trying to refresh my memory on things. It looks like the last time I was working on it I was splitting it up a bit, like putting the compiler part into a semi-separate npm module as well as the loaders; perhaps I can even look at open sourcing just those bits once I get it sorted.

The actual compilation part works basically the same as everyone else's, e.g. bytenode, so I suppose the useful 'magic' is probably in the loaders. I also wasn't quite going for the same goal where it simply outputs a single executable package, though I'm sure that can be made to happen. At least right now I can do an import testModule from 'testModule.jsc' assert { type: 'jsbin', key: "<key here>" }; where testModule.jsc is a JS bytecode binary encrypted using <key here>, which was what I was going for in this specific case, though I'm sure others want to just distribute a single compiled file that people can just run, likely with a pre-packaged included node like pkg here or the future SEA will do.
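
For readers unfamiliar with the compilation step mentioned here, the following is a rough sketch of the general technique using Node's built-in `vm.Script` code cache. It is an assumption about how this kind of bytecode compilation is usually done, not ForbiddenEra's framework and not pkg's internals; the source string and file name are made up for illustration.

```javascript
// Sketch only: serialize V8's code cache for a script and feed it back in.
const vm = require('vm');

// Hypothetical source; the last expression is the script's result.
const source = 'function add(a, b) { return a + b; }\nadd(2, 3);';

// Compile once and serialize V8's code cache for this script.
const compiled = new vm.Script(source, { filename: 'add.js' });
const bytecode = compiled.createCachedData(); // a Buffer

// Later: hand the cache back to V8 together with the source.
const restored = new vm.Script(source, {
  filename: 'add.js',
  cachedData: bytecode,
});

// V8 reports whether it rejected the cache (for example after a V8 upgrade).
const cacheRejected = restored.cachedDataRejected;
const result = restored.runInThisContext();

console.log({ bytes: bytecode.length, cacheRejected, result });
```

Note that this covers classic scripts only; as discussed in this thread, loading a compiled ES module needs extra work on top of this.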

I don't know if it helps, but I am using the experimental module loader to allow me to import compiled files anywhere, or even standard TS files that get transpiled on the fly (which is basically just the example in the Node.js docs for module loaders, heh). I also had to do a bit of magic in the final returned source with vm.SourceTextModule and vm.SyntheticModule and linking with them. I think those are probably the main key, actually; I'm not sure the experimental module loader stuff is needed unless you want to directly import bytecode as I am doing.

Once I've had a chance to dig back in and refresh my memory about this then I'll definitely consider sharing at least a snippet or gist here but with the time since I was working on it and with it being a bit complex and using new/experimental stuff, I don't want to just post something that's not useful or sends someone in the wrong direction or down the wrong rabbit hole! I can't make any promises though as this stuff only gets worked on in my spare time which isn't much lately - but if what I've done can help close this issue then I'll definitely try to share what I can if I find the time.

piranna commented 1 year ago

perhaps I can even look at open sourcing just those bits once I get it sorted.

I think that could be enough :-)

I suppose the useful 'magic' is probably in the loaders.

I think so.

at least right now I can do an import testModule from 'testModule.jsc' assert { type: 'jsbin', key: "<key here>" }; where testModule.jsc is a JS bytecode binary encrypted using <key here>, which was what I was going for in this specific case

This looks REALLY interesting, and I have been trying to get similar functionality for Mafalda SFU. I have not yet gotten into the encryption/signing part, partly because I was more interested in a licensing model with expiration dates, both for libraries and final executables, but it's definitely something I was thinking about.

I'm sure others want to just distribute a single compiled file that people can just run, likely with a pre-packaged included node like pkg here or the future SEA will do

Yes, but also binary libraries protected with a license or a key can be useful too.

Once I've had a chance to dig back in and refresh my memory about this then I'll definitely consider sharing at least a snippet or gist here but with the time since I was working on it and with it being a bit complex and using new/experimental stuff, I don't want to just post something that's not useful or sends someone in the wrong direction or down the wrong rabbit hole! I can't make any promises though as this stuff only gets worked on in my spare time which isn't much lately - but if what I've done can help close this issue then I'll definitely try to share what I can if I find the time.

It's definitely something I would be interested in :-) I haven't published my tool as open source either, in part to avoid giving tips to anyone willing to reverse engineer my code, but I have it totally isolated from my main code and it would be easy to integrate something like this.

ForbiddenEra commented 1 year ago

I think that could be enough :-)

Will see what I can do.. Going on vacation here soon, if I find myself bored one night in a hotel room with my laptop maybe I'll look into it.

Admittedly though, I wonder if I'd almost prefer to actually clone the repo and dive into the problem and see about submitting a PR. I don't know if/when I could dedicate the time but if I provide the solution as a PR then I'm sure I can probably get listed as a contributor, whereas if I provide the solution in an issue then I likely wouldn't be considered a contributor. Not trying to be selfish here but it would suck to provide a solution and have someone else copy/paste it in and get the credit. Hopefully that doesn't seem unreasonable or selfish at all.

I think so.

Definitely part of it, but I was peeking after/while writing my reply, some of it is definitely also vm.*Module stuff for loading compiled modules nicely but also the whole path finding/module resolution thing is definitely the loaders part. I definitely had to have some fun and squeeze some secret sauce with the vm module stuff to be able to properly import compiled modules.

This looks REALLY interesting, and I have been trying to get similar functionality for Mafalda SFU. I have not yet gotten into the encryption/signing part, partly because I was more interested in a licensing model with expiration dates, both for libraries and final executables, but it's definitely something I was thinking about.

Yeah, one of my reasons for implementing was the desire to be able to distribute packages that can be partly or fully bytecode as well as compressed and encrypted with various encryption methods for licensing purposes. I hadn't thought about expiration in the way of self expiring licenses at all but I was thinking about a license server kind of thing. Any type of protection I would deem realistic in the real world would be a rather difficult discussion with JS, one can reverse bytecode just like one can disassemble a typical executable ABI program and in some respects it's potentially even easier. And even without reversing, if you can run the JS, you can debug it pretty thoroughly regardless.
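
As a rough illustration of the "encrypted with a key" part, here is a sketch using Node's built-in crypto module with AES-256-GCM. The blob layout, the function names, and the use of scrypt to turn a license key string into an AES key are all assumptions made for this example, not a description of the system discussed above.

```javascript
// Sketch only: encrypt and decrypt a module's bytes with a license key.
const crypto = require('crypto');

// Derive a 32-byte AES key from a license key string (hypothetical scheme).
function deriveKey(licenseKey, salt) {
  return crypto.scryptSync(licenseKey, salt, 32);
}

function encryptModule(plain, licenseKey) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', deriveKey(licenseKey, salt), iv);
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  // Assumed layout: salt (16) | iv (12) | auth tag (16) | ciphertext
  return Buffer.concat([salt, iv, cipher.getAuthTag(), body]);
}

function decryptModule(blob, licenseKey) {
  const salt = blob.subarray(0, 16);
  const iv = blob.subarray(16, 28);
  const tag = blob.subarray(28, 44);
  const body = blob.subarray(44);
  const decipher = crypto.createDecipheriv('aes-256-gcm', deriveKey(licenseKey, salt), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was modified.
  return Buffer.concat([decipher.update(body), decipher.final()]);
}
```

As the rest of this comment points out, the decrypting code itself still has to run somewhere the user controls, so this raises the effort needed rather than providing real protection.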

In a lot of cases, though, I assume that even some level of protection might be enough, at least if your target clientele is more so corporate or business clients and not the general public, as that audience often won't want to risk non-compliance. It could still happen, though, if all a developer has to do is comment out a few lines of license-related code.

When it comes to the general public, again, this is JS. I've put a lot of thought into this, and the best solution I could come up with that would have any real level of protection would involve a license server. Without that, it can be tough. If you use a license key, you already have to deal with all the traditional issues (sharing keys, whether keys expire, whether keys are tied to a system or activated in any way, etc.).

On top of that, even the experimental loader stuff requires at least one loader 'layer' to be raw JS that vanilla node can run, even if you offload the more fun stuff to a second loader that is maybe decrypted by the first. One way around this that I can think of would be including some sort of actual standalone binary that handles part of the decryption step. At the very least, your first-level loader needs to be executable by native node, and without other code handling the bytecode part (which is what you want to use the loader for anyway), that code has to be interpreted by node. It is therefore plaintext JS, and you'd at the very least be giving away how you load bytecode-compiled files, even if decryption is done by a binary or a following loader layer. I think you'd need a small loader whose job is to load the bytecode of the real loader into node, with the real loader handling the more fun stuff like decryption, etc.

But again, JS is JS. If you're serious about protection, you might also want to consider whether any V8 options modify the bytecode from 'standard' (if any do) to make it more difficult to reverse, and look at ways to prevent users who import the code from running the debugger across it, though if you prevent debugging across the whole importing app you might get some annoyed devs. Then again, it's also JS, and I don't think a lot (definitely not all!) of JS developers even know what a debugger is ;)

Some other interesting things you can do, though, include code signing (which could of course work in conjunction with encryption/keying), where you don't run the code if it doesn't match its hash/checksum and/or use that hash/checksum as part of your decryption key; again, you'd have to obfuscate your decryption somehow.
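
The "don't run the code if it doesn't match its hash" idea could look roughly like this. It is a sketch with hypothetical function names, assuming a SHA-256 digest that is distributed separately from the code.

```javascript
// Sketch only: refuse to run code whose SHA-256 digest is not the expected one.
const crypto = require('crypto');
const vm = require('vm');

function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Run `source` only if its digest matches `expectedHex`.
function runIfUntampered(source, expectedHex) {
  const actual = Buffer.from(sha256Hex(source), 'hex');
  const expected = Buffer.from(expectedHex, 'hex');
  // timingSafeEqual requires equal lengths, so check that first.
  const ok = actual.length === expected.length &&
    crypto.timingSafeEqual(actual, expected);
  if (!ok) throw new Error('checksum mismatch, refusing to run');
  return new vm.Script(source).runInNewContext({});
}
```

A call such as `runIfUntampered(code, publishedDigest)` only proves the code matches the digest; where that digest comes from, and whether it can be trusted, is a separate problem.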

You could encrypt and sign with a private key and distribute a public key for use. This would prevent easy modification of the code but doesn't prevent anyone who obtains the public key from running it, although this could be useful as a security feature, maybe? I mean, we are already using SHA hashes for JS on the browser side, especially to verify code delivered by CDNs, and I feel like this could be pretty easily implemented on the server/node side as well with this method. After all, it's not like we haven't seen attacks on misspelled/mistyped or abandoned npm packages in the past, though you'd probably want a better way of distributing said hash/checksum than just tossing it in your package.json if you wanted to protect against that. It's definitely something you could do, and I feel like the import assertions-style syntax is ripe for these types of usages, hence why I used it for providing decryption/license keys in my system.
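
A sketch of the signing half of that idea (encryption is left out), using Ed25519 keys from Node's crypto module. The sample code string and the `isAuthentic` helper are hypothetical, and in practice the key pair would be generated once by the publisher rather than at load time.

```javascript
// Sketch only: sign code with a private key, verify with the public key.
const crypto = require('crypto');

// Publisher side: the private key never ships with the package.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

const code = Buffer.from('module.exports = () => "hello";');
// Ed25519 takes no separate digest algorithm, hence `null`.
const signature = crypto.sign(null, code, privateKey);

// Consumer side: verify with the distributed public key before loading.
function isAuthentic(data, sig, key) {
  return crypto.verify(null, data, key, sig);
}

console.log(isAuthentic(code, signature, publicKey));
```

This detects modification of the code but, as noted above, does nothing to stop someone who has the public key from running it.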

Of course, you can also use the loaders to transpile source on the fly and/or pre-compile it; again, this example is in the docs for the loaders as it is. Being able to use JS, TS, JSX, and TSX in a project without having to think about it or ever transpile anything myself, with the option of having the result compiled into bytecode immediately, is nice to have. I've also used the import assertions-style syntax to assert that the file type is what's expected regardless of its extension. It can be detected by extension as well, of course, but I also feel like .jsc, .jsbin, .tsc, etc. aren't particularly standard/well-known, so why not allow whatever and use that syntax to assert/specify what is what? You could in theory even use it to specify additional/specific options for transpiling a certain TypeScript import.
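
The "explicit type wins, extension is the fallback" logic could be sketched as a small helper like the one below. The extension table, the format labels, and the function name are all hypothetical; a real loader would call something like this from its own resolve or load step.

```javascript
// Sketch only: decide a module's format from an explicit attribute or its extension.
const path = require('path');

// Hypothetical mapping from file extension to a format label.
const FORMAT_BY_EXTENSION = { '.js': 'js', '.ts': 'ts', '.jsc': 'jsbin' };

// Prefer an explicit `type` attribute; fall back to the file extension.
function resolveFormat(specifier, attributes = {}) {
  if (attributes.type) return attributes.type;
  const format = FORMAT_BY_EXTENSION[path.extname(specifier)];
  if (!format) throw new Error(`unknown format for ${specifier}`);
  return format;
}
```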

Yes, but also binary libraries protected with a license or a key can be useful too.

Indeed; I wish there were an easier solution and again it's something I've put a bit of thought into and worked on a bit; it can be difficult to protect against things and envision the perceived attack surface when you're the one who developed the protections and know how to side-step them easily, and again the nature of JS doesn't particularly help us here but I'm open to ideas and discussion on how we can try and protect our code where needed, that's partly why as well I was trying to make it in a way where you can just have a single file or module compiled/encrypted, sometimes the whole project doesn't need it but that can still be an option as well.

It's definitely something I would be interested in :-) I haven't published my tool as open source either, in part to avoid giving tips to anyone willing to reverse engineer my code, but I have it totally isolated from my main code and it would be easy to integrate something like this.

At the very least, I was considering releasing it publicly for use even if I don't release the source so that others can compile their code, encrypt/license it and have a loader to use it in projects or allow other projects to use it. I'm not sure if or when that might happen and I'm not particularly comfortable mentioning it in this issue thread anyway - this of course isn't the place to promote my own work.

Aside/back to original topic in the light of trying to help here:

I'd have to review the thread again but IIRC and if I'm understanding right, the biggest issue was ESM module resolution issues, right? Using the experimental loader stuff can definitely help with that, but that's not the only roadblock I ran into as you can't load a compiled ESM module the same way you load a CJS/standard script, you have to use the vm module stuff as I mentioned above.

One can use the loaders and benefit from Node's resolution, though. If that's the primary issue, then I'd think this guidance might push things forward. Currently, though, the loader stuff is experimental and I'm not sure if the project maintainers want to go there. However, I don't know if there'd be an alternative without figuring out resolution on your own and trying to ensure it's on par with Node's. I'm not sure about others, but I personally would be willing to accept using an experimental feature if it enabled ESM here.

Jordan-Eckowitz commented 1 year ago

I see that both Deno and Bun can create executables with ESM support. https://deno.land/manual@v1.36.0/tools/compiler https://bun.sh/docs/bundler/executables

I haven't tested this yet myself but curious if anyone else has?

stormwulfren commented 1 year ago

It should be noted that both Deno and Bun aren't doing the same thing as pkg. They're completely different runtimes and do things differently from the node runtime. It's not a simple case of "seeing what deno and bun do under the hood and lifting it". It's apples and oranges. Not saying, @Jordan-Eckowitz, that's your implication but just figured I'd say now, so others don't get the wrong idea.

I've tried both Deno and Bun for some of my use-cases. IMHO they're good for smaller projects, but if you have larger projects with predefined outcomes and the expectation that they're drop-in replacements for nodejs/typescript, you're in for a bad time.

ForbiddenEra commented 1 year ago

It should be noted that both Deno and Bun aren't doing the same thing as pkg. They're completely different runtimes and do things differently from the node runtime. It's not a simple case of "seeing what deno and bun do under the hood and lifting it". It's apples and oranges. Not saying, @Jordan-Eckowitz, that's your implication but just figured I'd say now, so others don't get the wrong idea.

I've tried both Deno and Bun for some of my use-cases. IMHO they're good for smaller projects, but if you have larger projects with predefined outcomes and the expectation that they're drop-in replacements for nodejs/typescript, you're in for a bad time.

Agreed; I was excited to see both when I discovered them but it was pretty quickly obvious that neither were quite ready for use in any projects that I'm involved with yet and even new ones would, as you said, likely have to be something smaller, not to knock their hard work - they should definitely continue, but the community and ecosystem need to be on board and keep up as well, it was many, many years before I was willing to use node even vs. a standard web server and CGI and not all the concerns I had about switching to node have been resolved or were even resolvable.

Although (and I'm sure it's been stated) node itself has plans for some sort of SEA-ability; whether that will be equivalent to Deno/bun's attempts in this space or competes/replaces things like PKG here I suppose is still to be seen.

As an aside, I've not heard any comments back on whether the maintainers or community would be for or against using/requiring/allowing the use of a loader (as they're still marked experimental) to accomplish the ability; if everyone's against using anything experimental for this, I can understand but then there's not much sense in sharing my solution unless/until loaders are no longer marked experimental?

stormwulfren commented 1 year ago

@ForbiddenEra I've been exploring solutions for a new greenfield project, and the main challenges I've been facing revolve around desktop deployment + licensing. In terms of deployability, for my purposes, Node/TypeScript seems, on the face of it, like the obvious choice for the project because it'll run on pretty much anything under the sun with pretty decent platform parity.

One of my desires for the project was to build it ESM First, but honestly, even getting typescript to work properly in ESM mode with third party dependencies was a challenge. Especially those that have taken the route of writing their library CommonJS First, adding TS types and ESM compatibility aliases at a later date. I had serious difficulty importing AJV, for example, to the point I was considering literally rewriting the entire damn thing in typescript from scratch.

ESM loading on the whole seems to simply have too many quirks for me to even consider using it for a new project. On paper, I'm convinced it's the standard we as the community should be following when building libraries, but it just looks like adoption isn't quite there yet to build an end product as ESM. If I was developing the entire thing in-house, zero dependency style, then sure, I'd probably risk it.

In the grand scheme of things, whether a project is deployed as CommonJS or ESM is largely a technical niggle at best. It doesn't affect the broader execution of the developed software, CommonJS isn't deprecated, it's not going anywhere any time soon. It's adequate. Debate me, but I think that's where my gut feeling is for now. Happy to discuss with anyone who disagrees.

That said, bringing this conversation back to the scope of pkg: the main purpose of pkg (and kin) is to create a single, deployable executable. I feel that between the methods used by pkg, caxa, nexe, and electron-builder, this aspect is a reasonably solved issue. The bit that isn't largely solved for JS/Node is the topic of licensing, DRM, and source code protection. I had a look into it for the purposes of my own project. Bytecode compilation is pretty much the best we have right now, and it can be decompiled with Ghidra.

The problem with using loaders for encryption in the manner @ForbiddenEra describes:

import testModule from 'testModule.jsc' assert { type: 'jsbin', key: "<key here>" }; where testModule.jsc is a js bytecode binary encrypted using <key here>

You will end up with a full decrypted copy of whatever you load in memory, which is then passed to the interpreter. This memory could then be read verbatim, saved to a file, then decompiled as usual. In fact (I'm not 100% sure on the logistics), it might simply be possible to read the executing code from node's V8 code cache, which would be available regardless of any encryption applied at any point accessible to loaders. You might be able to mitigate that attack surface by running node in jitless mode, but I'm not sure what effect that would have on pkg. My understanding is that pkg takes a snapshot of the V8 cache and re-seeds it at runtime? So, if the node executable is running jitless, it doesn't pre-allocate the executable memory, so possibly it'd just barf? Could someone more intimate with V8 & pkg chip in, if possible?

I'm not saying that custom loaders couldn't be part of the solution, in fact, I think they're the best we'd get without direct access to the AST.

In terms of my thoughts about what pkg's role in this would be, I feel that it'd be somewhat out of scope for the project. However, if pkg could support passing through the experimental loader flag, that might be in-scope.

Alternatively, my next thought would be to create a native plugin that reads encrypted snapshots, and runs them in an isolated worker thread, basically doing what I understand pkg to do. Same caveats as above, wouldn't need custom loader though. Food for thought.

Perhaps this off-topic talk re: licensing could be moved to a discussion? It's interesting, but not what this ticket's for.

ForbiddenEra commented 1 year ago

ESM loading on the whole seems to simply have too many quirks for me to even consider using it for a new project. On paper, I'm convinced it's the standard we as the community should be following when building libraries, but it just looks like adoption isn't quite there yet to build an end product as ESM. If I was developing the entire thing in-house, zero dependency style, then sure, I'd probably risk it.

I've been trying to use it as much as possible for new stuff without too many issues. I had many more issues with loading CJS stuff in an ESM project.

The problem with using loaders for encryption in the manner @ForbiddenEra describes:

import testModule from 'testModule.jsc' assert { type: 'jsbin', key: "<key here>" }; where testModule.jsc is a js bytecode binary encrypted using <key here>

You will end up with a full decrypted copy of whatever you load in memory, which is then passed to the interpreter. This memory could then be read verbatim, saved to a file, then decompiled as usual.

This is mostly true and something I've considered and thought about how one could work around it but regardless, in the end, you'll be at best feeding bytecode into node, which as you say can be decompiled without too much difficulty.

Fact is, JS is an interpreted language, which only makes these things much more difficult. Take a different interpreter, say Bun as a presently relevant example: even if some sort of protection were a core feature, it still uses JavaScriptCore just like Node uses V8. Beyond that, what do we do? Compile an AST to ASM or WASM?

Not that compiled languages can't be decompiled as well, but when running natively you can use security features of the OS and do things like self-modifying code. Still, the software licensing security problem is far from solved in any domain. The closest I can think of is always-online activation, perhaps with some additional tricks to check that code isn't modified and to prevent packet capture/replay/simulation-style attacks like running a hacked local license server.

In terms of my thoughts about what pkg's role in this would be, I feel that it'd be somewhat out of scope for the project. However, if pkg could support passing through the experimental loader flag, that might be in-scope.

It could at least solve the ESM issue with an implementation like I've created.

Alternatively, my next thought would be to create a native plugin that reads encrypted snapshots, and runs them in an isolated worker thread, basically doing what I understand pkg to do. Same caveats as above, wouldn't need custom loader though. Food for thought.

I had a thought along those lines to try and convert/compile JS to WASM in some way. But that's a really deep rabbit hole for the time I have available.

Perhaps this off-topic talk re: licensing could be moved to a discussion? It's interesting, but not what this ticket's for.

I don't disagree; I'm just not sure where. I've tried to mostly stay on-topic while answering questions and offering a bit of extra context regarding what I've put together, though really it was mentioned because it was able to do what pkg does regarding compiling/saving/reloading bytecode on Node but with ES modules.

robbie-cahill commented 1 year ago

  • tsc index.ts without any TS configuration
  • rollup index.js --file bundle.js --format cjs to bundle everything together
  • pkg bundle.js --targets node18-win-x64,node18-linux-arm64 to create the executable?

This is working for me. The only thing I needed to add was a jq hack to temporarily remove "type":"module" from package.json so that node would not complain.

cat package.json | jq 'del(.type)' > /tmp/package.json && mv /tmp/package.json package.json # Workaround: remove "type": "module" so node does not complain about require in cjs
rollup dist/bin/tunnelmole.js --file tunnelmole-bundle.js --format cjs
git checkout package.json # Remove workaround, set package type back to module
pkg tunnelmole-bundle.js --targets node18-linux-x64 --output tmole-linux

AnzhiZhang commented 10 months ago

  • tsc index.ts without any TS configuration
  • rollup index.js --file bundle.js --format cjs to bundle everything together
  • pkg bundle.js --targets node18-win-x64,node18-linux-arm64 to create the executable?

This variant would be better, since it keeps the generated .js files from cluttering your workspace:

tsc index.ts --outDir dist
rollup dist/index.js --file dist/bundle.js --format cjs
pkg dist/bundle.js --out-path dist

The same steps as package.json scripts:

"scripts": {
  "build": "npm-run-all build:*",
  "build:1": "tsc index.ts --outDir dist",
  "build:2": "rollup dist/index.js --file dist/bundle.js --format cjs",
  "build:3": "pkg dist/bundle.js --out-path dist"
}
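
For anyone who would rather not depend on jq for the package.json workaround mentioned earlier in the thread, the same step can be sketched in plain Node. The helper names are hypothetical; the idea is to strip "type" before bundling and write the returned original text back afterwards, instead of relying on git checkout.

```javascript
// Sketch only: a Node equivalent of `jq 'del(.type)'` for package.json.
const fs = require('fs');

// Return a copy of a parsed package.json without the "type" field.
function withoutType(pkgJson) {
  const { type, ...rest } = pkgJson;
  return rest;
}

// Rewrite a package.json on disk and return the original text so the
// caller can restore it once the bundle has been built.
function stripTypeFromFile(path) {
  const original = fs.readFileSync(path, 'utf8');
  const stripped = withoutType(JSON.parse(original));
  fs.writeFileSync(path, JSON.stringify(stripped, null, 2) + '\n');
  return original;
}
```

A build script could call `const original = stripTypeFromFile('package.json')`, run the rollup and pkg steps, and then restore the file with `fs.writeFileSync('package.json', original)`.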