nodejs / primordials-use-cases

To discuss primordials use cases
0 stars 1 forks source link

V8 private symbols #4

Open jdalton opened 5 months ago

jdalton commented 5 months ago

This issue is less of a use-case and more of an FYI on a possible approach.

As we know Prototype pollution injection attacks are a common JavaScript issue with CVE after CVE for years now. A recent Node security update from Feb 2024 was related to it as well (Buffer.prototype.utf8Write).

In my experience with WebKit (it's something I'm more familiar with than V8 at the moment) I found that expanding on WebKit's private symbols (properties and intrinsics prefixed with@ instead of V8's%) was totally doable to cover all primordial cases. So for built-in Node code with V8-like private symbols in mind (by way of patching V8 or otherwise) things could look like:

someArray.%push(a, b, c)

optionally using $ or _ as a stand-in prefix for formatters/editors then later through the code-gen/build step transformed to the V8 private symbol % prefix:

someArray.$push(a, b, c)

Even things like this could be possible:

%Buffer.%prototype.%utf8Write
someBuffer.%utf8Write(string, offset, len)

class Foo {
  [%%iterator]() {
    return this;
  }
}

%Map.%__lookupGetter__(%%species)
%Map.%prototype.%%iterator
%RegExp.%prototype.%__lookupGetter__(%dotAll)

Because the private symbol values are references to builtins they are treated just like snapshotted forms except you can call them in a more straight forward way (without uncurrying). These would also only be available to the privileged Node core JS internals (not accessible to user-land code).

No separate realms or VM instances, no cherry-picking/plucking of properties, no indirection through uncurrying. Fast, expressive, and direct access to tamper-proof versions of these methods, properties, and symbols.

joyeecheung commented 5 months ago

We already have a handful of private symbols used in core. However generally primordials aren't considered a security measure, because the threat model of Node.js trusts code that can be loaded into and run in the process (if the code that is allowed to run is malicious, doing a quick child_process.spawn or process.dlopen() and do whatever they want with the process would be way more effective than spending all the effort polluting the prototype just to control some internals). Historically monkey-patching is also heavily relied upon by the ecosystem (even to this day, especially in the CJS loader and fs APIs) and if we cannot break them all due to compatibility concerns, then there are always loopholes possible via pollution, making the effort half-baked at best.

The last consensus we have on primordials is

Use primordials in the error path only if there are no significant perf regression.

They are only used as a UX enhancement to avoid surprising crashes from internals when users monkey-patch the prototype incorrectly, not a security measure.

IMO the most viable way to prevent being affected by pollution in security is just moving sensitive paths into native (for example, in the case of path.resolve() for permission model, we just added native Path::Resolve() and used that instead). We usually get performance hit and error stack noises in many paths that could've been written in native, and the most convincing motivation to do them in JS is to lower the barrier to contribution, but introducing special syntax that needs additional processing or even just invoking prototype methods differently hurts readability and developer experience enough that it can just tip the scale and make writing internals in JavaScript less sensible in the first place.

jdalton commented 5 months ago

@joyeecheung

However generally primordials aren't considered a security measure, because the threat model of Node.js

Yes, "Security" is a trap word. If you don't think it's security that's fine. I didn't call it security.

Historically monkey-patching is also heavily relied upon by the ecosystem (even to this day, especially in the CJS loader and fs APIs) and if we cannot break them all due to compatibility concerns, then there are always loopholes possible via pollution, making the effort half-baked at best.

I didn't suggest stopping all monkey-patching. I'm well aware. I actually think it's a strength of CJS. What I'm suggesting is that private symbols are a step better than the existing approach. It allows mixed use (of say .%push and .push) even in files related to module loading (so doesn't have to be separated by file or function).

IMO the most viable way to prevent being affected by pollution

Native is totally great for some things and for some things JS implemented methods are nice. JS engines have a balance of the two as well. In the cases where there is JS being used for the implementation JS engines reach for private symbols and I think in those cases Node should too.

GeoffreyBooth commented 5 months ago

Would it be possible to show how this would work in concrete terms? For example, a PR that takes one small file that currently uses primordials and refactors it to use this new style instead.

joyeecheung commented 5 months ago

In the cases where there is JS being used for the implementation JS engines reach for private symbols

V8 used to have self-hosted JS but removed them due to problems with transitions. The private symbols are generally used to access user-declared/modifiable things (e.g. error stack, class initializer function), not complete internals. For Node.js I think it makes more sense to implement things in a similar fashion. There are less sensitive, higher-level things that are better implemented in JS (e.g. HTTP) but IMO it's fine to leave them un-primordialized. Node.js core being primordialized only provides limited guarantee while the vast majority of npm packages don't do it, and we also vendor in libraries that don't do it as Node.js continues to grow (and there's benefit in leaving them out of Node.js core too because they can have separate releases that work on other runtimes or older versions of Node.js).

What I'm suggesting is that private symbols are a step better than the existing approach.

If there is someone volunteering to implement the source transform, working on distributing the source maps in our tarballs, and updating our existing tools like linters to support them, that might be worth a shot, but I doubt how many would be up for take on that work and maintain them. It just seems more cost-effective to follow what V8 does and move the sensitive code paths into native than adding a build step.

jdalton commented 5 months ago

@GeoffreyBooth

Would it be possible to show how this would work in concrete terms? For example, a PR that takes one small file that currently uses primordials and refactors it to use this new style instead.

Peruse some of the JS files in the following: https://github.com/v8/v8/tree/6.0.1/src/js https://github.com/WebKit/WebKit/tree/main/Source/JavaScriptCore/builtins

@joyeecheung

V8 used to have self-hosted JS but removed them due to problems with transitions.

WebKit has avoided this issue with special methods to allow putting direct properties on objects without transition.

If there is someone volunteering to implement the source transform, working on distributing the source maps in our tarballs, and updating our existing tools like linters to support them, that might be worth a shot, but I doubt how many would be up for take on that work and maintain them.

As alluded to in the original message for this issue the .js source containing private symbols could use a stand-in prefix that is then trivially replaced (with regexp). This allows existing tools to support them.

Since part of these initiatives are looking at what to ask from outside entities like the TC39. This could also be a time to think of what to ask for by way of V8 if a mixed Native + JS approach continues too.

joyeecheung commented 5 months ago

If it can be standardized in the language and we can get it for free it will definitely be a lot better than the existing non-standard primordials that we are using. If it's not free I doubt if there will be enough resources to get it implemented and maintained in Node.js, given that most contributors these days aren't familiar enough with the build tool chain or the build infrastructure to make this kind of changes, and the very few that still know this part of the project either probably don't have enough cycles for this or aren't active anymore. Moving more things into native would just be more cost-effective.

jdalton commented 5 months ago

@joyeecheung

If it's not free I doubt if there will be enough resources to get it implemented and maintained in Node.js, given that most contributors these days aren't familiar enough with the build tool chain or the build infrastructure to make this kind of changes, and the very few that still know this part of the project either probably don't have enough cycles for this or aren't active anymore.

I can see that being a tough constraint. A solution there would likely require a patch + a bit of tooling.

Moving more things into native would just be more cost-effective.

Okay! On that front (porting JS to native) I have found AI to be helpful (it messes up a lot but gets the skeleton of things close) so that things are almost line by line (side by side) similar.