safeguarding code changes done by wasm binary

ChALkeR commented 3 months ago

Perhaps adding an additional safeguard on the transform output could increase trust in the shipped base64 wasm file

PoC on top of the package public API (but embedding that inside the package might avoid double conversion)

const transformedBuf = Buffer.from(transformed)
if (sourceBuf.length !== transformed.length) throw new Error('length mismatch')
for (let i = 0; i < transformedBuf.length; i++) {
  // should match either the source buffer or spaces or semicolon: https://github.com/swc-project/swc/issues/9331
  const val = transformedBuf[i]
  if (val !== sourceBuf[i] && val !== 0x20 && val !== 0x3b) throw new Error('result mismatch')
}

That seems to work on simple examples, I wonder how something like this (minus the extra buffer<->string conversions) would affect perf

marco-ippolito commented 3 months ago

I mean we compile rust and build the wasm manually, so if there is anything suspicious we see it in the rust code when we update swc https://github.com/nodejs/amaro/tree/main/deps/swc

This is the wasm build: https://github.com/nodejs/amaro/blob/main/tools/build-wasm.js

ChALkeR commented 3 months ago

trusting/verifying the build process and verifying the result on the fly are different levels of guarantees, which are not mutually exclusive and should be complementing each other, ideally

so the main question is if there is a measurable perf hit on this / what would be the costs of such a safeguard

marco-ippolito commented 3 months ago

I wonder what is safeguarding against specifically? Can you describe an use case?

ChALkeR commented 3 months ago

@marco-ippolito e.g. a supply chain attack on upstream swc embedding malicious js into generated output which gets past source audit when updating the binary

this check would make it significantly harder to do anything bad with that

marco-ippolito commented 3 months ago

@marco-ippolito e.g. a supply chain attack on upstream swc embedding malicious js into generated output which gets past source audit when updating the binary

this check would make it significantly harder to do anything bad with that

If a supply chain attack happens in swc, when we update swc, we have a pr with the changed rust source code. We dont use swc javascript. Then someone has to manually compile the wasm and open a pr to update the generated js in amaro. The inline wasm is generated by us from the rust source code, not from upstream. If a malicious attacked opened a pr with malicious wasm, the safeguard would not be useful

ChALkeR commented 3 months ago

cc @nodejs/security-wg (also perhaps @nodejs/security-tsc)

ChALkeR commented 3 months ago

If a supply chain attack happens in swc, when we update swc, we have a pr with the changed rust source code.

Yes, and I'm not entirely sure we can give a 100% guarantee that each line in each change in rust code will be properly audited each time we update swc (we can't give a 100% guarantee to anything actually)

(Was the current impl audited btw?)

We dont use swc javascript

The issue is with rust swc possibly injecting malicious js which node then executes

Then someone has to manually compile the wasm and open a pr to update the generated js in amaro

Not a safeguard which is strong enough

The inline wasm is generated by us from the rust source code, not from upstream

See issue above (about changes audit process)

If a malicious attacked opened a pr with malicious wasm, the safeguard would not be useful

Why? This is a very clean/minimal check which could be additionally covered with codeowners Which interfaces are available to the wasm binary?

What are the actual downsides? (if perf won't be significantly affected?)

ChALkeR commented 3 months ago

it looks like there are more whitespace chars possible to mask multi-byte symbols, but still a very limited set

marco-ippolito commented 3 months ago

to be honest we have several other dependencies that we use as wasm:

We build the wasm from the source code in the organization, so we are well aware of the source code. Just like we compile dependency code, in the deps folder, code changes are carefully reviewed. SWC is trusted and also powers other runtimes and projects such as Deno, rspack etc... I understand your concerns but I believe this is an issue that should be discussed with the @nodejs/security-wg and it would be quite unreasonable to block v0.0.5 based on this concern, expecially given that this is an early stage feature that requires bootstrap work.

aduh95 commented 3 months ago

IIUC, sourceBuf would be the user-code (e.g. the content of index.ts) and transformed is the output of SWC stripping types out of sourceBuf, correct? I like this idea, as long as --experimental-stip-types does not apply to node_modules files, we can assume it's only used for developing, in which case it's OK to have it a bit slower if that improves the confidence.

aduh95 commented 3 months ago

it would be quite unreasonable to block v0.0.5 based on this concern

I definitely agree, it should block the unflagging of the feature (or worst case scenario the landing of the experimental flag on a release line), but there's no reason to block updating from 0.0.4 to 0.0.5 as the concern applies to both versions.

ChALkeR commented 3 months ago

@aduh95 I was not suggesting to block 0.0.4 -> 0.0.5 update based on this, that one is blocked on build process not being reproducable or transparent

This is a separate issue

ChALkeR commented 3 months ago

Filed a PR at https://github.com/nodejs/node/pull/54141

bcheidemann commented 3 months ago

to be honest we have several other dependencies that we use as wasm:

undici https://github.com/nodejs/node/blob/main/deps/undici/src/lib/llhttp/llhttp-wasm.js

cjs-module-lexer https://github.com/nodejs/node/blob/main/deps/cjs-module-lexer/dist/lexer.js

As an upstream user of Node, it's concerning to hear that this is common/accepted practice. Particularly in light of the xz attack.

We build the wasm from the source code in the organization, so we are well aware of the source code.

This is not something I can verify as an end user.

Just like we compile dependency code, in the deps folder, code changes are carefully reviewed.

Also not something I can verify. This can change over time as well.

SWC is trusted and also powers other runtimes and projects such as Deno, rspack etc...

I think the point is that it's possible for a modified (malicious) version of the SWC WASM to be merged to main, even if SWC is clean. The Binary is updated and committed manually, and there's no automated verification step before publishing.

FWIW, the way Deno is shipping SWC is very different to this. It is, I believe, much less prone to supply chain attacks.

marco-ippolito commented 3 months ago

@bcheidemann I agree there is a lot of work to do to improve security against supply chain attacks, in this case we build the wasm from sources and with cargo.lock, but there is obviously a lot more to do, like to start an automated build step. Any help is very welcome

bcheidemann commented 3 months ago

@bcheidemann I agree there is a lot of work to do to improve security against supply chain attacks, in this case we build the wasm from sources and with cargo.lock, but there is obviously a lot more to do, like to start an automated build step. Any help is very welcome

That's great to hear. I have a long flight next Thursday and will need something something to fill my time 😅 Perhaps, with your blessing, I could take a look at automating the build step?

mhdawson commented 3 months ago

@bcheidemann to add some additonal context, for most dependencies there is an update script in https://github.com/nodejs/node/tree/main/tools/dep_updaters which is used to generate PRs to update dependencies.

So automating the build step along with automating the PR update generation are two related tasks that help would be welcome with. From my perspective for the dependencies with WASM the end goal should be that we build from source in the deps/amaro directory in the node.js project.

bcheidemann commented 3 months ago

@marco-ippolito @mhdawson I have submitted a draft PR (#48) which I believe would address some of the concerns discussed in this issue. It's not fully tested because my plane wifi is bad and not playing ball with Docker. However, I've included a detailed writeup of the changes I've made and why, so any feedback on this would be greatly appreciated 🙂

From my perspective for the dependencies with WASM the end goal should be that we build from source in the deps/amaro directory in the node.js project.

@mhdawson I had a look at the current dep updaters (thanks for the link!). I think I agree that building from source, as you suggest, would be preferable. However, at this point I can think of a few different ways that we could achieve this goal, and I'm not sure which would be best. Anyway, I'd be happy to help how I can with this, and I'm sure discussion on #48 will help illuminate the correct path.

nodejs / amaro

safeguarding code changes done by wasm binary #17