marco-ippolito opened 1 year ago
IIRC, besides the SLSA + SigStore work, I think @BethGriggs has also looked into SBOMs, right? Would you mind sharing your point of view?
I had some initial thoughts, but didn't get too far. Some of them:

- `npm` and actions dependencies.
- Inspect `node_modules` and generate the SBOM from that. Our mixture of runtimes/languages used complicates things.
- Users could verify that the `node` they're using depends directly on dependency x from this source, and can feed it into tools that monitor their SBOMs, etc.
- The `deps` directory in our sources may not match what is actually built.
- Report versions via `process.versions`. It could be a reasonable interim step. It felt a bit odd (even risky?) to rely on executing the software to determine what's being used in it. I feel doing it at the build stage would allow us to gather more detail (which source was used) and verification rather than just reporting versions.

I see that CycloneDX is quite popular, should we give it a try? What kind of tool should we use?
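The `process.versions` interim step mentioned above is cheap to sketch. A minimal example, which also illustrates its stated limitation (it requires executing the binary and only yields versions, not provenance):

```javascript
// Sketch of the process.versions interim step: a built node binary already
// reports the versions of its bundled dependencies (v8, openssl, zlib, ...),
// though not which sources or build options produced them.
for (const [name, version] of Object.entries(process.versions)) {
  console.log(`${name}: ${version}`);
}
```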
+1 for CycloneDX
So I gave it a try on my machine and unfortunately my MacBook went OOM and crashed. Since Node is a fairly large project, it's an expensive operation that falls into the case described by the documentation: https://github.com/CycloneDX/cdxgen/blob/master/ADVANCED.md#use-atom-in-java-mode
I was wondering if it was possible to have access to a machine with 32/64 GB of RAM to run it.
I think the only machines we have that have that much memory might be:
- `test-nearform_intel-ubuntu2204-x64-1`
- `test-nearform_intel-ubuntu2204-x64-2`
I'd suggest you open an issue in the build repo to request access to one of those.
The ideal goal is to ship an SBOM for every executable we release, since every platform might have slightly different settings, tools, and dependencies (? I'm not sure this is true). I guess it should eventually be included at build time as a release step, @RafaelGSS.
It is also possible to generate the SBOM manually, starting from a CSV file, which might be easier and less expensive in terms of computing but hard to maintain; I'm not a big fan of this idea.
Also, we should define an end goal for the project in terms of SBOM quality: https://scvs.owasp.org/scvs/v2-software-bill-of-materials/ I assume we start from the basics.
My idea is to start quickly with https://github.com/CycloneDX/cdxgen, which is a general-purpose tool, and then refine and improve quality with further developments and more specific tools.
> So I gave it a try on my machine and unfortunately my MacBook went OOM and crashed. Since Node is a fairly large project, it's an expensive operation that falls into the case described by the documentation: https://github.com/CycloneDX/cdxgen/blob/master/ADVANCED.md#use-atom-in-java-mode
>
> I was wondering if it was possible to have access to a machine with 32/64 GB of RAM to run it.

> The ideal goal is to ship an SBOM for every executable we release, since every platform might have slightly different settings, tools, dependencies (? I'm not sure this is true). I guess it should eventually be included at build time as a release step,
Dependencies are the same for the platforms we currently release. Tooling (compilers, Python, etc.) does differ.
If CycloneDX requires that amount of RAM to run for Node.js, it's not going to be realistic to run on every platform we release on. Most of the release machines have 4 GB RAM (some have 2 GB + swap, and a small number have 8 GB).
@marco-ippolito repasting my post from the CycloneDX chat:
cdxgen is a good start! For a large codebase like Node.js, here are my extra 2 cents:

1. IMHO your problem is not so much npms or pypi, which are easy to inventory because they have package manifests, but the rest of the C/C++ code and its deps, vendored or not, that have no manifest, like zlib, cares, and similar, and their nested and bundled deps all the way down (like in V8).
2. You may document their origin and licenses in the codebase. I use small YAML files for this; you could use a small CycloneDX SBOM to the same effect. Conceptually something like this https://github.com/nodejs/node/blob/main/deps/zlib/README.chromium#L1 but improved to have proper Package URLs/purls. This will get you an explicit list that you can then have scanners collect in addition to the simpler npm or Python packages.
3. Or you might want to match against a reference index of C/C++ packages for these too, in which case you need a code matching tool and a reference DB. Or do a combo of 2. and 3., which is best IMHO. Then eventually you will need to craft and run a custom pipeline to assemble data from a few different tools and origins to get something that is tailored to Node.js.
4. You may want to consider also analyzing the deployed (debug) binaries rather than the source code, to craft an SBOM that is based on the subset of the sources effectively used. This is effectively what users and security teams will care about, not the (many) other development-only packages that are not deployed.
5. You really want to get proper Package URLs/purls in your CycloneDX output for this to be useful for downstream users when querying for vulnerabilities in modern databases. If you have a few CPEs, that will not hurt either!
6. This is a process. Do not expect any open source or commercial tool to get you the correct results out of the box. This will require tuning and a custom pipeline to automate all this. And the output of running this pipeline will require regular review for accuracy.
I have some experience in the domain and I may be able to help modestly.
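As a hedged sketch of the document-origins-and-purls idea above, a per-dependency CycloneDX component could carry a Package URL and an SPDX license id. The version and purl below are illustrative examples, not what `deps/zlib` actually vendors:

```javascript
// Illustrative CycloneDX component for a vendored C dependency such as zlib.
// Field names follow the CycloneDX JSON schema; the version and purl are
// examples only, not Node.js's actual vendored version.
const component = {
  type: "library",
  name: "zlib",
  version: "1.3.1",
  purl: "pkg:github/madler/zlib@1.3.1", // Package URL that vulnerability DBs can key on
  licenses: [{ license: { id: "Zlib" } }], // SPDX license id
};
console.log(JSON.stringify(component, null, 2));
```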
@BethGriggs re: https://github.com/nodejs/security-wg/issues/1115#issuecomment-1729635778
I came to the conclusion the SBOM should really be generated at build time. This is because some of our dependencies can be externalised or swapped out during the build step (for example, building against a system OpenSSL). What is in the deps directory in our sources may not match what is actually built.
:100: ... if you can instrument your build to collect the subset of third-party code that you effectively include (and possibly external deps that may be expected at runtime), then this is IMHO the best possible case and something that I would always recommend.
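The externalised-dependency point can actually be observed from a built binary: `process.config` mirrors the configure variables of the build, so a sketch like the following (assuming an official build layout) distinguishes bundled from system OpenSSL:

```javascript
// A deps/ SBOM can mislead when a dependency is swapped at build time:
// a binary configured with --shared-openssl links the system OpenSSL
// instead of deps/openssl. process.config records this build's configure
// variables, so the binary itself can tell us which case applies.
const shared = process.config.variables.node_shared_openssl;
console.log(shared
  ? `system OpenSSL ${process.versions.openssl}`
  : `bundled OpenSSL ${process.versions.openssl} (deps/openssl)`);
```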
@pombredanne so my idea to get started is:

1. run cdxgen for each package in the `/deps` folder for npm packages
2. run cdxgen for tools and GitHub Actions
3. document their origin and licenses for V8 and OpenSSL and other C++ dependencies

Would you suggest some tools for your points 3 and 4? or some reference?
I will work on improving the performance of cdxgen/atom for the C/C++ codebase. It has to be done regardless of whether Node.js becomes a user or not. My initial focus would be on reducing the time to less than an hour for V8. Reducing the memory footprint to make it run in a CI agent for such large codebases is impossible, so it is not going to be my priority this year.
@marco-ippolito I like your proposal to generate individual SBOMs per folder in deps. CycloneDX supports linking SBOMs using BOM-Link under external references.
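A hedged sketch of what that linking could look like: a top-level SBOM references a per-folder SBOM through an external reference of type `bom` whose URL is a BOM-Link (a `urn:cdx:` URN built from the target BOM's serial number and version). All serial numbers below are placeholders:

```javascript
// Sketch: a top-level Node.js SBOM pointing at a per-folder SBOM (e.g. for
// deps/zlib) via externalReferences with a BOM-Link URL.
// Serial numbers are placeholders, not real BOM identifiers.
const topLevelBom = {
  bomFormat: "CycloneDX",
  specVersion: "1.5",
  serialNumber: "urn:uuid:11111111-2222-3333-4444-555555555555",
  version: 1,
  externalReferences: [
    {
      type: "bom",
      url: "urn:cdx:aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/1", // BOM-Link: serialNumber/version
      comment: "SBOM for deps/zlib",
    },
  ],
};
console.log(JSON.stringify(topLevelBom, null, 2));
```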
I have created this ticket to automate this process a bit. Once all the performance tickets are done, I am happy to share an example workflow with the right arguments needed to generate these.
@marco-ippolito you wrote:

> so my idea to get started is:
>
> 1. run cdxgen for each package in the `/deps` folder for npm packages
> 2. run cdxgen for tools and GitHub Actions
> 3. document their origin and licenses for V8 and OpenSSL and other C++ dependencies
>
> Would you suggest some tools for your points 3 and 4? or some reference?
I suggest you get something started first with your plan.
For 3 and 4 I have some bits that are works in progress at https://github.com/nexb/elf-inspector and https://github.com/nexB/purldb/ ... scancode-toolkit also has some code to collect metadata from the README.chromium files used to document the metadata.
BTW, are there debug builds with debug symbols available? (with DWARFs for Linux and macOS and a PDB for Windows)
cdxgen 9.9.2 was released with the required improvements. Will share an example workflow that does both 1 and 2 (aiming for a single invocation). For 3, cdxgen currently supports the vcpkg.json format to share additional metadata. You can create this file within the various folders, and the information will be used in the generated SBOM. Will share some examples of this as well.
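For reference, a vcpkg.json manifest placed in a deps folder might look like this. The values are illustrative, and the field names are standard vcpkg manifest fields; exactly which of them cdxgen picks up should be verified against its documentation:

```json
{
  "name": "zlib",
  "version": "1.3.1",
  "description": "Massively spiffy yet delicately unobtrusive compression library",
  "homepage": "https://zlib.net",
  "license": "Zlib"
}
```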
I'm wondering which installation method we should use on our machines (link to guide).
npm install with Java 21 must work. For CI, we can have a workflow that sets up the prereqs.
I was reading about the possibility of using SBOMs in Docker images, and it seems that it is possible using `docker sbom` or `docker buildx build --sbom=true -t <myorg>/<myimage> --push .`
This might be a good option for the Docker Official images. What do you think?
IMO, CycloneDX is the way to go (as it becomes an Ecma and hopefully an ISO standard, with v1.6 due in Feb.) and will eventually need its specified ability to declare (quantum) crypto information and actual attestations as consumers become able to produce them.
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
This issue has been inactive for 90 days. It will be closed in 14 days unless there is further activity or the stale label is taken off.
I'm still interested in this.
It seems the bot isn't recognising the `never-stale` label.
This issue has been inactive for 90 days. It will be closed in 14 days unless there is further activity or the stale label is taken off.
I think it would be great to have an SBOM for the project now that we are working on the dependency build audit. We should probably investigate how we can achieve this, since we have different types of dependencies, and decide which format to use.