npm / cli

the package manager for JavaScript
https://docs.npmjs.com/cli/

[BUG] SBOM generation for CycloneDX generates duplicate dependencies #6967

Open jamietanna opened 10 months ago

jamietanna commented 10 months ago

Is there an existing issue for this?

This issue exists in the latest npm version

Current Behavior

The generated CycloneDX SBOM may fail to parse in downstream tools because it contains duplicate dependencies.

Expected Behavior

A CycloneDX v1.5 SBOM generated from a repository can be parsed correctly.

Steps To Reproduce

  1. Clone https://gitlab.com/tanna.dev/renovate-graph
  2. Run npm sbom --sbom-format cyclonedx > cyclonedx.json
  3. Run the output through a CycloneDX validator, e.g. go run github.com/CycloneDX/sbom-utility@latest validate --input-file cyclonedx.json

renovate-graph.cyclonedx.json

Environment

//registry.npmjs.org/:_authToken = (protected)

; node bin location = /usr/bin/node
; node version = v18.17.1
; npm local prefix = /home/jamie/workspaces/renovate-graph
; npm version = 10.2.3
; cwd = /home/jamie/workspaces/renovate-graph
; HOME = /home/jamie
; Run npm config ls -l to show all defaults.

jkowalleck commented 10 months ago

Did you experience the same issue when generating the SBOM via the official tooling, https://github.com/CycloneDX/cyclonedx-node-npm ?

jkowalleck commented 10 months ago

@bdehamer see my earlier remarks on why complete deduplication is intrinsically impossible in node_modules: https://github.com/npm/rfcs/pull/714#issuecomment-1672927160

bdehamer commented 9 months ago

@jamietanna I'm digging into this issue and considering a couple different solutions. I'd be curious to hear which of these best meets the need of your SBOM use cases . . .

The Issue

In certain circumstances, it is not possible for npm to completely deduplicate packages in the node_modules tree. A basic example would be something like this:

demo-package@0.0.1
├─┬ foo@0.0.1
│ └── tslib@1.14.1
├─┬ bar@0.0.1
│ └── tslib@1.14.1
└── tslib@2.6.2

My demo-package project has dependencies on foo, bar and tslib (version 2.6.2). Since foo and bar each depend on an older version of tslib (version 1.14.1) that conflicts with the version needed by the root project, tslib@1.14.1 cannot be hoisted to the top of node_modules and ends up being duplicated under both foo and bar.

Since version 1.14.1 of tslib literally appears on disk at two different locations in the tree, the somewhat naive SBOM generator ends up adding two identical entries to the CycloneDX components list.

This is why the resulting SBOM fails validation -- we end up with multiple entries which have identical bom-ref values.
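
A quick way to confirm this failure mode is to count the bom-ref values in the components list. The following is a minimal Python sketch, not part of npm or sbom-utility. The function name find_duplicate_bom_refs is hypothetical, the sample data is hand-written to mirror the tree above rather than real npm output, and the sketch assumes a flat top-level components array.

```python
from collections import Counter


def find_duplicate_bom_refs(sbom: dict) -> dict:
    """Return {bom-ref: count} for every bom-ref that appears more than once."""
    refs = [c["bom-ref"] for c in sbom.get("components", []) if "bom-ref" in c]
    return {ref: count for ref, count in Counter(refs).items() if count > 1}


# Hand-written sample mirroring the demo-package tree (assumption, not npm output).
sample_sbom = {
    "components": [
        {"bom-ref": "foo@0.0.1", "name": "foo", "version": "0.0.1"},
        {"bom-ref": "tslib@1.14.1", "name": "tslib", "version": "1.14.1"},
        {"bom-ref": "bar@0.0.1", "name": "bar", "version": "0.0.1"},
        {"bom-ref": "tslib@1.14.1", "name": "tslib", "version": "1.14.1"},
        {"bom-ref": "tslib@2.6.2", "name": "tslib", "version": "2.6.2"},
    ]
}

duplicates = find_duplicate_bom_refs(sample_sbom)
print(duplicates)  # {'tslib@1.14.1': 2}
```

To check a generated file instead, load it with json.load and pass the resulting dict to the same function.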

Solution 1

One way to address this would be to treat each package that appears in the tree as a distinct dependency -- even if it is technically identical to some other dependency already present in the tree.

Given the example above, this solution would result in tslib@1.14.1 being listed twice in the SBOM, but each entry would carry a distinct bom-ref value. We might choose to prefix the bom-ref with the parent package name, resulting in entries that look something like:

[
  {
    "bom-ref": "foo@0.0.1-tslib@1.14.1",
    "type": "library",
    "name": "tslib",
    "version": "1.14.1"
  },
  {
    "bom-ref": "bar@0.0.1-tslib@1.14.1",
    "type": "library",
    "name": "tslib",
    "version": "1.14.1"
  }
]

I believe that this is similar to how cyclonedx-node-npm solves this problem.

Solution 2

The other approach would be to deduplicate the packages before adding them to the SBOM. Instead of literally mirroring the layout of packages in the node_modules directory, this solution would detect the multiple instances of tslib@1.14.1 and fold them into a single entry in the SBOM components list:

[
  {
    "bom-ref": "tslib@1.14.1",
    "type": "library",
    "name": "tslib",
    "version": "1.14.1"
  }
]

In this case, we're not trying to represent the layout of the node_modules directory, but instead just enumerating the distinct dependencies that comprise the project. This is how both cdxgen and the snyk SBOM command handle the issue of duplicate packages.


I think there are cases to be made for either of these solutions, but I'd like to know which of these best matches the output you'd expect to see in a valid SBOM?

savek-cc commented 4 months ago

It's been a while, but I'd strongly vote for option 2. If a technically identical dependency is included multiple times, it should only appear ONCE as a component in the SBOM. It can still be referenced multiple times in the dependencies section of the SBOM, though. Compare with what Maven does.
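
The idea in this comment, one component entry that is referenced several times, can be sketched in Python as below. This is a sketch under assumptions: the function name dependency_graph and the tree shape are hypothetical, and the output uses the ref / dependsOn shape of the CycloneDX dependencies array with name@version as the ref.

```python
def dependency_graph(root: dict) -> list:
    """Build CycloneDX-style dependency entries, one per distinct name@version."""
    graph: dict = {}

    def visit(node: dict) -> None:
        ref = f'{node["name"]}@{node["version"]}'
        children = graph.setdefault(ref, [])
        for dep in node.get("dependencies", []):
            dep_ref = f'{dep["name"]}@{dep["version"]}'
            if dep_ref not in children:
                children.append(dep_ref)
            visit(dep)

    visit(root)
    return [{"ref": ref, "dependsOn": deps} for ref, deps in graph.items()]


# Hypothetical in-memory form of the demo-package tree from earlier in the thread.
graph_tree = {
    "name": "demo-package",
    "version": "0.0.1",
    "dependencies": [
        {"name": "foo", "version": "0.0.1",
         "dependencies": [{"name": "tslib", "version": "1.14.1"}]},
        {"name": "bar", "version": "0.0.1",
         "dependencies": [{"name": "tslib", "version": "1.14.1"}]},
        {"name": "tslib", "version": "2.6.2"},
    ],
}

dependency_entries = dependency_graph(graph_tree)
for entry in dependency_entries:
    print(entry["ref"], "->", entry["dependsOn"])
```

Here tslib@1.14.1 has a single entry of its own but shows up in the dependsOn lists of both foo@0.0.1 and bar@0.0.1.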