interlynk-io / sbomasm

SBOM Assembler: a tool to edit SBOMs or assemble multiple SBOMs into a single SBOM.
Apache License 2.0

Duplicate packages after merge #101

Open vargenau opened 3 months ago

vargenau commented 3 months ago

Attachments: example6-src.spdx.txt, example6-lib.spdx.txt, merge.spdx.json

sbomasm assemble -n merge -v 1 -t "application" -o merge.spdx.json example6-*.spdx 

Both example6-src.spdx and example6-lib.spdx contain identical packages, go.reflect and go.strconv.

In the merged SBOM, these packages appear twice.

I would expect no duplicates.

In real-world SBOMs, I get many duplicates.

viveksahu26 commented 3 months ago

Thanks for raising this issue. Will get back to you :+1:

riteshnoronha commented 3 months ago

@vargenau Yes, as mentioned in our README, we do not remove duplicates. If that is a requirement, we will need to add a mode to each merge algorithm that removes duplicate components.

A potential algorithm to identify duplicates would be:

  1. PURL match
  2. CPE match
  3. Name-Version match
  4. Checksum match

We would execute these checks in sequence; whichever one matches indicates that the component is a duplicate, and it is eliminated.

Thanks for the feature request; we will work on this.

vargenau commented 1 week ago

Any progress on implementing this?

riteshnoronha commented 1 week ago

@vargenau We implemented this for CycloneDX and will port the logic to SPDX by the next release.

vargenau commented 1 week ago

Very good, thank you!