paketo-buildpacks / maven

A Cloud Native Buildpack that builds Maven-based applications from source
Apache License 2.0

Support ability to use cyclonedx-maven-plugin #334

Open xyloman opened 2 years ago

xyloman commented 2 years ago

Currently, Syft is used to generate SBOMs. The fidelity of the resulting SBOM is very low: it does not contain provenance information for the included dependencies. With CycloneDX, this information is typically included when the CycloneDX Maven plugin runs as part of the build. Maven plugins usually have access to the entire dependency graph at build time, which means the SBOM can contain information that can only be discovered at build time.

Describe the Enhancement

Allow a Maven build to contribute a CycloneDX document generated at build time.

Possible Solution

The Java buildpack could discover a file such as ${project.artifactId}-${project.version}-cyclonedx.xml or ${project.artifactId}-${project.version}-cyclonedx.json and include it in the resulting cnb-sboms layer.
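A minimal sketch of the discovery step proposed above, in Python for illustration only. The helper name find_cyclonedx_boms and the idea of scanning a single build output directory (for example target/) are assumptions, not part of any buildpack API; the only thing taken from the proposal is the *-cyclonedx.json / *-cyclonedx.xml naming pattern.

```python
from pathlib import Path


def find_cyclonedx_boms(build_dir):
    """Return files named <artifactId>-<version>-cyclonedx.{json,xml} in build_dir.

    Hypothetical helper: the function name and the single-directory scan are
    assumptions made for this sketch.
    """
    root = Path(build_dir)
    found = []
    for ext in ("json", "xml"):
        # sorted() keeps the result deterministic within each extension
        found.extend(sorted(root.glob(f"*-cyclonedx.{ext}")))
    return found
```

A buildpack following this idea would then copy any match into its SBOM output instead of (or in addition to) the scanner result; how that choice is made is exactly what the rest of the thread debates.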

Motivation

CycloneDX documents generated during the Maven build process have access to more information about dependencies, which can be fed into the CycloneDX bill of materials.

dmikusa commented 2 years ago

There are plans to add consistent support for CycloneDX output, although it's a low priority at the moment. The implementation would still go through syft, though: we would just have it output CycloneDX format in addition to the present formats.

The trouble with using a Maven plugin is that it only works for Maven & we have to support other build tools as well. We also want to provide a consistent prerequisite-free user experience (i.e. we don't want to require users to install certain Maven plugins).

That said, I wouldn't be opposed to adding an opt-in way for tools to contribute to the SBOM generation process and/or a way for the buildpack to back off and not overwrite tool-generated files.

I would also suggest you reach out to the syft community. If there is missing metadata, perhaps it can be added through syft. We run the tool against your source code, so it has full access to your pom.xml and everything Maven does. It's just a matter of the tool digging that information out.

loewenstein commented 6 months ago

Currently wondering if the pom cataloger would already improve the sbom quality? Or if not, whether the potential to configure use-network: true would?

loewenstein commented 6 months ago

Independently, I like the idea of defaulting to a syft scan based sbom of the workspace but allowing the sbom to be provided externally.

xyloman commented 6 months ago

In the past, when this was reported, the pom cataloger did a poor job of understanding transitive dependencies managed by a parent pom or by a BOM added to dependency management. This resulted in the wrong version being reported or in missing dependencies. This might have changed since then. The CycloneDX Maven plugin has full access to the resolved dependency graph that Maven builds from. This gives it accurate license information, project source URLs and checksums for all dependencies. Syft does not have access to this rich data set.

loewenstein commented 6 months ago

Thanks @xyloman. I tend to agree with @dmikusa though, that injecting plugins into the build system could turn out to be messy, hard-to-understand magic. I'd prefer giving users the option to include the plugin and configure the buildpack to pick up the result instead of generating its own with a syft scan.

xyloman commented 6 months ago

cdxgen might help avoid that messy outcome while still ensuring an accurate SBOM. I have concerns about out-of-the-box accuracy issues that the user would be unaware of.

I disagree that this is messy: running the plugin directly via mvnw is a common pattern in other build systems such as GitHub Actions and Jenkins jobs. It helps devs avoid adding plugins to their projects that are really only a concern of Platform Engineering.

dmikusa commented 6 months ago

I moved this to the Maven repo, as this discussion seems Maven-specific. We might be able to implement something Maven-specific, but I'd really rather see things consistent across build tools.

  1. If I understand you right @xyloman you're saying that we run the plugin but don't add it to pom.xml, which I'd overlooked and makes a lot of sense. Thanks for clarifying that.

    My concerns with running the plugin:

    • How would the plugin be downloaded and installed? We'd need to sort this out as we like to support use cases where there is no network to download things.
    • Adding time to the build. I'd like to see how much time this takes. Right now, syft is very fast so we can always run it. If this were to add seconds onto the build, we might need to implement RFC #0044 and require it be opt-in so we keep the builds fast.
  2. As far as taking an externally generated SBOM, I'm not sure that we would want to do that. Part of the benefit of having buildpacks generate the SBOM is that it's done in a way that's harder to tamper with (it's done as part of the build and put into the image itself, so if you change it the image changes too). If we take in something external, we're blindly trusting that and then reporting it as correct, which I think could be misleading. I think if you want to go this route, then it's probably best to just create and manage your SBOM outside of buildpacks. You can certainly just ignore what buildpacks generate and scan your app files and the buildpack generated image directly.
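The tamper-evidence argument in point 2 rests on content addressing: image layers are identified by a digest of their contents, so editing an embedded SBOM changes the digest. A minimal sketch of that property, where the helper name content_digest and the sample byte strings are purely illustrative and this is not how any buildpack computes layer digests:

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a sha256 digest string for the given bytes (illustrative helper)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical SBOM contents: changing a single field yields a different digest.
original = b'{"bomFormat": "CycloneDX", "components": []}'
tampered = b'{"bomFormat": "CycloneDX", "components": [{"name": "x"}]}'
changed = content_digest(original) != content_digest(tampered)
```

This only shows that a change is detectable; it says nothing about whether the original content was trustworthy, which is the concern raised about externally supplied SBOMs.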

I still want to make this all pluggable, so if you don't like syft you can use another tool. We're doing some reworking of the buildpack libraries and SBOM changes will be part of that. I can't promise this is going to get us fully to pluggable SBOM generators, but it's a use case I'm keeping in mind.

loewenstein commented 6 months ago

2. As far as taking an externally generated SBOM, I'm not sure that we would want to do that. Part of the benefit of having buildpacks generate the SBOM is that it's done in a way that's harder to tamper with (it's done as part of the build and put into the image itself, so if you change it the image changes too). If we take in something external, we're blindly trusting that and then reporting it as correct, which I think could be misleading. I think if you want to go this route, then it's probably best to just create and manage your SBOM outside of buildpacks. You can certainly just ignore what buildpacks generate and scan your app files and the buildpack generated image directly.

I don't think that we have to blindly report it as correct @dmikusa. CycloneDX has some capabilities to provide SBoM provenance, I believe. I haven't really read up on it, but metadata.tools.[] comes to mind. The use case I have in mind is actually about SBoM data that is provided by the build system, which just happens to be a build system external to CNB. The way I picture this: for the helpers we install and the dependencies we download, we are the experts when it comes to creating good-quality SBoMs. For the application code base we might be as well, in terms of selecting a default. But I think we have to accept and acknowledge that the developer might be the better expert.
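As a rough illustration of the metadata.tools idea, the sketch below reads the tool names recorded in a CycloneDX JSON document, which is one way a consumer could tell which tool produced an SBOM. The function name sbom_tools is hypothetical. The sketch assumes the two shapes the field takes in CycloneDX JSON: a plain list of tools in older spec versions, and an object with a "components" list in 1.5 and later.

```python
import json


def sbom_tools(cyclonedx_json):
    """List tool names recorded under metadata.tools in a CycloneDX JSON string.

    Hypothetical helper for illustration; it only reads the "name" field.
    """
    doc = json.loads(cyclonedx_json)
    tools = doc.get("metadata", {}).get("tools", [])
    # Newer spec versions allow an object holding a "components" list;
    # older versions use a plain list of tool entries.
    if isinstance(tools, dict):
        tools = tools.get("components", [])
    return [tool.get("name", "") for tool in tools]
```

Note that this field is self-declared by whoever wrote the document, so it identifies the claimed generator rather than proving it.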

dmikusa commented 6 months ago

@loewenstein I believe you're correct that a developer who puts effort into generating their own SBOM can produce a more accurate SBOM than what we can do with automatic scanning. The automatic scanners will only get you so far; they all have their faults.

That said, I still don't think we should include what the developer or any external entity generates in the image as if buildpacks generated it. If you're getting it from pack sbom download, I think users will have an expectation that it is provided by the buildpack. That means a.) it is secure/safe and b.) if there are problems with its contents, they'll complain to us and we'll respond. If we take something external and return it, that violates both a.) and b.).

Maybe if there is a very, very clear way to show that some part of the SBOM is generated externally we could include it, but even then I still think it's better done some other way, external to what buildpacks are doing.

I suppose if you really wanted to have your SBOM included in with buildpack SBOM data then you could create your own "External SBOM" buildpack, take in the external file, and then include that as the SBOM for that buildpack.
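A rough sketch of what the core of such an "External SBOM" buildpack could do, written in Python for illustration. It assumes a Buildpack API version in which a buildpack can contribute a launch SBOM by writing launch.sbom.cdx.json into its layers directory; that file name is this sketch's reading of the CNB specification and should be checked against the spec version in use. The function name and arguments are hypothetical.

```python
import shutil
from pathlib import Path


def contribute_external_sbom(sbom_path, layers_dir):
    """Copy an externally generated CycloneDX JSON file into the layers directory.

    Hypothetical helper. The destination name launch.sbom.cdx.json is an
    assumption based on the CNB SBOM conventions, not taken from this thread.
    """
    dest = Path(layers_dir) / "launch.sbom.cdx.json"
    shutil.copyfile(sbom_path, dest)
    return dest
```

Because the SBOM would be attributed to this separate buildpack, its origin stays distinguishable from the SBOMs that the other buildpacks generate.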

xyloman commented 6 months ago

The most accurate way to generate an SBOM is to use the build system to do so. The cyclonedx-maven-plugin provides the most robust and accurate CycloneDX SBOM for a Maven project. There are similar capabilities in the Gradle ecosystem. The issue here is a desire to balance an ecosystem concern with an accuracy concern. My proposal has been to tackle the accuracy concern, because the ecosystem has provided a clear answer: use the build system (Maven/Gradle) to generate the SBOM if accuracy is your biggest concern. In future versions of Spring Boot, the maintainers have decided that if a plugin such as cyclonedx-maven-plugin is included in the pom.xml, then the SBOM generated by it will be picked up and exposed via the SBOM Actuator endpoint. If buildpacks leverage a different mechanism (e.g. syft), then the app will present a different SBOM than the one returned by buildpacks. I am advocating that buildpacks should allow the build system to contribute an SBOM. If the SBOM is placed in a well-known location, leverage it over syft.
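The "well-known location first, scanner otherwise" rule proposed here can be sketched as follows. Everything in it is hypothetical: the function name select_sbom, the path argument, and the run_scanner callback that stands in for whatever default tool (such as syft) the buildpack would otherwise run.

```python
from pathlib import Path


def select_sbom(well_known_path, run_scanner):
    """Return (source, content), preferring a build-tool SBOM over the scanner.

    Hypothetical helper: run_scanner is any zero-argument callable that
    returns SBOM text and is only invoked when no build-tool SBOM exists.
    """
    path = Path(well_known_path)
    if path.is_file():
        return "build-tool", path.read_text()
    return "scanner", run_scanner()
```

Returning the source alongside the content keeps it possible to label where the SBOM came from, which relates to the concern about presenting external data as buildpack-generated.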

dmikusa commented 6 months ago

@xyloman So I'd like this to eventually be pluggable so people can choose their SBOM provider. There's nothing particular about Syft, aside from that's where we started.

I've heard a little about what Spring Boot is doing with SBOM and it looks cool. We'll have to discuss how that specifically is integrated, or whether we integrate it at all. Since it's available through Actuator, you don't really need it available in the buildpack's layers too. I do agree that having it alongside the buildpack's SBOM generated by Syft (or another tool), which could be different, would be confusing. So we'll have to sort that out; maybe we just turn off SBOM generation if Spring's going to do it.

My question for you would be, does it matter how you get the SBOM information? If it's in the image and you can get it from Actuator or if it's in the image and you run pack sbom download? Or do you prefer having it outside the image?

I'm also curious how you verify your SBOM to make sure it hasn't been altered. Do you sign them with some tool? If so, what tool?

Thanks!

loewenstein commented 6 months ago

@dmikusa Yeah, I think the ability to suppress an SBoM for user provided code is the minimum that we have to provide - maybe you are right and there's no need to inject it into the image though.