Modify Spec to Require Artifact Digest in PURL

tpletcher-hpe commented 1 year ago

Preface: Over the past few years, large strides have been made in moving OSS software artifact provenance tooling forward. OpenVEX is a continuation of that effort, and we need to make sure that efforts like OpenVEX stay true to the core objective: deterministic artifact identification, i.e. cryptographic identity. You only need to glance through the NVD to see that string-based (name) association of vulnerability to artifact is a non-starter. CISA's intent with VEX is really to provide an accurate clearing house, for software consumers, of ongoing artifact maintenance. So in the short term this proposal is dirt simple:

Fast path: simply require that the PURL contains a proper digest from the source registry as outlined in the PURL spec here: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
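To make the proposal concrete, here is a minimal Python sketch of such a check. It is not taken from any existing OpenVEX tooling: it hand-parses the purl with only the standard library, treats a digest as either the version (as the oci type does, e.g. sha256:...) or an entry in a qualifier assumed here to be named checksum, and accepts only sha256/sha384/sha512. Those choices and the names purl_digests and require_digest are assumptions for illustration.

```python
import re
from urllib.parse import parse_qs, unquote

# Assumption for this sketch: only these algorithms count as a cryptographic digest.
_DIGEST_RE = re.compile(r"(sha256|sha384|sha512):[0-9a-f]+")


def purl_digests(purl: str) -> list[str]:
    """Return every 'algorithm:hex' digest carried by a purl string."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    body, _, _subpath = purl[len("pkg:"):].partition("#")  # drop any subpath
    body, _, query = body.partition("?")                   # split off qualifiers
    candidates = []
    # The version may itself be a digest, as with the oci type (pkg:oci/name@sha256%3A...).
    if "@" in body:
        candidates.append(unquote(body.rsplit("@", 1)[1]))
    # Assumed qualifier: a comma-separated list of digests under the 'checksum' key.
    for value in parse_qs(query).get("checksum", []):
        candidates.extend(value.split(","))
    return [c.lower() for c in candidates if _DIGEST_RE.fullmatch(c.lower())]


def require_digest(purl: str) -> None:
    """Reject a purl that carries no digest, as the proposal would require."""
    if not purl_digests(purl):
        raise ValueError(f"purl has no artifact digest: {purl}")


require_digest("pkg:oci/debian@sha256%3A244fd47e07d10?repository_url=docker.io/library/debian")
```

Under this check, pkg:apk/wolfi/git@2.39.0-r1?arch=armv7 would be rejected, since neither its version nor its qualifiers carry a digest.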

It's acknowledged that the formats of certain package management implementations may have to be adapted to accommodate this requirement, but it should be a requirement, not an option.

garethr commented 1 year ago

I was going to open a similar issue for discussion.

Currently the spec reads:

The use of Package URLs (purls) is recommended

This ultimately means products could be anything. I think what happens is a bifurcation in tools/usage between those that use/assume purls (and can further process the information from the list of packages) and those that don't (where it's just a string that could be displayed to a human). That will lead to errors in higher-level tools which basically say:

Well, yes, that is a valid OpenVEX document, but I'm not going to use it because it doesn't use purl.

That leaks what should be an implementation detail out to users. While this arguably broadens the use cases, I'm not sure it's the best path to adoption, since it's hard to step back from later.

Personally, I think I'd argue to:

Require purl when describing packages
Collect additional use cases, and either extend purl or create similar url-based specs for identification

garethr commented 1 year ago

Another angle: we could be explicit that something is a purl:

"products": [
   {"purl": "pkg:apk/wolfi/git@2.39.0-r1?arch=armv7"}
]

This would spare parsers the trouble of parsing strings that no one thinks are purls to begin with, just to check whether they might be.
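A rough illustration of the difference for a consumer, as a hypothetical Python sketch (the function names are made up): with bare strings the parser has to guess the type from a prefix, and a prefix match does not even confirm the string is a well-formed purl; with an explicit key the type is simply read off the JSON.

```python
import json


def identifiers_implicit(products: list) -> dict:
    """Bare strings: the identifier type has to be guessed from a prefix."""
    found = {}
    for value in products:
        if value.startswith("pkg:"):
            kind = "purl"
        elif value.startswith("cpe:"):
            kind = "cpe"
        else:
            kind = "unknown"  # free text, a typo, or some other scheme entirely
        found.setdefault(kind, []).append(value)
    return found


def identifiers_explicit(products: list) -> dict:
    """Keyed objects: the JSON key states the identifier type, so no guessing."""
    found = {}
    for entry in products:
        for kind, value in entry.items():
            found.setdefault(kind, []).append(value)
    return found


doc = json.loads('{"products": [{"purl": "pkg:apk/wolfi/git@2.39.0-r1?arch=armv7"}]}')
print(identifiers_explicit(doc["products"]))
```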

It would also allow for other (agreed) identifiers:

"products": [
   {"purl": "pkg:apk/wolfi/git@2.39.0-r1?arch=armv7"},
   {"cpe": "cpe:2.3:a:1password:1password:1.0.0.36:*:*:*:*:windows:*:*"}
]

This would also support multiple explicit identifiers for the same thing.
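One way a consumer might use several explicit identifiers for the same product is sketched below. It is hypothetical: KNOWN_KEYS stands in for whatever set of identifier types gets agreed, unknown keys are rejected rather than silently ignored, and a subject matches when it shares any one identifier with the product. The CPE string is illustrative, and comparing identifiers by plain string equality is a simplification; real tooling would normalize purls first.

```python
# Hypothetical set of agreed identifier types; a real spec would define this list.
KNOWN_KEYS = {"purl", "cpe"}


def parse_product(entry: dict) -> dict:
    """Validate one product entry that may carry several explicit identifiers."""
    unknown = set(entry) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unrecognised identifier type(s): {sorted(unknown)}")
    return dict(entry)


def product_matches(product: dict, subject: dict) -> bool:
    """True if the subject shares at least one identifier of the same type."""
    return any(subject.get(kind) == value for kind, value in product.items())


git = parse_product({
    "purl": "pkg:apk/wolfi/git@2.39.0-r1?arch=armv7",
    "cpe": "cpe:2.3:a:git-scm:git:2.39.0:*:*:*:*:*:*:*",
})
```

A scanner that only knows the CPE still finds the product: product_matches(git, {"cpe": "cpe:2.3:a:git-scm:git:2.39.0:*:*:*:*:*:*:*"}) is true.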

There is some overhead ("cpe": "cpe:", "purl": "pkg:"), but making the identifier type a first-class part of the JSON document, rather than parsing it out of the contents, might be worth it.

luhring commented 1 year ago

another approach might be to be explicit that something is a purl

I like this idea — much more explicit and avoids frustrating debugging for users. I also like that we can identify the same component in multiple ways. We could have a strong link to an SBOM element, for example, that includes rich information about the component's composition. But for consumers without access to that SBOM, a PURL (or other simple identifier) would be better than nothing.

luhring commented 1 year ago

I was going to open a similar issue for discussion.

Good call — I think there are two distinct ideas here: requiring digests in PURLs, and being explicit about the type of package identifier. I split out the latter into #16!

luhring commented 1 year ago

Re: requiring digests in PURLs:

I like the goal of cryptographic identity. Before we make this a requirement, I'd want to understand the impact this would have on various use cases of OpenVEX.

For example, how should OpenVEX tools handle a case where a VEX statement has a digest for a package but the given vulnerability report or SBOM doesn't have a digest? Suppose two PURLs (say, one from an OpenVEX doc, and one from a vulnerability report) match in every way, except one is missing a digest. Should the OpenVEX tool conclude that the VEX statement applies to this vulnerability match, or not?
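The question can be made concrete with a three-way comparison. In this hypothetical Python sketch the digest is assumed to travel in a checksum qualifier; everything else in the purl (coordinates plus the remaining qualifiers, compared order-independently) must agree, and a missing digest on either side yields UNVERIFIED rather than a yes/no answer. Whether UNVERIFIED should count as a match is exactly the policy decision being asked about, so the sketch leaves it to the caller.

```python
from enum import Enum
from urllib.parse import parse_qsl


class Match(Enum):
    MATCH = "match"
    NO_MATCH = "no_match"
    UNVERIFIED = "unverified"  # everything agrees, but at least one side lacks a digest


def split_purl(purl: str) -> tuple:
    """Split into (type/namespace/name@version, qualifiers); the subpath is ignored."""
    body, _, _subpath = purl.partition("#")
    coords, _, query = body.partition("?")
    return coords, dict(parse_qsl(query))


def compare(vex_purl: str, report_purl: str, digest_key: str = "checksum") -> Match:
    """Compare a VEX product purl with a purl from a scan report or SBOM."""
    a_coords, a_quals = split_purl(vex_purl)
    b_coords, b_quals = split_purl(report_purl)
    a_digest = a_quals.pop(digest_key, None)
    b_digest = b_quals.pop(digest_key, None)
    # Qualifier dicts compare order-independently, so ?a=1&b=2 equals ?b=2&a=1.
    if a_coords != b_coords or a_quals != b_quals:
        return Match.NO_MATCH
    if a_digest is None or b_digest is None:
        return Match.UNVERIFIED  # the open question: should the statement apply?
    return Match.MATCH if a_digest == b_digest else Match.NO_MATCH
```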

It's acknowledged that the formats of certain package management implementations may have to be adapted to accommodate this requirement

I almost think this needs to be accounted for in the PURL spec itself as a first step here. In the PURL spec, I don't see a mention of digests for any ecosystems except OCI. It seems a bit awkward to me for a spec like OpenVEX to assert more requirements on the PURL spec than other PURL users in the ecosystem might comply with, or even be aware of.

Let me know if I'm thinking about this wrong... 😃

tpletcher commented 1 year ago

Just a few follow-ons, for my 50 cents' worth...

Personally, I think I'd argue to:

Require purl when describing packages
Collect additional use cases, and either extend purl or create similar URL-based specs for identification

@garethr On PURLs generally: OpenVEX as a broader spec should be able to accommodate multiple package identification schemes, both those present in different ecosystems today and those not yet designed or implemented, since this space is moving at light speed and will keep doing so across the board. However, whether the identifier lives in the URL itself or in the payload body, the one binding requirement across all package identifier schemes is that a proper (cryptographic) identifier is present for the package from day one, i.e. require the digest now, and don't backslide.

For example, how should OpenVEX tools handle a case where a VEX statement has a digest for a package but the given vulnerability report or SBOM doesn't have a digest? Suppose two PURLs (say, one from an OpenVEX doc, and one from a vulnerability report) match in every way, except one is missing a digest. Should the OpenVEX tool conclude that the VEX statement applies to this vulnerability match, or not?

@luhring For two packages, one with and one without a cryptographic identifier, I think the approach should initially (for some short but reasonable period of time) be to log the occurrence and flag the imperfectly identified record as such (notifications to the owner, etc.), and then ultimately to explicitly place it in an "untrusted" state where it is permanently reported and potentially excluded from ongoing analysis until it is remediated or removed. So carrot first, then stick.
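The carrot-then-stick policy could look roughly like this in Python. It is a sketch only: the state names, the logger name, and the 90-day grace period are placeholders, not anything agreed in this thread.

```python
import logging
from datetime import date
from enum import Enum

log = logging.getLogger("vex-digest-policy")  # hypothetical logger name


class TrustState(Enum):
    TRUSTED = "trusted"      # record carries a cryptographic identifier
    FLAGGED = "flagged"      # carrot: logged and owner notified, still analysed
    UNTRUSTED = "untrusted"  # stick: permanently reported, may be excluded


def classify(has_digest: bool, first_seen: date, today: date,
             grace_days: int = 90) -> TrustState:
    """Classify a record, giving digest-less records a grace period first."""
    if has_digest:
        return TrustState.TRUSTED
    if (today - first_seen).days <= grace_days:
        log.warning("record without digest, first seen %s; flagged", first_seen)
        return TrustState.FLAGGED
    return TrustState.UNTRUSTED
```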

Net-net: hold everyone to the higher standard now, so the muscle memory is there from day one to do the "right" thing mechanically when producing the artifact. I think getting the first-pass formatting right is pretty straightforward once that decision is made and a modular approach to package identity systems is added to the spec. I'd also argue that doing the right thing on topics like this matters, because this is a prime-mover effort for VEX in general.

Thanks again for the thoughtful comments. Let me know what I can do to assist going forward.

puerco commented 1 year ago

In the recent discussion about expanding the product field, we chose to make OpenVEX as expressive as possible, since we expect subjects carrying more specific identifiers to look for matches in documents that may use a wide breadth of identifying schemes.

This resolves this issue: we're not requiring hashes, but as of spec v0.2.0 they are first-class citizens in the VEX product and can be defined alongside other identifiers.
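For illustration, here is a hypothetical Python sketch that consumes a product carrying both an identifier and a hash. The field names ("@id", "identifiers", "hashes", "sha-256") are modeled on the v0.2.0 product shape as understood here and should be checked against the spec; the policy of preferring a hash match over a purl match is an assumption of this sketch, in the spirit of the original request.

```python
import hashlib
from typing import Optional


def build_product(purl: str, artifact: bytes) -> dict:
    """Build a product entry carrying both an identifier and a SHA-256 hash."""
    return {
        "@id": purl,
        "identifiers": {"purl": purl},
        "hashes": {"sha-256": hashlib.sha256(artifact).hexdigest()},
    }


def product_applies(product: dict, artifact: bytes, purl: Optional[str] = None) -> bool:
    """Prefer a hash match; fall back to the purl only when no hash is listed."""
    expected = product.get("hashes", {}).get("sha-256")
    if expected is not None:
        return expected == hashlib.sha256(artifact).hexdigest()
    return purl is not None and product.get("identifiers", {}).get("purl") == purl


artifact = b"example artifact bytes"  # stand-in for the real package contents
product = build_product("pkg:apk/wolfi/git@2.39.0-r1?arch=armv7", artifact)
```

Here product_applies(product, artifact) holds, while the same call with modified bytes does not, even though the purl is unchanged.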

I'm closing this issue but feel free to reopen it if you'd like to discuss the matter further or create new ones. Thanks for your feedback!

tpletcher-hpe commented 1 year ago

All good. Thanks for update!

openvex / spec

Modify Spec to Require Artifact Digest in PURL #10