Closed SteveLasker closed 2 years ago
If purl is always about location, then it makes consuming public content, in a secured & reliable manner, problematic as the same content will be available from multiple locations, and users want to pull the content into their private networks.
Exactly. Which is why CycloneDX is heavily focused on security use cases, provenance being one of them. It's important to know where something was retrieved from, even if it was an internal mirror. When software is built/assembled, I'm not aware of any use case where the same artifact is retrieved from multiple repos and used. Just because something CAN be retrieved from multiple sources, doesn't mean it was. This is also where CycloneDX and SPDX vary dramatically in scope. As a pure BOM format, CycloneDX cares about what actually transpired, whereas SPDX (which I would not classify as an SBOM format, but it can be used for SBOM use cases) describes what something COULD be. A look at SPDX external references is all that's required for that to become obvious.
Internal repo servers (most of them) do not support:
So although I can specify my internal repo in which I retrieved something from, many repo servers do not provide the full transparency necessary to achieve these basic requirements.
See also: https://owasp-scvs.gitbook.io/scvs/
Is it possible to amend purls scope to assure unique identity, and make location an optional parameter so it could be used reliably for SBoM and security scan result pointers?
Purl is already heavily used in SBOM use cases today with 100K+ CycloneDX adopters - most of which utilize purl. So I think we have to better understand what specific SBOM use case is not being addressed today. As far as security scan result pointers, I would think location would be highly important here. The NVD is mostly irrelevant for identifying vulnerabilities in libraries today. Many SCA vendors either use purl directly or have some proprietary alternative which takes identity, location, and other metadata into account when identifying known vulnerabilities in components. The NVD became mostly irrelevant because CPE could only describe vendor, name, and version. Purl goes down to the module level which is much more granular than what we had previously (and supported by the likes of Sonatype and Snyk). But we have the opportunity to further improve on that by incorporating location into the equation. For example, if I have a Java component that's published to Jitpack and the same artifact that's published to Maven Central and the one on Maven Central is the only one affected by a known vulnerability, that's really interesting information and a competitive advantage for the source of vulnerability intelligence that can go down to that level.
I do think however, there's an opportunity for an organization to "opt out" of using location by supporting a way to specify no default repo and no repo url. This might be useful for private repos. If an organization wants to practice security through obscurity, this would provide them a way to achieve that, but I would recommend this be an opt-in feature as we would not want to cripple location for the majority in favor of the few.
If purl is always about location, then it makes consuming public content, in a secured & reliable manner, problematic as the same content will be available from multiple locations, and users want to pull the content into their private networks.
Exactly. Which is why CycloneDX is heavily focused on security use cases, provenance being one of them. It's important to know where something was retrieved from, even if it was an internal mirror. When software is built/assembled, I'm not aware of any use case where the same artifact is retrieved from multiple repos and used. Just because something CAN be retrieved from multiple sources, doesn't mean it was.
I'm curious how CycloneDX users will be able to find the correct endpoint within their intranet if they are using a docker installation configured to use a mirror, or a go installation that uses an internal proxy. What if they invoke a CLI tool which invokes the tools which do the fetching after several hops?
So although I can specify my internal repo in which I retrieved something from, many repo servers do not provide the full transparency necessary to achieve these basic requirements.
I don't think cloud native repos as they exist right now (mostly backed by S3 buckets) provide that kind of transparency either. All the user sees is a front-facing API with no visibility into where exactly the artifact comes from. In fact, in most cloud native environments, folks don't care where the artifact is located as long as its integrity can be verified and it is signed.
I do think however, there's an opportunity for an organization to "opt out" of using location by supporting a way to specify no default repo and no repo url. This might be useful for private repos. If an organization wants to practice security through obscurity, this would provide them a way to achieve that, but I would recommend this be an opt-in feature as we would not want to cripple location for the majority in favor of the few.
As far as I can tell, this is what @iamwillbar was suggesting by making the repository identifier "strongly recommended" in his comment on the OCI proposal.
I'm curious how CycloneDX users will be able to find the correct endpoint within their intranet if they are using a docker installation configured to use a mirror, or a go installation that uses an internal proxy. What if they invoke a CLI tool which invokes the tools which do the fetching after several hops?
@nishakm The answer is in the question. Since CycloneDX has a data model optimized for highly automated pipelines, it’s elementary to enhance, correct, or merge SBOMs during the execution of the pipeline. Inspecting the configuration to discover use of a mirror and correcting purls in the SBOM is quite simple.
I believe Maven is one of only a few dependency management systems that also provide information on what repository each and every artifact was retrieved from. Most package managers are immature by comparison. But we should not see the immaturity of other systems as a reason to diminish the default behavior of purl.
In fact, in most cloud native environments, folks don't care where the artifact is located as long as its integrity can be verified and it is signed.
You’ve just described how SolarWinds happened - blind trust in something without transparency or methods to validate. We should not be interested in promoting practices that support continued use of bad practices. We need to support efforts that promote further transparency, even if it’s difficult for some ecosystems to achieve today.
As far as I can tell, this is what @iamwillbar was suggesting by making the repository identifier "strongly recommended" in his comment on the OCI proposal.
@rnjudge I could support the addition of a way to opt out or otherwise specify the location is unknown or not disclosed. I am not in favor of making location strongly recommended for the core purl spec as that one change would alter the meaning of every purl being used today. It’s a small, but breaking change.
Pinging @pombredanne for feedback.
Given @SteveLasker specifically references SBOM use cases I think this is a non-starter.
Unless maybe if you are only using purls in SBOMs for intellectual property use cases?
Where a package was retrieved from is important for software supply chain security use cases.
The component might be the same on disk. But the provenance is quite different. And, if you are trying to look at supply chain risk, this information is important.
I think it would be more beneficial to identify what Steve L thinks is missing from the existing purl format. If it's just a case of being able to remove the location information surely that can be done by the consumer when parsing?
I think I understand @stevespringett's concern that the problem is differentiating what the location is from what the location could be. Therefore, I think this isn't a specification problem but a cloud native problem i.e. the notion of "it doesn't matter how the artifact got here as long as its checksum matches the published checksum and it is signed".
Even in the highly automated environments existing now, the client tools do not report the endpoints they are hitting in order to fetch an artifact. So something like the docker purl may tell you that an artifact was fetched using "docker like" ways, but not the actual endpoint. As such, the tools that generate a CycloneDX SBOM will not provide the true location. Just whatever the user has entered in their CLI like docker pull domain/repo:tag.
Let’s tease apart a few things, as I’m not suggesting this is problematic for all references.
A source code repo is somewhat interesting if it can be disclosed. Most OSS projects can; most products won’t.
What does location provide? Is it part of the identity? Or forensic information to analyze when something goes wrong?
When is the SBoM generated, and can it be modified?
If the SBoM is generated at the point of creation, you know the unique identity (hash or digest in OCI artifact terms), but you don’t know which endpoint it may be pulled from.
Small companies may distribute their artifact on Docker Hub, ECR, GitHub and others. Which registry url would be used?
Large companies like Microsoft build the artifacts on internal registries and distribute on mcr.microsoft.com, mcr.microsoft.cn (China) and a few air-gapped clouds whose domains I can’t even disclose.
If the location is part of the identity, then it doesn’t matter what the url is, and is this the best way to manage identity?
If the purl is fixed at the time of SBoM creation and if the registry is the location, then what should be done in the above case where the same digest (hash) is published on multiple registries?
When a consumer pulls the artifact, how do they know where to find the SBoM? If they have the SBoM, how do they find the artifact it references?
When a user pulls both the artifact and the SBoM into their environment, and they can’t reach the endpoints they were originally published on, what should they do? Must they create another SBoM just to track its movement from one location to another?
I realize this sounds like a chain-of-custody situation. While that is true, and helpful for forensics, it’s not the optimal, or even a possible, approach for normal flows.
The beauty of digital bits is we can encode them, generate digests (hashes) of them and sign them with independently verifiable signatures. As long as they remain the same, it doesn’t matter where they were. We know they weren’t tampered with and we know who attests to them with a signature.
This is how SolarWinds was “quickly” found to not be a distribution attack, as the DLLs were signed and they matched the digests generated from the build environment.
So, I get location is interesting from a forensics perspective. In many cases that internal, proprietary information can’t or shouldn’t be disclosed.
There is an issue with how to discover the SBoM from the point of an artifact that may not know it has an SBoM. When you have the SBoM, we need a way to know it’s referring to this very specific artifact.
@SteveLasker can you provide a concrete example of a purl that would be problematic? I cannot think of any.
Say you have a private Maven or Docker registry, and for the sake of arguments the same packages are available also in the public, default repository for this package type. For instance:
pkg:maven/org.mvel/mvel2@2.4.9.Final from https://repo1.maven.org/maven2/org/mvel/mvel2/2.4.9.Final/
pkg:docker/bitnami/redis@6.0.15-debian-10-r67 from https://hub.docker.com/layers/bitnami/redis/6.0.15-debian-10-r67/images/sha256-b65e55bbeb52644e10e05dd8396db85049eec3610a733258c27393ed1f60c431?context=explore (or pkg:docker/bitnami/redis@sha256:b65e55bbeb52644e10e05dd8396db85049eec3610a733258c27393ed1f60c431 too)
Say that my "private" image registry is at https://quay.io/ and the package at https://quay.io/repository/bitnami/redis/manifest/sha256:b65e55bbeb52644e10e05dd8396db85049eec3610a733258c27393ed1f60c431
And my "private" Maven repo is at https://repository.jboss.org/nexus/content/repositories/ea and the package at https://repository.jboss.org/nexus/content/repositories/ea/org/mvel/mvel2/2.4.9.Final/
Based on that I could either use:
pkg:maven/org.mvel/mvel2@2.4.9.Final and pkg:docker/bitnami/redis@6.0.15-debian-10-r67 or pkg:docker/bitnami/redis@sha256:b65e55bbeb52644e10e05dd8396db85049eec3610a733258c27393ed1f60c431
and configure my tools and systems to use my "private repos" above. Which I would ALWAYS need to do somehow to use my internal private repos or registries (UNLESS there would be some transparent hidden network-wide internal proxy to the same effect?)
Or use:
pkg:maven/org.mvel/mvel2@2.4.9.Final?repository_url=https://repository.jboss.org/ and pkg:docker/bitnami/redis@6.0.15-debian-10-r67?repository_url=https://quay.io/
Either way works. When things are private, feel free to handle it as you like. The fact that there is default public repository for a package type means that this default does not need to show up in the URL and can be "transparently" overridden.
So I am not sure there is any issue here?
As a recap: a purl is a URL and is a locator, and all URLs are also URIs. Therefore a purl is also an identifier. The fact there is a default location for a type as opposed to something always hardcoded in the purl string means that you can also think of a purl as a pure identifier for private purposes. The global uniqueness of this identifier is something that's handled by default by the default public package repositories of each type. If you happen to use a content identifier (say a sha256) instead, that's fine too. If you do not publish your packages on the default public repo and you do not provide a way to locate it with a qualifier, that's OK too. Rather useless as no one will be able to find it, but that's OK too.
@SteveLasker now your question is in the context of #123 and the context there is that you may not have a canonical, default reference repository location for a new OCI type. I see no issue having the default location be optional for a given purl type. This will be weird and problematic as someone with just a purl will not be able to get the package, and therefore this is less useful: short of a purl type-provided default repository URL location or a repository_url qualifier for a specific purl, you will only be able to identify but not locate.
In the end, when there may be a need to get to the package code, you would always need some repo or registry location of sorts at runtime and/or fetch time to effectively retrieve the package archives. It can stay private.
In recap, a package type default repository location or a repository_url qualifier is useful and desirable to locate, but not essential to identify, especially if the identity is "strongly" content-defined like when you use sha256 as version. I have no problem with this. Weird but OK.
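The recap above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference purl parser: parse_purl, locate and the DEFAULT_REPOS table are hypothetical names, the default URLs are simply the ones mentioned in this thread, and subpaths and most encoding rules are ignored.

```python
from urllib.parse import parse_qs, unquote

# Hypothetical table of per-type default repositories (an assumption for this sketch).
DEFAULT_REPOS = {
    "maven": "https://repo1.maven.org/maven2",
    "docker": "https://hub.docker.com",
}

def parse_purl(purl):
    """Split a purl into type, name path, version and qualifiers (simplified)."""
    if not purl.startswith("pkg:"):
        raise ValueError("not a purl: " + purl)
    rest, _, query = purl[len("pkg:"):].partition("?")
    path, _, version = rest.partition("@")
    ptype, _, name = path.partition("/")
    qualifiers = {key: values[0] for key, values in parse_qs(query).items()}
    return {
        "type": ptype,
        "name": name,
        "version": unquote(version) or None,
        "qualifiers": qualifiers,
    }

def locate(purl):
    """Return a repository to fetch from, or None if the purl can only identify."""
    parsed = parse_purl(purl)
    # An explicit repository_url qualifier overrides the type's default repository.
    return parsed["qualifiers"].get("repository_url") or DEFAULT_REPOS.get(parsed["type"])
```

With this sketch, a Maven purl without qualifiers resolves to the type default, the same purl with repository_url resolves to the private repo, and a purl of a type with no default and no qualifier still parses (it identifies) but locate returns None.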
Would this be a reasonable set of rules based on OCI's requirements:
repository_url to override the default location (for ecosystems that require a location) or to provide a hint for where the package could be located (for ecosystems that don't require a location)
The intent of these rules is:
Does this resonate with people?
@stevespringett / @coderpatros I'm curious why location matters from a supply chain security perspective if you have a trusted content hash. If you can't trust the content hash, then adding location doesn't make it anymore (or less) trusted. If you trust the content hash, then adding location doesn't make it anymore (or less) trusted. Whatever trust you give to a content hash should be independent of location because it's the same content.
Extending on this, if you have sufficient provenance and pedigree information to say that a given content hash is trusted, from then on the location should be irrelevant. Inversely, if you have information that a content hash can't be trusted (or insufficient information to say you can trust it) then again the location should be irrelevant.
In the SolarWinds example, there was originally belief that a content hash was trusted, and new information came to light that a content hash shouldn't be trusted. Adding location wouldn't have mitigated or changed that outcome because it was the underlying content that became untrusted, not the location it was stored in. In fact, the IoCs provided were content hashes, independent of location.
@iamwillbar at a point in time a component that has been brought into some assembled piece of software, and where it was pulled from, may be "trusted".
But that package repository/mirror/whatever is part of your supply chain. And not everyone in the supply chain validates hashes/signatures along the way. So understanding where something came from can be useful.
Especially as the "same" component can be different, with a different hash, depending on where it was retrieved from. For example, nuget adds a signature to packages when they are uploaded. Some of those packages are also published as github release artifacts, distributed as part of an SDK, etc. Not knowing where it was retrieved from makes this situation very problematic.
Signatures don't solve the problem either. They are only good assuming the signing keys, or release process, hasn't been compromised.
@coderpatros I completely agree that repositories, mirrors, etc. are part of the supply chain, but that's independent of whether purl must include a location to establish trust. In the specific OCI case that spawned this discussion the version is a sha256 hash of the content and it can be mirrored to any number of locations and that identity doesn't change. If the content is tampered with or changed intentionally that changes the identity of the package and consumers wouldn't inadvertently retrieve the new package. Likewise, information like vulnerabilities, pedigree, etc. can be attached to the content hash and used independent of the location because the identity of the package is intrinsically linked to its contents. Unnecessarily scoping information to the location may result in relevant information being missed because it's deemed not relevant.
This isn't to say that a location can't be provided as a hint of where you might be able to retrieve the image, that's perfectly valid, but having a location doesn't change the identity or trustworthiness of a content-addressed package.
Yeah, I just don't get how removing information helps. Wouldn't you just parse the purl to extract what you want for particular use cases? Or use the component hash from the SBOM?
@coderpatros the proposal isn't to remove the concept of location but to acknowledge that for some ecosystems location does not make sense because it's not integral to the identity of the package. We're trying to define a new purl type where the concept of "location" doesn't make much sense, there is no default repository, content is often deployed to multiple repositories with no one of those being canonical, content can be moved between repositories and its identity doesn't change and it can be proven the content isn't tampered with.
For any ecosystem where two repositories could serve different content for the same identifier then location should be mandatory for the purl and I'd additionally recommend that a content hash be provided where possible. For ecosystems where the identity is intrinsically linked to the content regardless of location the location should be optional (but can be provided as a hint for retrieval but not as part of identity comparison).
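One way to picture this rule is as a per-type comparison key. The sketch below is an assumption-laden illustration: CONTENT_ADDRESSED_TYPES and identity_key are hypothetical names, and which types belong in the set would be decided by each purl type definition, not by this code.

```python
# Hypothetical policy: types whose version is a content digest compare without location.
CONTENT_ADDRESSED_TYPES = {"oci"}

def identity_key(ptype, name, version, repository_url=None):
    """Build the tuple used to decide whether two purls name the same package."""
    if ptype in CONTENT_ADDRESSED_TYPES:
        # Location is only a retrieval hint here, so it is left out of the key.
        return (ptype, name, version)
    # Elsewhere two repositories could serve different content for the same
    # name and version, so location stays part of the identity.
    return (ptype, name, version, repository_url)
```

Under this policy the same digest mirrored to two registries compares equal, while the same Maven coordinates from two repositories do not.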
I'm an outsider to purl (so please weigh this input accordingly :sweat_smile:), but in reviewing purl it doesn't seem like the OCI use case is really much (if any) different from say, hosting a Git repo at GitHub vs Bitbucket vs self-hosted -- the commit hash is going to be identical, the underlying data bits are identical, but the location is completely different (and as such, the purl reference is too).
If I may provide another use case for security not based on location (and I am not, by any means, a security expert): zero trust systems do not track location but identities like owners and maintainers. In this case, the location may change through the supply chain, but the SBOM or something else can track signatures and attestations by owners.
@tianon you're right that is a fitting example for the relationship between identity and location (and in fact was/is being discussed in #59). If we take these three (fictional) examples:
pkg:github/package-url/purl-spec@244fd47e07d1004
pkg:github/package-url/purl-spec-fork@244fd47e07d1004
pkg:bitbucket/package-url/purl-spec@244fd47e07d1004
We know that this is the same commit because we know that the SHA1 hash of a Git commit is based on the commit and the state of the Git tree. I can push that same content to any number of repositories, and it is the same content. Though this isn't obvious from these examples because it requires that understanding of Git's internals and the knowledge that GitHub and BitBucket are both Git-based repositories.
If I want to know where the software is located it's important to know the github/package-url/purl-spec, github/package-url/purl-spec-fork, bitbucket/package-url/purl-spec portion. If I want to describe a specific piece of software (for example, to describe a vulnerability in it, or to describe its dependencies) then the location isn't relevant and it's the fact that it's Git commit 244fd47e07d1004 (or potentially the underlying tree id) that has the vulnerabilities or dependencies that is the most important.
One way to solve this would be to consider github and bitbucket subclasses of a generic git type, in this model the git type would behave like the oci type that's being proposed in that it would be location agnostic. The git type would have no default location and would provide an optional repository_url which could be used to provide a location hint.
pkg:git@244fd47e07d1004
pkg:git@244fd47e07d1004?repository_url=github.com/package-url/purl-spec
The github and bitbucket subclasses would behave like macros that can be expanded to a git purl type:
pkg:github/package-url/purl-spec@244fd47e07d1004 -> pkg:git@244fd47e07d1004?repository_url=github.com/package-url/purl-spec
pkg:bitbucket/package-url/purl-spec@244fd47e07d1004 -> pkg:git@244fd47e07d1004?repository_url=bitbucket.com/package-url/purl-spec
Since these macros can be easily converted to a common base class you can compare to see if they refer to the same software but you still have the option of knowing the suggested location of the software.
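As a rough sketch of that macro expansion (the git purl type here is only the idea floated in this comment, not part of the purl spec, and expand_to_git, same_commit and the HOSTS table are hypothetical names using the hosts from the examples above):

```python
# Host names taken from the examples in this comment; the table itself is hypothetical.
HOSTS = {"github": "github.com", "bitbucket": "bitbucket.com"}

def expand_to_git(purl):
    """Expand a github/bitbucket purl to the proposed location-agnostic git form."""
    path, _, commit = purl[len("pkg:"):].partition("@")
    ptype, _, repo = path.partition("/")
    if ptype not in HOSTS:
        raise ValueError("no macro defined for type: " + ptype)
    return f"pkg:git@{commit}?repository_url={HOSTS[ptype]}/{repo}"

def same_commit(purl_a, purl_b):
    """Compare two purls on the expanded form with the location hint dropped."""
    def without_hint(purl):
        return expand_to_git(purl).partition("?")[0]
    return without_hint(purl_a) == without_hint(purl_b)
```

The comparison is only meaningful if the version is a full, untruncated commit id; with abbreviated ids two unrelated repositories could compare equal.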
I'm curious why location matters from a supply chain security perspective
@iamwillbar
Signing keys get compromised all the time. If an adversary also has control over the repo (via lateral movement) in which artifacts are published and retrieved from, location matters. It would be important to know if I retrieved an artifact from a repo that was not compromised vs one that was. In both cases, signature verification would pass. Any org relying solely on signing verification is placing entirely too much trust in the PKI and surrounding infra. They will eventually be compromised.
Embargos and other organizational or political tools that prohibit the use of technology to a given country or region.
Project risk can also be evaluated based on location. If I know a location where something was retrieved from, I may be able to determine if any contributors are associated with nation state adversaries, known threat actors, or are a major contributor from embargoed countries.
And of course forensics which would need to reconstruct the software in play, configuration, and the location where things were retrieved from.
These are just the ones I can think of. I'm sure there are others...
I'm failing to find any good arguments for decoupling location from identity.
@stevespringett we're not talking about signing or PKI at all though, we're talking about a content hash... if the content hash is in the purl (which is the proposal for oci) the content can't be changed without the purl becoming invalid (or continuing to point at the unmodified content). So if you retrieved the content from a compromised vs known good repository you are getting the same content because the content hash is the same, so in that scenario the compromise has no impact on the artifact being retrieved. The location doesn't improve our ability to know if the package is compromised for ecosystems that are based on content hashes.
No one is recommending that location be removed, just identifying that location is not fundamental to all ecosystems. Purl should reflect the realities of the ecosystems it is trying to represent, rather than trying to impose requirements on them.
On your other points, I don't know a location of 'hub.docker.com' does anything to address the threats you outline. It doesn't tell us anything about the contributors, physical location, provenance, pedigree. It may be interesting for forensics but the content hash itself verifies that the package is unchanged in comparison to the purl.
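A minimal sketch of the check being described, assuming the purl version carries a digest of the form sha256:<hex> over the exact bytes retrieved (matches_digest is a hypothetical helper, and real OCI clients verify manifests and layers with more care than this):

```python
import hashlib

def matches_digest(content: bytes, version: str) -> bool:
    """Check retrieved bytes against a 'sha256:<hex>' version, wherever they came from."""
    algorithm, _, expected = version.partition(":")
    if algorithm != "sha256":
        raise ValueError("unsupported digest algorithm: " + algorithm)
    return hashlib.sha256(content).hexdigest() == expected
```

The function takes no location argument at all: bytes fetched from any mirror either match the digest in the purl or they do not.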
we're not talking about signing or PKI at all though, we're talking about a content hash... if the content hash is in the purl (which is the proposal for oci) the content can't be changed without the purl becoming invalid (or continuing to point at the unmodified content).
I understand that. But the ask to decouple location from identity will affect every purl type, not just oci. That's a breaking change to the spec.
No one is recommending that location be removed, just identifying that location is not fundamental to all ecosystems
Agreed. And most ecosystems have a default repo, and the ones that do not clearly state they do not in the purl type definition. Golang is a good example which reads: "There is no default package repository: this is implied in the namespace using the go get command conventions". Why is this approach not good enough for oci? Why is oci so special that it needs to introduce breaking changes to all purl types? I do not understand this logic.
On your other points, I don't know a location of 'hub.docker.com' does anything to address the threats you outline.
That's a very specific example and you're likely correct, it likely will not. But we are talking about the core purl spec here, not a specific type. If you look at any package on https://packagist.org/ you can absolutely perform that type of analysis.
@stevespringett I don't think @SteveLasker is suggesting that location is removed from all purls, I think he's encouraging purl to acknowledge that there are (and will be) purl types where location is not a fundamental part of identity and should be optional. For purl types where location is required to establish identity (which is true for most purl types that exist today) it should continue to be there.
For golang, deb, and rpm there's an implied location based on the namespace or distro, for generic there's a specific location in download_url.
If we're saying that it's OK for a purl type to have no default repository and to not require a repository_url or other location (except as an optional hint) as long as that uniquely identifies the package then I think that's what @SteveLasker is looking for.
One way to solve this would be to consider github and bitbucket subclasses of a generic git type, in this model the git type would behave like the oci type that's being proposed in that it would be location agnostic. The git type would have no default location and would provide an optional repository_url which could be used to provide a location hint.
@iamwillbar I did submit a proposal for having "generic" purls in #126. Can this be a pattern that can be used for artifacts that don't follow the conventional centralized public repository pattern? This could also be a way for CycloneDX to not use such purls if they choose to.
I did also bring up the use case of zero trust security which doesn't check endpoints but identities and signatures. If a client can verify the artifact's digest and signature, is there any need to check the location?
It's not up to me. But I would advise extreme caution in supporting this for things like the git example above. Git uses SHA-1 for commits, but it is not intended for security use cases. That is why the hash is often truncated for convenience within a particular repo, which is common practice.
Expanding on the git example above that drops location information...
pkg:github/package-url/purl-spec@4860cee is not the same as pkg:github/coderpatros/this-is-not-the-purl-you-are-looking-for@4860cee
Changing that purl to something like pkg:git@4860cee would make purl useless. I know it is a contrived example. But a more sophisticated and resourced adversary shouldn't have much trouble creating collisions for other, longer, truncated hashes.
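To give a feel for how cheap truncated prefixes are, here is a small birthday-search sketch. It is an assumption-heavy illustration: it hashes arbitrary strings with SHA-1 rather than real Git commit objects, and it finds any two inputs that share a prefix, which is easier than matching one specific existing commit. find_prefix_collision is a hypothetical helper.

```python
import hashlib
from itertools import count

def find_prefix_collision(hex_chars):
    """Find two distinct inputs whose SHA-1 digests share their first hex_chars characters."""
    seen = {}
    for i in count():
        data = f"blob {i}".encode()
        prefix = hashlib.sha1(data).hexdigest()[:hex_chars]
        if prefix in seen:
            # A different, earlier input already produced this prefix.
            return seen[prefix], data, prefix
        seen[prefix] = data

first, second, shared = find_prefix_collision(7)
print(first, second, shared)
```

For a 7-character prefix (28 bits) a match is expected after roughly tens of thousands of attempts, which takes well under a second on ordinary hardware.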
Changing that purl to something like pkg:git@4860cee would make purl useless. I know it is a contrived example. But a more sophisticated and resourced adversary shouldn't have much trouble creating collisions for other, longer, truncated hashes.
We are asking if the purl-spec maintainers are willing to allow for a pattern that describes "non-centralized" locations or "moving" locations. Some examples that come to mind for me:
In the end, it is the same source code, probably coming from the same people, but just moved from one hosting mechanism to another.
Personally, I don't think relying on "common knowledge" to triangulate a location is a good security practice. As you know, locating any of these artifacts, including the ones CycloneDX is using now, also relies on user configuration which purl does not capture. Maybe trying to figure out how to accurately describe "artifact movement" is something in scope for the package-url folks?
@stevespringett, I completely agree that location is required to find the package if you need to find the package reference.
The root of this issue is:
The location is required, but it would be provided at runtime, for that particular environment:
public --> wabbit-networks-shared-internal --> alpha-team --> staging-for-public--> wabbit-networks-public-registry
\-> wabbit-networks-shared-internal --> delta-team /
public --> acme-rockets-shared-internal --> dev-team-a --> staging-for-prod-env-foo --> prod-env-foo
\-> acme-rockets-shared-internal --> dev-team-b --> staging-for-prod-env-bar --> prod-env-bar
By separating identity from location, you can use the unique identity to match the intended package when the location is provided dynamically, at runtime, for that environment. The SBoM has the identity.
It's not that location isn't important, it's that it's not known when the purl is persisted.
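A small sketch of that split, under loud assumptions: the environment names come from the diagram above, but the registry host names, ENV_REGISTRIES and pull_reference are all hypothetical, and the digest is the one persisted in the SBoM.

```python
# Hypothetical per-environment registry configuration, supplied at runtime.
ENV_REGISTRIES = {
    "alpha-team": "registry.alpha-team.wabbit-networks.example",
    "prod-env-foo": "registry.prod-env-foo.acme-rockets.example",
}

def pull_reference(repo_path, digest, environment):
    """Combine the persisted identity (digest) with the runtime-provided location."""
    registry = ENV_REGISTRIES[environment]
    return f"{registry}/{repo_path}@{digest}"
```

The digest recorded in the SBoM never changes as the artifact moves through the stages; only the registry prefix does.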
@coderpatros completely agree on your comments re: Git, I wasn't entirely clear, my intent was more to show that the oci content addressing vs location isn't unique.
At the time the SBoM is created, do you know the location the package will be pulled from?
In some ecosystems, yes, that information is known and exposed to build-time plugins. In most ecosystems, this information is not exposed today. It depends on the maturity of the ecosystem. As newer ecosystems become more mature, I would expect location information to become more widely available.
The scope of purl is to identify and locate a software package.
Wouldn't a urn:pkg:... syntax be more appropriate if only the identity is wanted and location is either ignored or not applicable?
As stated in the other ticket, I would be open to the idea of a reserved word for repository_url. Something along the lines of repository_url=unspecified, which would override any default and tell the consumer that a repo is unknown, undisclosed, or simply not applicable.
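How a consumer might honor such a reserved word can be sketched as follows. The value "unspecified" is only the suggestion made in this comment and is not part of the purl spec; effective_repository is a hypothetical helper.

```python
UNSPECIFIED = "unspecified"  # hypothetical reserved word suggested above, not in the purl spec

def effective_repository(type_default, repository_url=None):
    """Resolve the repository for a purl, honoring the proposed opt-out value."""
    if repository_url == UNSPECIFIED:
        # The producer explicitly declined to state a location, so the
        # type's default repository must not be assumed either.
        return None
    return repository_url or type_default
```

The point of the sentinel is the first branch: without it, omitting repository_url silently falls back to the type default.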
However, decoupling location from identity, as the title of this ticket states, fundamentally changes what a purl is. Purl is useful because it includes location. I can see the need to only care about the identity part. Many SCA vendors use purl for identity only today but intend to advance their capabilities to include location in the future.
@stevespringett I'm curious why the reserved word is needed, as you pointed out earlier golang (and others) don't have a default repository and don't require a repository_url (although one can still be optionally provided). Can we just codify that approach and allow each ecosystem to define whether location is required or not (again, providing very strong guidance on when it is OK to omit a location).
On your other points, I don't know a location of 'hub.docker.com' does anything to address the threats you outline.
In my experience, it's not the fact that it's on "Docker Hub" that specifically provides useful data alone about whether or not something is trustworthy, but more specifically the combination of "Docker Hub" and "specific Docker Hub user or organization which I trust" that does so (or even, maintainer of this particular image / repository within a larger organization).
Project risk can also be evaluated based on location. If I know a location where something was retrieved from, I may be able to determine if any contributors are associated with nation state adversaries, known threat actors, or are a major contributor from embargoed countries.
For example, pulling docker.io/user-i-trust/foo:bar is going to be very different from docker.io/known-malicious-user/foo:bar (very similarly to GitHub and any other "public" hosting site), so from the perspective of a long-time Docker user and OCI member/maintainer, I really don't see any way we can reasonably conclude that OCI is a special case here?
The only thing that really sets the OCI objects apart from these other package types (from what I can see) is that they're designed to have an explicit content-addressable digest that is commonly used to refer to and fetch them, and that digest remains unchanged (by design) when the content is moved from one registry to another. However, you cannot ask a registry for said content without also knowing the full repository path from which to fetch it (docker.io/foo/bar@sha256:deadbeef).
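A small sketch of that split, for illustration only: the digest part of a reference identifies the content, while the repository path says where to ask for it. The helper names and the mirror hostname are hypothetical, and `sha256:deadbeef` is the placeholder from the comment above, not a real digest.

```python
from typing import Tuple


def split_reference(ref: str) -> Tuple[str, str]:
    """Split 'registry/path@algo:hex' into (repository path, digest)."""
    repository, separator, digest = ref.rpartition("@")
    if not separator or not repository or ":" not in digest:
        raise ValueError(f"not a digest reference: {ref!r}")
    return repository, digest


def same_content(ref_a: str, ref_b: str) -> bool:
    """Two references name the same content when their digests match."""
    return split_reference(ref_a)[1] == split_reference(ref_b)[1]


upstream = "docker.io/foo/bar@sha256:deadbeef"
mirrored = "registry.example.com/mirror/bar@sha256:deadbeef"  # hypothetical mirror

repository, digest = split_reference(upstream)
moved_but_identical = same_content(upstream, mirrored)
```

The digest survives the move between registries, but fetching still needs the `repository` half, which is the tension this thread keeps returning to.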
Just to build on what @tianon notes above. Location is important, but it's dynamic, based on where the content is at that point, and the registry/path is/should be passed for that specific environment.
The difference between docker.io/user-i-trust/foo:bar and docker.io/known-malicious-user/foo:bar will be the entity that signed the content.
There will be a collection of "docker certified" content where you can trust Docker to have done some amount of verification and vetting. This is no different from how you trust content from Apple (or not).
This goes back to the point I made above where location is a dynamic thing.
But I'll also cover the point that trust should not be placed in any one thing. Security is never good enough when there's only one barrier. There must be multiple elements. The digest/hash and the signature are used to assure that the artifact is what it was, and to identify who is attesting to it.
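The digest half of that check can be sketched with the standard library alone. This is only an illustration: it assumes a `sha256:<hex>` digest string, the function name is hypothetical, and signature verification (the "who is attesting" half) is deliberately left out because it depends on the signing tooling in use.

```python
import hashlib
import hmac


def matches_digest(content: bytes, expected: str) -> bool:
    """Check fetched bytes against a 'sha256:<hex>' digest, wherever they came from."""
    algorithm, _, hex_value = expected.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    actual = hashlib.sha256(content).hexdigest()
    # Constant-time comparison; not strictly required here, but harmless.
    return hmac.compare_digest(actual, hex_value.lower())


blob = b"example artifact bytes"
recorded = "sha256:" + hashlib.sha256(blob).hexdigest()  # what an SBoM might persist

intact = matches_digest(blob, recorded)
tampered = matches_digest(blob + b"!", recorded)
```

Note that nothing in the check refers to a registry or URL: the bytes either match the persisted digest or they don't.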
It depends on the maturity of the ecosystem. As newer ecosystems become more mature, I would expect location information to become more widely available.
This is actually the point I'm trying to interject here. I'd suggest we should be investing in establishing identity, independent from the location, because mature ecosystems will embrace the idea that content must be promoted into environments that are secure, and from within those environments, users can't and shouldn't be capable of going back. Knowing how the content got from point A to point Z isn't important for verification, as it's relatively easy to circumvent. It's more important to be able to verify something's unique identity, and owning entity, regardless of how the content got to Z.
When something goes wrong, it is interesting to know when/where it got mutated. But, how does storing that information in an SBoM solve that?
Maybe an aligned package URI spec could cater for this?
Closing as #123 accounts for the location being an option, which makes purl super useful for identifying an artifact, independent from the location. Purls are persisted and location is dynamic for where the artifact may be at any point in time. Thanks everyone for the great discussion, Steve
I'm opening this issue as a question, as the readme states purl is scoped to:
While I recognize it has been a known pattern to assume a location for an artifact, this has also been a challenge for users who wish to take ownership of the content they depend upon. The realization is that even common/shared/OSS artifacts must be pulled from multiple locations, which makes an individual location a problematic concept.
A detailed post, with the context of the problem
Separating Identity From Location
TLDR:
Within the SBoM community (CycloneDX and SPDX as examples), there's a desire to ensure that a reference within an SBoM points to a very specific artifact. It could be a container image, Helm chart, Wasm module, or another type where SBoMs are relevant. There are two dimensions to this decoupling:
For 1, you might be willing to say "this is the debian image from docker.io", however, it's currently in my private registry. As long as the image is in the same repository as the SBoM, it can be resolved, and the URL part of the identifier is ignored, as the debian image is said to be unique as it was in docker.io. Mirrors could also be resolved, maybe.
For 2, it's far more challenging. If the exact same debian image is pushed to Docker Hub, ECR Public, GitHub, MCR and Quay, what would the URL be? Should the debian owner have to pick one? Whether the user pulls the debian image from Hub, ECR, or their private registry, the SBoM should be able to resolve the debian image, independently from where they got the image.
The proposal in #123 focuses on decoupling location from identity. Location is an optional hint in the oci-artifact purl PR. What we've been trying to understand is whether purl, the specification, can decouple identity from location, or is purl always about identity & location?
If purl is always about location, then it makes consuming public content, in a secured & reliable manner, problematic as the same content will be available from multiple locations, and users want to pull the content into their private networks.
Is it possible to amend purl's scope to assure unique identity, and make location an optional parameter so it could be used reliably for SBoM and security scan result pointers?