Open stevemenezes opened 1 year ago
Hmm, would it be okay to represent layer IDs as packages, with a DEPENDS_ON relationship matching them?
A layer ID would look something like "pkg:container_layer/sha256:abdef...", with an IsOccurrence linking it
to the matching Artifact. This would resemble how we handle files.
Thoughts? @pxp928 @mihaimaruseac @mlieberman85
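To make the suggestion concrete, here is a minimal sketch of how a layer digest could be turned into the package-style identifier from the comment above, plus the artifact it would be paired with. The `container_layer` type is just the name used in this thread (it is not a registered purl type), the function names and the dictionary shape are hypothetical, and the string is built to mirror the thread's example verbatim rather than to follow purl encoding rules.

```python
import re

# Type name taken from the example in this thread; NOT a registered purl type.
LAYER_PURL_TYPE = "container_layer"

# Only sha256 digests are accepted in this sketch (an assumption for simplicity).
_DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def layer_purl(digest: str) -> str:
    """Build the package-style identifier for a layer digest (hypothetical helper)."""
    if not _DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 layer digest: {digest!r}")
    return f"pkg:{LAYER_PURL_TYPE}/{digest}"


def layer_occurrence(digest: str) -> dict:
    """Pair the layer package with its artifact, similar in spirit to IsOccurrence.

    The dictionary layout is illustrative only and is not GUAC's schema.
    """
    algorithm, value = digest.split(":", 1)
    return {
        "subject": layer_purl(digest),
        "artifact": {"algorithm": algorithm, "digest": value},
    }
```

For a digest `sha256:<64 hex chars>`, `layer_purl` returns `pkg:container_layer/sha256:<64 hex chars>`, and `layer_occurrence` splits the same digest into algorithm and value for the artifact side.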
I think it should be ok, with an INCLUDED_IN relationship that links each layer to the next layer(s) that incorporate it. We can use this for ML models that start from other pretrained models too (though with a different pkg:model/ pURL prefix).
Hmm... how would we do the new INCLUDED_IN relationship, @mihaimaruseac? Yeah, the approach makes sense for the layerID. It's probably worth adding a specific "type" for layerIDs so that it's easier to filter on.
By parsing the Dockerfile or similar, we can extract each layer from the base one all the way to the final container. Each layer has a digest that identifies the layer node in GUAC, and from that node we can build the relationship to the next one. It will still be a verb, so we'd encode which Docker container gave us the link.
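As a rough illustration of that chaining, the sketch below takes the ordered layer digests of one image (base layer first) and emits one edge per consecutive pair, recording the image that provided the link. The `LayerEdge` type, its field names, and `included_in_edges` are hypothetical and only meant to show the shape of the data, not how GUAC would store it.

```python
from typing import List, NamedTuple


class LayerEdge(NamedTuple):
    """One hypothetical INCLUDED_IN edge between two consecutive layers."""
    included: str   # digest of the earlier layer
    including: str  # digest of the next layer that builds on it
    origin: str     # image reference that gave us this link


def included_in_edges(layer_digests: List[str], image_ref: str) -> List[LayerEdge]:
    """Link each layer to the one that follows it, base layer first."""
    return [
        LayerEdge(included=lower, including=upper, origin=image_ref)
        for lower, upper in zip(layer_digests, layer_digests[1:])
    ]
```

An image with three layers yields two edges, and a single-layer image yields none.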
So a new verb is needed for the INCLUDED_IN relationship, since it can't be done with existing verbs. Correct?
Yes.
Hmm, what do you mean by the INCLUDED_IN relationship? I'm not sure I follow. Since layers are just tarballs, I'm assuming it would link the container package to the layers it includes, rather than linking layers to each other?
btw, SPDX 3.0 has moved toward including qualifiers on the DEPENDS_ON relationship; not sure if we'd want to consider that as well.
I think it's more about semantics. We already use DEPENDS_ON (IsDependency) to mean that a package depends on another. I was thinking of a new predicate instead of reusing the existing one, to preempt having to discern between "does this edge mean that A is included/vendored in B?" and "does this edge mean that you need A in order to use B?" (at runtime/buildtime, but we don't differentiate these dependencies right now).
Based on the discussion, we need to determine how we can represent the various types of IsDependency. One method is adding qualifiers to the IsDependency node so that it gives us greater detail about the type of dependency.
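One way to picture the qualifier idea is a dependency record that carries an extra field distinguishing "A is included/vendored in B" from "B needs A in order to be used". The sketch below is only an illustration: `DependencyKind`, the `kind` field, and the record layout are hypothetical names, not GUAC's schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class DependencyKind(Enum):
    """Hypothetical qualifier values for a dependency edge."""
    REQUIRED = "required"  # B needs A in order to be used
    INCLUDED = "included"  # A is included/vendored in B


@dataclass(frozen=True)
class IsDependency:
    """Illustrative stand-in for a dependency edge with a qualifier."""
    package: str      # B, the package that has the dependency
    depends_on: str   # A, the package depended on
    kind: DependencyKind = DependencyKind.REQUIRED


def included_parts(edges: Iterable[IsDependency], package: str) -> List[str]:
    """Return what is included in `package`, ignoring plain 'required' edges."""
    return sorted(
        e.depends_on
        for e in edges
        if e.package == package and e.kind is DependencyKind.INCLUDED
    )
```

With a qualifier like this, one edge type can serve both meanings, and a query filters on the field instead of on a separate predicate.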
Here's a proposal on how to encode layerID and adjacent container image metadata
https://docs.google.com/document/d/11WqkncYYob8MtNkcvTZiYcjbvclT15UKFh6coDjJToU/edit
I'm interested in working on this one.
After some discussions with @pxp928, @lumjjb, and @fengalex43, there have been a few updates to the proposal originally shared by Brandon.
Instead of HasMetadataLink, the existing HasMetadata will be used for describing base image relationships. HasMetadata will have a new optional subject field that will be used to connect the base image OCI package to the container image OCI package.
Instead of HasMetadataLink to describe the relationship between a file and a layer, the existing IsDependency will be used to connect a file to a layer. A new field will be added to denote the "type" of dependency it is, not to be confused with the existing dependencyType field. We still need to finalize the new field name.
The high-level idea behind this change is that HasMetadata should link models that are in different SBOMs, whereas IsDependency should link models found within a single SBOM.
Is your feature request related to a problem? Please describe.
At present, GUAC does not ingest layerID information from either the SPDX or CDX format. We would like this metadata ingestion to be enabled in GUAC.
Describe the solution you'd like
GUAC should parse and ingest the layerID information, starting with the SPDX and CDX formats.
Describe alternatives you've considered
N/A
Additional context
layerID is present in the comment section of the files enumeration for SPDX files, and in the syft:location:0:layerID property of a component for CDX files.
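For the CDX side, a minimal sketch of pulling the layer IDs out of a single component might look like the following. It assumes the component is a parsed CycloneDX JSON object whose `properties` entries are `{"name": ..., "value": ...}` pairs, and it generalizes the `0` in `syft:location:0:layerID` to any location index, which is an assumption rather than something stated in this issue. The function name is hypothetical.

```python
import re
from typing import Dict, List

# Matches property names such as "syft:location:0:layerID".
# Accepting any index (not just 0) is an assumption of this sketch.
_LAYER_PROP = re.compile(r"^syft:location:(\d+):layerID$")


def component_layer_ids(component: Dict) -> List[str]:
    """Return the layerID values of one CDX component, ordered by location index."""
    found = []
    for prop in component.get("properties", []):
        match = _LAYER_PROP.match(prop.get("name", ""))
        if match:
            found.append((int(match.group(1)), prop.get("value", "")))
    return [value for _, value in sorted(found)]
```

A component without a `properties` list, or without any matching property, simply yields an empty list, so the ingestor can skip it.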