guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[feature] Implement ingestion for layerID metadata #977

Open stevemenezes opened 1 year ago

stevemenezes commented 1 year ago

Is your feature request related to a problem? Please describe. At present, GUAC does not ingest layerID information from either SPDX or CDX documents. We would like GUAC to ingest this metadata.

Describe the solution you'd like: GUAC should parse and ingest the layerID information, starting with the SPDX and CDX formats.

Describe alternatives you've considered: N/A

Additional context: In SPDX files, layerID appears in the comment field of each entry in the files enumeration. In CDX files, it appears in the syft:location:0:layerID property of a component.
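As a rough illustration of the two locations described above, here is a minimal Go sketch that pulls a layerID out of a CDX component and out of an SPDX file comment. The struct and function names are hypothetical and are not GUAC's parser API. The sketch also assumes the SPDX comment has the form "layerID: <digest>", which the issue does not spell out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// property mirrors a CycloneDX name/value property entry.
type property struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// component holds only the CycloneDX fields this sketch needs.
type component struct {
	Name       string     `json:"name"`
	Properties []property `json:"properties"`
}

// layerIDFromCDX returns the value of the syft:location:0:layerID
// property of a component, if the component carries one.
func layerIDFromCDX(c component) (string, bool) {
	for _, p := range c.Properties {
		if p.Name == "syft:location:0:layerID" {
			return p.Value, true
		}
	}
	return "", false
}

// layerIDFromSPDXComment extracts the digest from an SPDX file comment.
// Assumption: the comment looks like "layerID: <digest>".
func layerIDFromSPDXComment(comment string) (string, bool) {
	const prefix = "layerID:"
	trimmed := strings.TrimSpace(comment)
	if !strings.HasPrefix(trimmed, prefix) {
		return "", false
	}
	id := strings.TrimSpace(strings.TrimPrefix(trimmed, prefix))
	return id, id != ""
}

func main() {
	// Illustrative input only; the digest is made up.
	raw := `{"name":"busybox","properties":[{"name":"syft:location:0:layerID","value":"sha256:abc123"}]}`
	var c component
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		panic(err)
	}
	cdxID, cdxOK := layerIDFromCDX(c)
	fmt.Println("CDX:", cdxID, cdxOK)

	spdxID, spdxOK := layerIDFromSPDXComment("layerID: sha256:abc123")
	fmt.Println("SPDX:", spdxID, spdxOK)
}
```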

lumjjb commented 1 year ago

Hmm, would it be okay to represent layer IDs as packages, with a DEPENDS_ON relationship matching them?

So a layer ID would look something like "pkg:container_layer/sha256:abdef...", with an IsOccurrence to the matching Artifact. This would resemble how we handle files.
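To make the idea concrete, here is a small Go sketch that turns a layer digest into the package identifier suggested above, plus the algorithm/digest pair an occurrence would point at. The `artifact` type and `layerNodes` function are hypothetical and not part of GUAC. The digest is left unescaped in the identifier, as in the comment, although a strict pURL encoder might treat the colon differently.

```go
package main

import (
	"fmt"
	"strings"
)

// artifact is a hypothetical stand-in for an artifact node:
// a hash algorithm plus the digest value.
type artifact struct {
	Algorithm string
	Digest    string
}

// layerNodes maps a layer ID of the form "<algorithm>:<digest>" to a
// package identifier and the artifact the package would occur as.
func layerNodes(layerID string) (string, artifact, error) {
	algo, digest, found := strings.Cut(layerID, ":")
	if !found || algo == "" || digest == "" {
		return "", artifact{}, fmt.Errorf("layer ID %q is not of the form <algorithm>:<digest>", layerID)
	}
	// The digest is written unescaped here, mirroring the comment above.
	purl := "pkg:container_layer/" + layerID
	return purl, artifact{Algorithm: strings.ToLower(algo), Digest: strings.ToLower(digest)}, nil
}

func main() {
	// Illustrative digest only.
	purl, art, err := layerNodes("sha256:abdef")
	if err != nil {
		panic(err)
	}
	fmt.Println(purl)
	fmt.Println(art.Algorithm, art.Digest)
}
```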

Thoughts? @pxp928 @mihaimaruseac @mlieberman85

mihaimaruseac commented 1 year ago

I think that should be OK, along with an INCLUDED_IN relationship that links each layer to the next layer(s) that incorporate it. We could use this for ML models that start from other pretrained models too (though with a different pkg:model/ pURL prefix).

pxp928 commented 1 year ago

Hmm... how would we do the new INCLUDED_IN relationship, @mihaimaruseac? Yeah, the approach makes sense for the layerID. It's probably worth adding a specific "type" for layerIDs so that they're easier to filter on.

mihaimaruseac commented 1 year ago

By parsing the Dockerfile or similar, we can extract each layer, from the base one all the way to the final container. Each layer has a digest that identifies the layer node in GUAC, and from that node we can build the relationship to the next one. It will still be a verb, so we'd encode which Docker container gave us the link.
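A minimal Go sketch of that chaining, assuming the layer digests are already available in order from base to final. The `includedIn` type, its fields, and `chainLayers` are hypothetical names for illustration; no such verb exists in GUAC at this point in the discussion.

```go
package main

import "fmt"

// includedIn is a hypothetical edge: layer From is incorporated by layer To.
// Source records which container image gave us the link.
type includedIn struct {
	From   string
	To     string
	Source string
}

// chainLayers links each layer digest to the next one in the list.
// layers must be ordered from the base layer to the final layer.
func chainLayers(image string, layers []string) []includedIn {
	edges := make([]includedIn, 0, len(layers))
	for i := 0; i+1 < len(layers); i++ {
		edges = append(edges, includedIn{From: layers[i], To: layers[i+1], Source: image})
	}
	return edges
}

func main() {
	// Illustrative image name and digests only.
	layers := []string{"sha256:base", "sha256:middle", "sha256:final"}
	for _, e := range chainLayers("example.com/app:1.0", layers) {
		fmt.Printf("%s INCLUDED_IN %s (from %s)\n", e.From, e.To, e.Source)
	}
}
```

Three layers yield two edges, and a single-layer image yields none.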

pxp928 commented 1 year ago

So a new verb is needed for the INCLUDED_IN relationship, and it can't be done with the existing verbs. Correct?

mihaimaruseac commented 1 year ago

Yes.

lumjjb commented 1 year ago

Hmm, what do you mean by the INCLUDED_IN relationship? I'm not sure I follow. Since layers are just tarballs, I'm assuming it would link the container package to the layers it includes, not link layers to each other?

By the way, SPDX 3.0 has moved toward including qualifiers on the DEPENDS_ON relationship. Not sure if we'd want to consider that as well.

mihaimaruseac commented 1 year ago

I think it's more about semantics. We already use DEPENDS_ON (IsDependency) to mean that a package depends on another. I was thinking of a new predicate instead of reusing the existing one, so that we don't later have to discern between "does this edge mean that A is included/vendored in B?" and "does this edge mean that you need A in order to use B?" (at runtime or build time, though we don't differentiate these dependencies right now).

pxp928 commented 1 year ago

Based on the discussion, we need to determine how to represent the various types of IsDependency. One option is to add qualifiers to the IsDependency node so that it describes the type of dependency in more detail.

lumjjb commented 1 year ago

Here's a proposal on how to encode layerID and adjacent container image metadata:

https://docs.google.com/document/d/11WqkncYYob8MtNkcvTZiYcjbvclT15UKFh6coDjJToU/edit

ridhoq commented 3 months ago

I'm interested in working on this one.

ridhoq commented 1 week ago

After some discussions with @pxp928, @lumjjb, and @fengalex43, there have been a few updates to the proposal originally shared by Brandon.

  1. Instead of creating a new model called HasMetadataLink, the existing HasMetadata will be used to describe base image relationships. HasMetadata will have a new optional subject field that connects the base image OCI package to the container image OCI package.
  2. Instead of using HasMetadataLink to describe the relationship between a file and a layer, the existing IsDependency will be used to connect a file to a layer. A new field will be added to denote the "type" of dependency, not to be confused with the existing dependencyType field. We still need to finalize the new field name.

The high-level idea behind this change is that HasMetadata should link models that are in different SBOMs, whereas IsDependency should link models found within a single SBOM.
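The rule of thumb above can be sketched in Go as a small decision function. Everything here is hypothetical: the `node` type, the `SBOM` field used to track where a node was found, and `chooseLink` only illustrate the same-SBOM versus cross-SBOM split. They are not the final GUAC schema, and the new field names are still undecided.

```go
package main

import "fmt"

// node is a hypothetical model (package, file, layer) tagged with the
// SBOM document it was found in.
type node struct {
	ID   string
	SBOM string
}

// linkKind names which existing GUAC model would carry the link.
type linkKind string

const (
	linkHasMetadata  linkKind = "HasMetadata"
	linkIsDependency linkKind = "IsDependency"
)

// chooseLink applies the rule of thumb: models within a single SBOM are
// linked with IsDependency, models in different SBOMs with HasMetadata.
func chooseLink(a, b node) linkKind {
	if a.SBOM == b.SBOM {
		return linkIsDependency
	}
	return linkHasMetadata
}

func main() {
	// Illustrative identifiers only.
	file := node{ID: "file:/bin/sh", SBOM: "app.sbom.json"}
	layer := node{ID: "layer:sha256:abdef", SBOM: "app.sbom.json"}
	image := node{ID: "pkg:oci/app", SBOM: "app.sbom.json"}
	base := node{ID: "pkg:oci/base", SBOM: "base.sbom.json"}

	fmt.Println("file -> layer:", chooseLink(file, layer))
	fmt.Println("base image -> image:", chooseLink(base, image))
}
```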