guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[feature] Questions regarding adding REST Endpoints for Vulnerability and Legal info in an SBOM #2058

Open nathannaveen opened 1 month ago

nathannaveen commented 1 month ago

When talking about GUAC, a common issue that comes up is that it's hard to find the vulnerabilities or legal information in a specific SBOM. Currently, the user has to make multiple different GraphQL calls to find this, which is cumbersome. To solve this, we could add a couple of REST endpoints.

Since this comes up often, I would like to start working on it. But, before I do, I'd like to get some feedback from the GUAC community on a few of my questions.

This feature is probably going to be made up of three REST endpoints.

  1. The first endpoint would be to find the latest SBOM for a specific package or artifact.
  2. The second endpoint would be to find the vulnerabilities in the latest SBOM for a package or artifact.
  3. The third endpoint would be to find the licenses in the latest SBOM for a package or artifact.

Here are some of my questions regarding this feature:

  1. When searching for the latest SBOM for a given package or artifact, do we want to search for the package or artifact via id or the spec?
  2. To find the latest SBOM, I was thinking of comparing the SBOMs via their version. If neither of them has a version, we compare them via their timestamp. However, since we can search via artifact as well as package, we need to modify this thought process because artifacts don't have versions. Therefore, we would have to compare all artifacts using their timestamps. Does this approach make sense, or is there a better way to handle this?
  3. When querying for vulnerabilities in the latest SBOM, should we include transitive dependencies or only direct dependencies? Or should we make it a flag to search all dependencies? Also note that with a big enough use case searching through transitive dependencies for vulnerabilities could possibly take a very long time.
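
As a rough illustration of the comparison described in question 2, here is a minimal Python sketch. The `SbomRecord` type, its field names, and the dotted-numeric version ordering are all assumptions made for this example and are not GUAC's data model. It also assumes that a mixed set (some SBOMs with a version, some without) falls back to timestamps, which the question above leaves open.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SbomRecord:
    """Hypothetical, simplified view of an SBOM attached to a package or artifact."""
    uri: str
    known_since: datetime          # when the SBOM was recorded
    version: Optional[str] = None  # package version; None when there is none


def version_key(version: str) -> tuple:
    """Naive ordering key for dotted numeric versions such as "1.10.0"."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def latest_sbom(sboms: list[SbomRecord]) -> SbomRecord:
    """Pick the latest SBOM: by version when every SBOM has one, else by timestamp."""
    if not sboms:
        raise ValueError("no SBOMs to compare")
    if all(s.version is not None for s in sboms):
        # Timestamp breaks ties between SBOMs of the same version.
        return max(sboms, key=lambda s: (version_key(s.version), s.known_since))
    # Assumption: any missing version means we compare by timestamp only.
    return max(sboms, key=lambda s: s.known_since)


older_build = SbomRecord("sbom-a", datetime(2024, 1, 1, tzinfo=timezone.utc), "1.9.0")
newer_release = SbomRecord("sbom-b", datetime(2023, 6, 1, tzinfo=timezone.utc), "1.10.0")
artifact_only = SbomRecord("sbom-c", datetime(2024, 3, 1, tzinfo=timezone.utc))
```

With these sample records, `latest_sbom([older_build, newer_release])` picks `sbom-b` by version, while adding `artifact_only` to the list switches the comparison to timestamps.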

funnelfiasco commented 1 month ago

What's the use case for searching a single SBOM versus the entire graph? One of the nice things about GUAC is the fact that it operates across multiple SBOMs at once, so I'm not sure why we'd want to look at a single SBOM.

When querying for vulnerabilities in the latest SBOM, should we include transitive dependencies or only direct dependencies? Or should we make it a flag to search all dependencies? Also note that with a big enough use case searching through transitive dependencies for vulnerabilities could possibly take a very long time.

I think being able to choose between the two is good. The more indirect the dependency, the harder it will be to remediate, but people will still want to know about it. The performance issue is a concern, though, so having a way to specify a max depth (or a simple direct-dependencies-only) would be good for those larger instances.
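
To make the max-depth idea concrete, here is a small Python sketch of a depth-limited breadth-first walk. The dependency graph is a plain dictionary invented for the example, and `dependencies_within` is a hypothetical helper, not an existing GUAC function.

```python
from collections import deque


def dependencies_within(
    graph: dict[str, list[str]], root: str, max_depth: int
) -> dict[str, int]:
    """Breadth-first walk returning each reachable dependency and its depth.

    max_depth=1 yields direct dependencies only; larger values include
    transitive dependencies up to that many hops from the root.
    """
    depths: dict[str, int] = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the requested depth
        for dep in graph.get(node, []):
            if dep != root and dep not in depths:
                depths[dep] = depth + 1
                queue.append((dep, depth + 1))
    return depths


# Illustrative graph: app -> a, b; a -> c; b -> c; c -> d
example_graph = {"app": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
direct_only = dependencies_within(example_graph, "app", max_depth=1)
two_hops = dependencies_within(example_graph, "app", max_depth=2)
```

Because the walk stops expanding at `max_depth`, the cost is bounded by the number of packages within that many hops rather than by the size of the whole graph.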

jeffmendoza commented 1 month ago

What's the use case for searching a single SBOM versus the entire graph?

The use case is "I want to know the current (vulns/licenses) of the dependencies in my software".

current: Your software likely builds SBOMs and ingests them into GUAC automatically. In this circumstance you will have multiple SBOMs in GUAC, maybe one for every build, or just for every release. Hopefully you are updating your dependencies over time from older vulnerable ones to newer ones without vulns. For this reason, you want to analyze your latest SBOM, not old ones.

my: You can put SBOMs for any software into your GUAC instance, including those delivered by a third party. Also, using the deps.dev collector, you get additional "sboms" for any open source dependencies you use. Therefore, we think folks will want to analyze a specific piece of software, and not the entire graph.

jeffmendoza commented 1 month ago

When searching for the latest SBOM for a given package or artifact, do we want to search for the package or artifact via id or the spec?

IMO there should not be any concept of "spec" in the REST API. We don't want to reuse any GraphQL tree-based concepts here in the package/source trees.

I think the decision here is between "guac-id" and "purl":

However, since we can search via artifact as well as package...

In this case, if one or both SBOMs are attached to an artifact, I would always use the "IsOccurrence" node to find the package version and compare the versions there.

should we include transitive dependencies

I would only include the packages under "IncludedSoftware" attached to the "HasSBOM" node. Any other dependency relationships found in the GUAC graph are likely due to an alternative dependency resolution graph that does not reflect the current/latest build. As we know, dependency resolution is sensitive to time and other factors, and can produce different graphs depending on those circumstances.

funnelfiasco commented 1 month ago

The use case is "I want to know the current (vulns/licenses) of the dependencies in my software".

I see what you're getting at, but a single SBOM is not necessarily the answer. What happens if my software is 2 SBOMs of unrelated applications? Or three? Or 10? In that case, a label of some kind to indicate which SBOMs are "mine" might be better.

That's more work, of course, so the single SBOM query is definitely an improvement over nothing, but I don't think it fully solves that use case.

nathannaveen commented 1 month ago

@funnelfiasco good idea. By specifying a max depth along with a toggle (or just setting max depth to 1), we can search direct dependencies as well as transitive dependencies.

@jeffmendoza you are right that we shouldn't be using the spec. When first thinking about this, I didn't want to use the purl because it isn't that precise. Even though you don't need to query GraphQL for the purl, I think the id would be a better option. Additionally, I think that accepting both package version and package name ids shouldn't be a problem. Thank you for pointing out that the artifact would be attached to an IsOccurrence, I totally forgot about that! And yes, I was thinking of only searching includedSoftware because we only want to search in that single SBOM.

Thank you for helping answer my questions!

mdeicas commented 1 month ago

Some thoughts at a higher level. I know we won't be able to follow this guidance in every situation.

  1. We shouldn't expose ontology concepts in the REST API -- the implementation of the endpoints should handle this in an intuitive way for the user. For example, the endpoint that searches for an SBOM by a package identifier should itself look both for HasSbom nodes attached to the package trie and for HasSbom nodes attached to artifact occurrences of that package. The consumer shouldn't have to consider that manually. If it does need to be exposed, then it would probably be better handled as a query configuration instead of a dedicated endpoint.
  2. Use external identifiers such as digests and purls to identify things in Guac. I imagine that this will be possible for nouns but not as much for verbs.
  3. As we discussed in the community meeting yesterday, most endpoints should operate on precise identifiers. A few other endpoints can be added to help find those precise identifiers if a client does not know that in advance.
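
To illustrate point 1, here is a minimal Python sketch of a lookup that hides the package-versus-artifact distinction from the caller. The three dictionaries are in-memory stand-ins invented for the example, and `sboms_for_purl` is a hypothetical helper; a real implementation would query the GUAC backend instead.

```python
def sboms_for_purl(
    purl: str,
    sboms_by_package: dict[str, list[str]],
    occurrences: dict[str, list[str]],
    sboms_by_artifact: dict[str, list[str]],
) -> list[str]:
    """Return SBOM URIs attached to a package or to any artifact occurrence of it."""
    # SBOMs attached directly to the package.
    found = list(sboms_by_package.get(purl, []))
    # SBOMs attached to artifacts that are occurrences of the package.
    for digest in occurrences.get(purl, []):
        for sbom_uri in sboms_by_artifact.get(digest, []):
            if sbom_uri not in found:
                found.append(sbom_uri)
    return found


# Illustrative data only; the purl, digest, and URIs are made up.
example_purl = "pkg:golang/example.com/app@v1.0.0"
by_package = {example_purl: ["sbom-from-package"]}
package_occurrences = {example_purl: ["sha256:abc"]}
by_artifact = {"sha256:abc": ["sbom-from-artifact", "sbom-from-package"]}
all_sboms = sboms_for_purl(example_purl, by_package, package_occurrences, by_artifact)
```

The caller passes only a purl and gets back both kinds of SBOM, deduplicated, without needing to know how they are attached in the graph.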

Specifically on the proposed endpoints above, I also wonder if the SBOM is the right primitive, for a few reasons.

  • A more direct question is "what licenses does this package / artifact have", instead of an intermediate hop through an SBOM.
  • ~There are other sources for licenses and vulns than just SBOMs (e.g. ClearlyDefined, OSV), so these endpoints wouldn't be complete in that sense.~ (EDIT, I realize this doesn't make much sense -- I was thinking of a CertifyVuln or a CertifyLegal attached to the top level artifact or package, but that probably isn't a common case)
  • I'm not sure how common it is in practice for artifacts or packages (with versions) to have multiple SBOMs. In the case they do, it's not clear to me that taking the latest SBOM is more "correct" than considering all of them. If one of my artifacts had multiple SBOMs, perhaps because I ran multiple tools on it, I would want to consider the vulns and licenses from all of them. The concept of "latest" becomes relevant when talking about versions, to track changes over time.

EDIT - just an idea, but another way to implement this may be to use the transitive dependencies endpoint, to handle the case where a package or artifact listed in the top-level SBOM also has an SBOM itself.

nathannaveen commented 1 month ago

@mdeicas thanks for your feedback! I have been thinking about this, and have a couple ideas on how to go about it.

  1. We shouldn't expose ontology concepts in the REST API -- the implementation of the endpoints should handle this in an intuitive way for the user. For example, the endpoint that searches for an SBOM by a package identifier should itself look both for HasSbom nodes attached to the package trie and for HasSbom nodes attached to artifact occurrences of that package. The consumer shouldn't have to consider that manually. If it does need to be exposed, then it would probably be better handled as a query configuration instead of a dedicated endpoint.
  2. Use external identifiers such as digests and purls to identify things in Guac. I imagine that this will be possible for nouns but not as much for verbs.
  3. As we discussed in the community meeting yesterday, most endpoints should operate on precise identifiers. A few other endpoints can be added to help find those precise identifiers if a client does not know that in advance.

I agree that a purl should be the identifier, and I think that the best way to implement this is to do something like: https://github.com/guacsec/guac/issues/1734. I have been working on adding a purl endpoint similar to this, and I think it will allow users to search via purl in a generic manner and then add flags to search for vulns or licenses.

Specifically on the proposed endpoints above, I also wonder if the SBOM is the right primitive, for a few reasons.

  • A more direct question is "what licenses does this package / artifact have", instead of an intermediate hop through an SBOM.
  • ~There are other sources for licenses and vulns than just SBOMs (e.g. ClearlyDefined, OSV), so these endpoints wouldn't be complete in that sense.~ (EDIT, I realize this doesn't make much sense -- I was thinking of a CertifyVuln or a CertifyLegal attached to the top level artifact or package, but that probably isn't a common case)
  • I'm not sure how common it is in practice for artifacts or packages (with versions) to have multiple SBOMs. In the case they do, it's not clear to me that taking the latest SBOM is more "correct" than considering all of them. If one of my artifacts had multiple SBOMs, perhaps because I ran multiple tools on it, I would want to consider the vulns and licenses from all of them. The concept of "latest" becomes relevant when talking about versions, to track changes over time.

EDIT - just an idea, but another way to implement this may be to use the transitive dependencies endpoint, to handle the case where a package or artifact listed in the top-level SBOM also has an SBOM itself.

If we were to search via purl in a manner similar to https://github.com/guacsec/guac/issues/1734, we would be able to search over REST with something like: v1/purl/pkg:{type}/{namespace}/{name}{@optional version}?vulnerabilities=true&latestSbom=true. This would allow us to specify licenses or vulnerabilities as the primitive, not SBOMs.
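
As a rough sketch of how a request in that shape could be picked apart, the Python below splits the path into the purl and the optional version and reads the two flags from the query string. `PurlQuery` and `parse_purl_request` are hypothetical names for this example, and the sketch assumes the purl carries no qualifiers or subpath (which use `?` and `#` and would clash with the URL's own query string and fragment).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit

PREFIX = "/v1/purl/"  # path prefix taken from the proposal above


@dataclass(frozen=True)
class PurlQuery:
    """Hypothetical parsed form of a purl request."""
    purl: str
    version: Optional[str]
    vulnerabilities: bool
    latest_sbom: bool


def parse_purl_request(target: str) -> PurlQuery:
    """Split a request target into the purl, its optional version, and the flags."""
    parts = urlsplit(target)
    if not parts.path.startswith(PREFIX):
        raise ValueError("not a purl request")
    purl = parts.path[len(PREFIX):]
    if not purl.startswith("pkg:"):
        raise ValueError("expected a purl starting with 'pkg:'")
    # The version, if present, follows '@' in the last path segment.
    last_segment = purl.rsplit("/", 1)[-1]
    version = last_segment.split("@", 1)[1] if "@" in last_segment else None
    flags = parse_qs(parts.query)

    def flag(name: str) -> bool:
        # A flag that is absent counts as false.
        return flags.get(name, ["false"])[-1].lower() == "true"

    return PurlQuery(purl, version, flag("vulnerabilities"), flag("latestSbom"))


example = parse_purl_request(
    "/v1/purl/pkg:golang/github.com/google/wire@v0.5.0"
    "?vulnerabilities=true&latestSbom=true"
)
```

Here `example.version` is `"v0.5.0"` and both flags are set. A request without `@version` or without a query string parses to `version=None` and both flags `False`.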

mdeicas commented 3 weeks ago

Sounds good to me! Passing purls as path parameters isn't the most readable but it does better indicate that it is a required parameter. For reference it's also what deps.dev does (e.g. https://api.deps.dev/v3alpha/systems/go/packages/github.com%2Fgoogle%2Fwire/versions/v0.5.0).
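
For reference, the percent-encoding seen in the deps.dev URL above can be reproduced with Python's standard library. `encode_path_param` is just a hypothetical wrapper name for this example. Passing `safe=""` makes `quote` encode `/` as well, so a value containing slashes stays within a single path segment.

```python
from urllib.parse import quote, unquote


def encode_path_param(value: str) -> str:
    """Percent-encode a value so it fits in one URL path segment."""
    return quote(value, safe="")


encoded_name = encode_path_param("github.com/google/wire")
encoded_purl = encode_path_param("pkg:golang/github.com/google/wire@v0.5.0")
decoded_name = unquote(encoded_name)
```

`encoded_name` comes out as `github.com%2Fgoogle%2Fwire`, matching the package segment in the deps.dev example, and `unquote` restores the original value on the server side.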