[feature] What's the next actionable critical dependency?

lumjjb commented 11 months ago

Is your feature request related to a problem? Please describe.

With the new experimental REST interfaces being proposed, and based on some discussion around getting value out of GUAC while waiting for ingestion to complete (either because users have not yet supplied much data, or because ingesting large amounts of data takes a while), we want to enable some use cases that provide value to users immediately upon setup of GUAC and iteratively get better as more and more data is ingested.

We discussed several options including:

Do I have an SBOM? If so, where does that SBOM live?
What is my most widely used dependency?
Top level package “Scorecard” (for ossf scorecards, licenses, etc.)

This issue describes the second option.

Describe the solution you'd like

Rewriting the framing, expanding beyond just widely used dependencies to "What's the next actionable critical dependency?"

As SBOMs and other metadata are ingested into GUAC, we want to get a good idea of the critical risks within an organization that we can take action on.

One example would be figuring out which dependency is the most widely used. This could involve:

Checking which package has the highest number of IsDependency edges or HasSBOM edges
Having a HasMetadata node for criticality and weighting the score based on it
For those packages, ranking them based on their Scorecard score or vulnerability scores

The output should be a table/list with the packages to pay attention to as well as a list of problems that they have that are actionable (for example, low scorecards score, or X critical vulnerabilities). This list can then be used by a security operator or developer to triage.
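As a rough sketch of the ranking and output described above: the snippet below counts dependents per package and attaches a list of actionable problems. It does not use the GUAC API; the edge list, the `scorecard` and `critical_vulns` dictionaries, the package names, and the `low_score` threshold are all hypothetical stand-ins for data that would come from IsDependency edges, Scorecard results, and vulnerability data.

```python
from collections import Counter

# Hypothetical (dependent, dependency) pairs standing in for IsDependency edges.
edges = [
    ("app-a", "libfoo"),
    ("app-b", "libfoo"),
    ("app-b", "libbar"),
    ("app-c", "libfoo"),
    ("app-c", "libbar"),
    ("app-c", "libbaz"),
]

# Hypothetical per-package signals (not real data).
scorecard = {"libfoo": 3.1, "libbar": 8.2, "libbaz": 5.0}
critical_vulns = {"libfoo": 2, "libbar": 0, "libbaz": 1}


def rank_dependencies(edges, scorecard, critical_vulns, low_score=5.0):
    """Return (package, dependent_count, problems) rows, most widely used first."""
    # Deduplicate edges so a repeated edge is not counted twice.
    dependents = Counter(dep for _, dep in set(edges))
    rows = []
    for pkg, count in dependents.items():
        problems = []
        score = scorecard.get(pkg)
        if score is not None and score < low_score:
            problems.append(f"low Scorecard score ({score})")
        vulns = critical_vulns.get(pkg, 0)
        if vulns:
            problems.append(f"critical vulnerabilities: {vulns}")
        rows.append((pkg, count, problems))
    # Sort by dependent count (descending), then by name for a stable order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    for pkg, count, problems in rank_dependencies(edges, scorecard, critical_vulns):
        print(pkg, count, "; ".join(problems) or "no known problems")
```

Keeping the individual problems next to the count, instead of folding them into one number, lets a security operator see what to do about each package.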

Describe alternatives you've considered

Other alternatives to have issues opened up for:

Do I have an SBOM? If so, where does that SBOM live? (https://github.com/guacsec/guac/issues/1483)
Top level package “Scorecard” (for ossf scorecards, licenses, etc.) (https://github.com/guacsec/guac/issues/1508)

nathannaveen commented 9 months ago

I would like to work on this

mdeicas commented 9 months ago

@nathannaveen, @pxp928, and I chatted about this. The plan is to first identify critical dependencies using hasSbom nodes and expose that as an endpoint in the REST API, and then add other endpoints that incorporate additional metrics such as scorecard score, vuln information, and metadata. Pagination in the GQL API #1525 is needed for this analysis to scale, but we can revisit when that is added.

nathannaveen commented 8 months ago

@lumjjb @mihaimaruseac @jeffmendoza @pxp928 @david-a-wheeler I had an idea on how to combine all the metrics (number of dependents, Scorecard score, and vulnerability scores) into a single score using a couple of algorithms similar to the OpenSSF Criticality Score algorithm.

I think that this will help users understand their most critical dependencies because it will be able to combine the REST endpoints for scorecard scores, the number of dependents, and the vulnerability scores of their dependencies.

@calebbrown I would really appreciate your thoughts on this because you have spent a significant amount of time working on OpenSSF Criticality Score.

Proposal: https://docs.google.com/document/d/1Xb86MrKFQZQNq9rCQb08Dk1b5HU7nzLHkzfjBvbndeM/edit?usp=sharing.

nathannaveen commented 8 months ago

@robpike Your algorithm was fundamental in creating this proposal, so your feedback would be appreciated.

pxp928 commented 8 months ago

@SantiagoTorres, please take a look as well.

lumjjb commented 8 months ago

Thanks for putting this together! (also commented on doc)

From some feedback that we've gathered on @mdeicas's implementation, we've noted that users want to be able to understand individual metrics. For example, I get useful information from knowing that the Fuzzing score is bad versus that branch protection is bad, because there's something I can do about it!

Another note relates to terminology, but I think it's very important for us to differentiate. If we refer to criticality as the importance of a project, security metrics should not be part of criticality. There's some nuance between what's critical to my organization and what I should be investing my security efforts into. In the traditional risk metric of impact * likelihood, criticality should be the impact.

So I think there are two equations we need to define, criticality and risk, where risk = f(criticality, likelihood). When it comes to criticality, I think having a general guideline for a score is good, but organizations will most likely have an override. For example, at Google, if it's related to search, its criticality is high, and any software metrics should not sway that number much.
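A minimal sketch of that split, assuming the simplest choice of f, namely the traditional impact * likelihood product mentioned above. The function name, the `override` parameter, and the [0, 1] ranges are assumptions made for illustration, not part of the proposal.

```python
def risk(criticality, likelihood, override=None):
    """Risk as impact * likelihood, with an optional organizational override.

    criticality: computed importance of the package, assumed to be in [0, 1].
    likelihood: chance of a security problem (e.g. derived from Scorecard or
        vulnerability data), assumed to be in [0, 1].
    override: if given, replaces the computed criticality, so that software
        metrics cannot sway a value the organization has fixed.
    """
    impact = criticality if override is None else override
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return impact * likelihood


# A package with a low computed criticality that the organization pins to 1.0.
computed = risk(criticality=0.2, likelihood=0.5)
pinned = risk(criticality=0.2, likelihood=0.5, override=1.0)
```

Here `pinned` is larger than `computed` even though the software metrics are identical, which is the effect the override is meant to have.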

I do like the property where the summation is independent. However, I do see that unless it is a snapshot, some of these variables (the S_i values) are dynamic, and thus it provides only an estimate that needs to be updated.
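For readers without access to the proposal document: the sketch below implements the published OpenSSF Criticality Score formula, C = (1 / sum(alpha_i)) * sum(alpha_i * log(1 + S_i) / log(1 + max(S_i, T_i))), where S_i is a signal value, alpha_i its weight, and T_i its threshold. The proposal's own algorithms are only described as "similar" to this, so this is not necessarily what the proposal computes, and the example signals, weights, and thresholds are hypothetical.

```python
import math


def criticality_score(signals):
    """Weighted, log-scaled score in [0, 1].

    signals: iterable of (value S_i, weight alpha_i, threshold T_i) tuples,
    with S_i >= 0, alpha_i > 0, and T_i > 0 (restrictions of this sketch).
    """
    signals = list(signals)
    if not signals:
        raise ValueError("at least one signal is required")
    total_weight = 0.0
    weighted_sum = 0.0
    for value, weight, threshold in signals:
        if value < 0 or weight <= 0 or threshold <= 0:
            raise ValueError("need value >= 0, weight > 0, threshold > 0")
        # Each term depends only on its own signal, so terms can be
        # computed independently and summed.
        term = math.log(1 + value) / math.log(1 + max(value, threshold))
        weighted_sum += weight * term
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical signals: (number of dependents, weight, threshold) and
# (number of critical vulnerabilities, weight, threshold).
example = criticality_score([(1000, 2, 500000), (5, 1, 10)])
```

Because every S_i is read at the moment of the call, the result is a snapshot and has to be recomputed as the underlying data changes.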

I think it would be helpful to add a list of metrics that would be considered in criticality, and how each applies to the GUAC graph (assuming that when we say dependencies, we mean other applications in the GUAC graph and not the dependencies of the project itself).

It would also be helpful to have a sense of what is the range and how we can interpret this for users. What is critical or not, what is risky or not, what range is LOW, MEDIUM, HIGH?

lumjjb commented 8 months ago

Another interesting aspect I'd like to spend a bit of time discussing is the user interaction and how the computation will be done, since it is tricky to scale.

nathannaveen commented 7 months ago

@lumjjb Thank you for the review!

> Thanks for putting this together! (also commented on doc)
>
> From some feedback that we've gathered on @mdeicas's implementation, we've noted that users want to be able to understand individual metrics. For example, I get useful information from knowing that the Fuzzing score is bad versus that branch protection is bad, because there's something I can do about it!

You're right; we should output the individual metrics as well as the combined score for each package.

> Another note relates to terminology, but I think it's very important for us to differentiate. If we refer to criticality as the importance of a project, security metrics should not be part of criticality. There's some nuance between what's critical to my organization and what I should be investing my security efforts into. In the traditional risk metric of impact * likelihood, criticality should be the impact.
>
> So I think there are two equations we need to define, criticality and risk, where risk = f(criticality, likelihood). When it comes to criticality, I think having a general guideline for a score is good, but organizations will most likely have an override. For example, at Google, if it's related to search, its criticality is high, and any software metrics should not sway that number much.

Impact and risk should be separated out and then combined to create a single risk score. The question is how we accomplish this.

I have added a couple more sections to the proposal: risk calculation and a customizable layer. These sections should address both of your points.

Note that I have updated a large portion of the proposal to include the Risk calculation, so you might have to re-read it to understand the risk calculation.

> I do like the property where the summation is independent. However, I do see that unless it is a snapshot, some of these variables (the S_i values) are dynamic, and thus it provides only an estimate that needs to be updated.

I agree that this is a snapshot, but how else could we create any score? We would have to ask the same question for any of the current/upcoming REST endpoints; we wouldn't be able to know the future number of dependents or the future scorecard scores.

Do you have any ideas on how we can update this feature so that it is not just a snapshot in time?

> I think it would be helpful to add a list of metrics that would be considered in criticality, and how each applies to the GUAC graph (assuming that when we say dependencies, we mean other applications in the GUAC graph and not the dependencies of the project itself).

When I was saying dependencies, I was thinking of the dependencies of the project along with other information from the GUAC graph.

The document refers to the number of dependents for each package a lot because I thought that metric would be beneficial for calculating criticality and was easy to explain.

> It would also be helpful to have a sense of what is the range and how we can interpret this for users. What is critical or not, what is risky or not, what range is LOW, MEDIUM, HIGH?

Currently, I think that users should be able to interpret the data themselves, with 0 being the least critical/risky and 1 being the most critical/risky.

If we were to collect data later, we could add ranges such as LOW, MEDIUM, and HIGH. Currently, we don't have any data, meaning that we can't see which projects have what scores.

If we were to use the Sigmoid-based algorithm, we would also have to calculate ranges for different weights, Ks, and Ls. Similarly, if we were to use the OpenSSF Criticality Score-based algorithm, we would have to calculate for different weights.

robpike commented 7 months ago

While what's suggested here seems like a good idea, a powerful complement to that would be a tool that automatically reports the packages that need to be fixed first, a kind of frontier of updates. If X is a critical package that imports a vulnerability through Y, but Y imports it from Z, then Z is the thing that needs to be fixed (or forked and fixed). That seems straightforward at first but gets complicated fast when there are multiple constraints that must be satisfied, perhaps meaning that updating Z won't work because another package P has a constraint that requires the vulnerable version to be used, making P another critical problem. And then it's actually not code in P that's the problem, but P's constraints in the packaging system.
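A minimal sketch of the simplest case described above, finding Z in a chain where X imports through Y, which imports from Z. It deliberately ignores the version-constraint problem (package P) that makes the real task hard. The `deps` mapping and the `affected` set are hypothetical inputs, and the function name is made up for illustration.

```python
def fix_frontier(deps, affected):
    """Return the affected packages that do not inherit the problem.

    deps: mapping of package -> list of its direct dependencies.
    affected: set of packages known to be affected, directly or transitively.

    A package is on the frontier if none of its direct dependencies are
    affected, meaning the problem has to be fixed (or forked and fixed) there.
    """
    return sorted(
        pkg
        for pkg in affected
        if not any(dep in affected for dep in deps.get(pkg, ()))
    )


# X imports the vulnerability through Y, which imports it from Z.
deps = {"X": ["Y"], "Y": ["Z"], "Z": []}
frontier = fix_frontier(deps, affected={"X", "Y", "Z"})
```

With these inputs only Z is on the frontier; handling constraints that pin the vulnerable version would need information from the packaging system that this sketch does not model.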

Moreover, fixing things like this requires cooperation of package authors that might not be reachable or amenable, or might even be hostile.

Tools to help understand and mitigate issues like these are lacking and much needed.

nathannaveen commented 7 months ago

@robpike, thank you for the review!

This would be a cool tool, as it would inform the user about the packages that need updating to fix vulnerabilities. Creating such a tool would be the next step in this idea and would elevate this scorer to a new level, as it would also provide actionable feedback.

But, I have a question regarding my understanding of the tool you are suggesting.

I will use an example to explain my understanding of your explanation.

This is our example dependency tree:

    Cv1
   /   \
  B     D
  |
  A

A depends on B
B depends on Cv1
D depends on Cv1

My understanding of your explanation is as follows. Suppose that, in this example, we found a vulnerability in Cv1, which in turn causes an issue in B, A, and D. But D also relies on a vulnerability in Cv1, so if we tried to replace Cv1 with the next version, Cv2, we would run into a problem: even though A and B are fixed by Cv2, changing Cv1 to Cv2 breaks D.

If I understand you correctly, then I think that returning the packages that need to be fixed wouldn't be too hard. We could output each package with a vulnerability and which packages depend on it. We could either return the packages that immediately depend on it because those are the only ones getting their implementations updated, or we could return all the packages that depend on it.
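The two options above (immediate dependents only, or all dependents) can be sketched on the example tree as follows. This is an illustration under assumed inputs, not GUAC's patch-plan implementation; the `deps` mapping and the function name are hypothetical.

```python
from collections import defaultdict, deque


def dependents_of(deps, target, transitive=False):
    """Return the packages that depend on `target`, sorted by name.

    deps: mapping of package -> list of its direct dependencies.
    transitive: if False, return only immediate dependents; if True, walk
        the reversed graph breadth-first and return all dependents.
    """
    # Reverse the edges: dependency -> set of packages that depend on it.
    reverse = defaultdict(set)
    for pkg, direct_deps in deps.items():
        for dep in direct_deps:
            reverse[dep].add(pkg)

    if not transitive:
        return sorted(reverse[target])

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(target)  # guard against dependency cycles through the target
    return sorted(seen)


# The example tree: A depends on B, B depends on Cv1, D depends on Cv1.
deps = {"A": ["B"], "B": ["Cv1"], "D": ["Cv1"]}
immediate = dependents_of(deps, "Cv1")
everything = dependents_of(deps, "Cv1", transitive=True)
```

For the example, `immediate` contains B and D, while `everything` also contains A.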

nathannaveen commented 7 months ago

Never mind, this is pretty much the same as the patch plan feature, https://docs.guac.sh/patch-plan/.

0xAverageUser commented 2 weeks ago

Hi, from joining one of the maintainer meetings and from what was written here, I think there are a couple of intertwined problems and shortcomings. Sorry for coming back to this so late after committing to follow up.

Problem 0: Users need to invest significant time to set up GUAC and have to wait for all data to be ingested before they learn whether the tool is right for them or get any utility from it.

Problem 1: On the other hand, the capability to identify and quantify the "blast radius" of a risk is missing, in terms of known vulnerabilities, affected dependents, and the complexity (perhaps lines of code) required to patch, or fork and fix, a dependency whose maintainer is unreachable.

There is currently no way to automatically identify which packages to fix first in nested dependency chains. We should find the root causes of vulnerabilities (e.g., Z in X→Y→Z, and not Y), and potentially the root causes of batches of vulnerabilities that all come from the same package. We should also handle constraint systems where updating is blocked by other packages needing the vulnerable version, understand packaging constraints, and guide users to mitigate the risk by patching a fork, potentially despite uncooperative authors.

Ideally, this specific feature helps users overcome Problem 0, too: it eases the learning curve, provides value, and keeps them engaged while GUAC is not yet fully set up. The challenge is then to show them a critical issue in their SBOM, plus the transitive path to mitigate it, during data ingestion as a way to explain the benefits of GUAC.

The next step is to learn user needs and preferences in order to define the form of a solution to these two problems: a CLI, an API, or a UI? We coined the term "guac loading indicator" for whatever shape a solution can take. If the team has no objection, I would be interested in contributing to this issue.

guacsec / guac

[feature] What's the next actionable critical dependency? #1505