oliverchang commented 1 year ago

Tracking issue for building a guided remediation feature as part of OSV-Scanner.

Some ideas:

Suggesting direct dependency updates to remediate transitive vulns.
Ways to prioritize vulnerabilities based on things like dependency depth, severity, whether if it's dev-only etc.
Minimal re-locks to avoid known vulnerabilities in dependencies.
Automating upgrades with unit tests in a feedback loop.
Graph visualisations.

Current roadmap:

Q1 2024 for release of feature for npm.

Check out https://github.com/google/osv-scanner/issues/352#issuecomment-1820008675 for a walkthrough of what we've been building.

abhisek commented 1 year ago

@oliverchang This will be a VERY useful feature. I have some thoughts here:

Suggest minimal number of direct dependencies to upgrade to specific version to remediate maximum risk.

There is ofcourse a lot of detail to be considered here like how to measure aggregated risk. But it will be very useful in driving remediation, which many a times is not feasible due to too many findings or version conflicts in transitive dependencies when one or more direct dependency is updated.

Also another relatively simpler but valuable feature would be to have the ability to identify direct dependency upgrade to specific version to remediate vulnerabilities in transitive dependencies. I think you already suggested that. I am not sure if it is possible to build a graph by parsing flat lockfiles like gradle.lockfile or requirements.txt

In fact, just by having a dependency graph representation instead of a list will enable a lot of mitigation related analysis capability. I am not sure if its already done, but it may be feasible to build a usable graph based on data from deps.dev API.

Is there any plan to combine OSV and deps.dev data in future to perform both vulnerability detection and effective remediation?

oliverchang commented 1 year ago

Thanks for the feedback @abhisek ! We are definitely looking at ways to make the mechanisms flexible enough for users to decide on what they want to prioritise. Please stay tuned here for once we have some things ready to try.

We are working closely with the deps.dev team here.

agmond commented 1 year ago

Hi @oliverchang, Will the remediation feature include a command that automatically updates the manifest file and the lock file (like npm audit fix), or will it only have the guidelines for a manual upgrade?

oliverchang commented 1 year ago

@agmond yes: this tool will give you the changed manifest and lockfile at the end of of the guided remediation.

agmond commented 1 year ago

Hi @oliverchang, I have another question here. Will I be able to remediate only a single vulnerability at a time? For example, let's say I have several issues in my dependencies file, but I want to fix them one by one (and not all of them at once). Will this be possible?

abhisek commented 1 year ago

@oliverchang Just checking if someone is already working on this. I need the ability to build dependency graph (instead of list) by parsing lockfiles. I am willing to work on this feature and contribute to osv-scanner but I am guessing its not a trivial change so would like some input on possible approach and challenges that the team foresee.

My approach for this would be in following iterations:

Refactor PackageDetailParser to support graph APIs
Use deps.dev API to resolve dependencies for a package and add relationship to the graph
Use code analysis to identify direct dependencies

What do you think?

oliverchang commented 1 year ago

Hi @oliverchang, I have another question here. Will I be able to remediate only a single vulnerability at a time? For example, let's say I have several issues in my dependencies file, but I want to fix them one by one (and not all of them at once). Will this be possible?

Yes, this tool will be completely configurable.

@abhisek: which ecosystems are you interested in? I suspect the amount of effort would vary largely depending on the ecosystem.

@michaelkedar is already working on this for npm. I believe it's possible to recreate the graph from the existing package-lock.json without any additional API calls -- @michaelkedar can you please confirm?

abhisek commented 1 year ago

@oliverchang Thanks for the info. I am primarily looking at the maven ecosystem, particularly pom.xml and gradle.lockfile lockfiles. I am also interested in the PyPI ecosystem, particularly requirements.txt lockfile. I think these are mostly a flat (list oriented) data structure and resolving them into a graph would need some data source for relationship between the packages.

I am yet to explore if there are package manager specific options / plugins that dump the dependency graph. I think it would be possible with Gradle / Maven but at the cost of having the scanner depend on these package managers at runtime which is probably not desirable.

oliverchang commented 1 year ago

@abhisek we'd welcome contributions for Maven and PyPI here, given that we're focusing on npm at the moment.

That said, do you have any details on how you would leverage deps.dev to resolve graphs from a non-lockfile? There may be a fair bit of complexity involved in implementing the ecosystem-specific resolution algorithms.

abhisek commented 1 year ago

@oliverchang I need to do a POC to confirm, but here is a tentative approach for building a dependency graph from requirements.txt based on dependency relationship data from deps.dev

For each package in requirements.txt
Use deps.dev API to fetch dependencies for the package version
Add relations to other packages in requirements.txt as dependencies based on data from deps.dev
Any package (node) without an incoming edge is considered a direct dependency till we have code scanning capability (may be future iteration) to accurately identify direct dependencies

Here the assumption is, requirements.txt contains a list of all dependencies, including transitive dependencies. While this need not be true given the requirements.txt spec, I don't think we can handle the cases where all transitive dependencies are not included in requirements.txt because there is no way for us to identify the version of the dependency package which is resolved at runtime by the package manager based on different constraints, including the latest available package from the registry.

sarnesjo commented 1 year ago

Hi @abhisek, I work on the deps.dev team. The problem of reconstructing a dependency graph is indeed quite tricky for some ecosystems, and unfortunately the approach you propose won't work in the general case.

If A depends on B and C (and B and C have some other dependencies), then:

resolving the dependencies of A
separately resolving the dependencies of B and C, and then combining them

... won't, in general, produce the same result. The reason for this is that dependency resolvers have rules for how to pick versions of packages that show up multiple times in a dependency graph. (Different dependency resolvers have different rules.)

What you can do with the deps.dev data is, if the package version you're inspecting is in our corpus (i.e. it's published to pypi.org in one of the packaging formats we understand) you can use the GetDependencies API endpoint to look up a dependency graph for it. Note that this graph may still be different from ones you see elsewhere, as dependency resolution depends on many environmental factors such as pip version, python version, OS and architecture, time, etc.

oliverchang commented 1 year ago

For folks following, here's a preview of what we've been working on. We're hoping to release this Q1 next year for npm, with more ecosystems to come later.

Goal

The goal of guided remediation is to help developers who are flooded with vulnerability reports in their project dependencies. These projects often do not keep up to date with their dependencies, leading to a lot of toil when they need to upgrade them to fix known vulnerabilities. This is hard because upgrades often cause breakages, and they can’t be blindly applied all at the same time.

We're working on a tool as part of OSV-Scanner (leveraging https://deps.dev/) to help with prioritizing upgrades based on both impact as well as return on investment. This tool also enables fully automated remediation workflows.

Walkthrough of modes

Interactive mode

Let's jump right into what this looks like. There are two modes that our tool works in, an interactive mode, as well as a scriptable automatic mode.

This is our interactive mode. Here we have a popular but unmaintained JavaScript project we found on GitHub called keystone-classic. When we scan it, we find that there are a whopping 169 vulnerabilities. We provide information which ones affect direct dependencies (31), which ones are transitive (138), and which ones are dev-only dependencies (55).

We also provide a number of prioritization mechanisms to help users focus on the vulnerabilities that matter. One heuristic we have as a measure of exploitability is dependency depth. If you have a vulnerability in your dependency tree that's 10 layers deep, it's most likely less exploitable than a vulnerability in a direct dependency. We also let you set thresholds on severity, as well as whether or not you care about dev dependencies. As an example, let's set the maximum dependency depth to 4, the min severity to 6, and ignore dev dependencies.

When we apply these criteria, we instantly bring down the number of vulnerabilities from 169, to only 75 vulnerabilities.

There are two broad categories of actions a user can do to resolve these.

In-place lockfile modification

One is something we call "in-place" lockfile modification. This is where we patch the dependency graph in-place to replace vulnerable versions with non-vulnerable versions, while still respecting all the version constraints in the graph. This is often the least risky approach, but also the approach that fixes the fewest vulnerabilities. On the right, we show how many vulnerabilities every individual upgrade fixes. For instance, upgrading ua-parser-js from 0.7.19 to 0.7.36 fixes 4 vulnerabilities. This list is ordered by the number of vulnerabilities fixed by each upgrade, and applying all of them will fix 28 vulnerabilities.

We can also show you the dependency graph of the vulnerable dependencies.

The dependencies highlighted in blue are direct dependencies. Here, for ua-parser-js, we see that there are at least 4 different paths leading up to it, and the dependency depth (shortest path) is 3.

Now let's look at the actual in-place upgrade. In this screen we can choose which of the in-place upgrades you want to apply. It's almost like a shopping cart of upgrades that you can select. Once you select the ones you want, you can write the results out to the project’s package-lock.json, run tests, CI/CD and see which ones don’t break.

Relock and direct dependency bumps

The other strategy to fixing vulnerabilities is relocking and direct dependency bumps. Relocking recomputes your entire dependency graph, taking the newest possible versions of all packages in your graph, while still respecting your graph version constraints. This causes a larger number of changes to your graph, which potentially carries a larger risk of breakages.

When we relock, we fix 48 vulnerabilities instantly, and are left with 27 vulnerabilities. Of these, 11 are actually impossible to resolve, because they are in transitive dependencies where there is a lack of fix paths for them. That is, one or more intermediate dependencies’ version constraints force the vulnerable packages to be in your graph. There are no possible dependency upgrades that would get rid of any of these 11 vulnerabilities. For these, the only ways to remediate them would be to mitigate the vulnerabilities some other way, or investigate if they are false positives and create a VEX statement.

For the 16 that we can fix, our tool provides direct dependency upgrade options, ordered by the number of vulnerabilities resolved. These correspond to changes to the users’ package.json to change the versions of their direct dependencies. These are often major version upgrades that carry a bit of risk, so users can interactively try applying these to see what works for them.

Automatic mode / CLI usage

So that was the interactive mode, which enables users to understand their vulnerabilities and prioritise them. We also offer all of the functionality we just saw through our command line flags in a way that enables this to be scripted.

For instance, we can set the maximum dependency depth via --max-depth, the minimum severity via --min-severity, whether we want to relock via --relock, and more.

We also wrote a PoC script to show how we can automate the process of determining non-breaking dependency upgrades to achieve the best possible upgrade result with zero human interaction, by using unit tests in a feedback loop.

Guided remediation demo

Our script continuously tries to perform each suggested upgrade in progressively riskier ways, and runs tests to see if any breakages are caused. If they do, bad upgrades are added to a blocklist. The upgrades that do work are combined to produce the optimal set of available upgrades that don’t cause breakages.

By running this script against keystone-classic on all 169 vulnerabilities without any filtering, we were able to fully automatically find that relocking plus upgrading 4 of the 6 possible direct dependencies resulted in zero breakages for the project. This results in 114 vulnerabilities fixed. Of the remaining 55, 44 are not possible to be fixed due to lack of fix paths. The remaining 11 fixable vulnerabilities will require the bad packages mongoose and marked to be upgraded, which likely have breaking changes that require human involvement.

Thanks for reading this far :) let us know if anybody has any feedback on this.

seelder commented 10 months ago

@oliverchang Will OSV-scanner still rely on lockfiles for now? Or will it pull data from deps.dev, or have some other way of pulling other dependency information, e.g. based on package.json? (Sorry to randomly ask a question - I'm a PhD student interested in OSV-scanner as part of a research project)

oliverchang commented 10 months ago

@oliverchang Will OSV-scanner still rely on lockfiles for now? Or will it pull data from deps.dev, or have some other way of pulling other dependency information, e.g. based on package.json? (Sorry to randomly ask a question - I'm a PhD student interested in OSV-scanner as part of a research project)

Guided remediation will have a mode where we resolve manifests into full transitive graphs (leveraging deps.dev).

(And not at all! Very glad to see research interest in this project).

abhisek commented 10 months ago

@sarnesjo @oliverchang May not be relevant anymore, but closing the loop on what we discussed in https://github.com/google/osv-scanner/issues/352#issuecomment-1681700939

I did some work on reconstructing dependency graph from gradle.lockfile and data from deps.dev. The approach I used was

Consider gradle.lockfile as the source of truth for all nodes (package version) in the graph
Use deps.dev to identify dependencies for a given node (package version)
Find target nodes already existing the graph from step [1] and add relations based on data discovered from [2]

This may not be very reliable and may not entirely match the dependency graph actually generated by gradle but it has the necessary information to suggest remediation for a top level (found in build.gradle) dependency instead of recommending the user to update a transitive dependency, which a user can't really do in normal workflow.

oliverchang commented 8 months ago

This is now released in https://github.com/google/osv-scanner/releases/tag/v1.7.0 for npm!

google / osv-scanner

Automated/guided remediation #352

Goal

Walkthrough of modes

Interactive mode

In-place lockfile modification

Relock and direct dependency bumps

Automatic mode / CLI usage