jeremylong / DependencyCheck

OWASP dependency-check is a software composition analysis utility that detects publicly disclosed vulnerabilities in application dependencies.
https://owasp.org/www-project-dependency-check/
Apache License 2.0

Utilize NVD API instead of data feed #4732

Closed (jeremylong closed this 1 year ago)

jeremylong commented 2 years ago

The NVD will be retiring the NVD data feeds in 2023. See changes to feeds and APIs. ODC needs to migrate to the NVD's API.

Current concerns:

  1. How will we support offline users?
  2. Users of the API will require an API key due to rate limiting.
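
For reference, a rough sketch of what a v2 API request looks like (endpoint and parameters per the NVD documentation; the key goes in a header):

# Illustrative request against the NVD CVE API v2. The apiKey header is
# optional, but omitting it lowers the allowed request rate considerably.
curl -H "apiKey: $NVD_API_KEY" \
  "https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=0&resultsPerPage=2000"
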
marcelstoer commented 2 years ago

Users of the API will require an API key

Are you sure the key is mandatory? The following sentence from their announcement reads to me like it is optional.

users transmitting requests without a key will see a reduction in the number of requests they can make in a rolling 60 second window.

-> it will work without a key, just slower?

koen-serneels commented 2 years ago

users transmitting requests without a key will see a reduction in the number of requests they can make in a rolling 60 second window.

-> it will work without a key, just slower?

I think it will not work at all once the rate limit is hit; it will simply error out with an HTTP 429. The same thing already happens with the OSS Index API. So, let me take this opportunity to address a couple of other concerns regarding the use of APIs:

To summarize: there is nothing this (great!) tool can do about it, but I have the feeling that using APIs will render this tool less suitable for certain types of projects than it is today.

marcelstoer commented 2 years ago

@koen-serneels you perfectly summarized the concerns I myself have regarding the use of the NVD API.

nothing this (great!) tool can do about it

I was wondering whether it might somehow be possible for this project to continue providing feed files (e.g. hosted here on GitHub). Could an ODC project account that is not subject to rate limiting regularly pull data off the NVD API and create those feed files?

aikebah commented 2 years ago

My thoughts go more along the lines of the current data-stream usage: find some way to cache the full historical CVE information within the fenced environment and periodically refresh it by pulling all intermediate updates from the NVD API (one of their APIs, the cves API, appears at first glance to be usable to replace the cveModified stream with a more targeted retrieval of all updates that have not yet been seen).
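
Something along these lines, as a rough sketch (parameter names from the v2 API documentation; the dates are illustrative, and the API requires both bounds to be supplied together):

# Rough sketch: fetch only the CVEs modified within a given window, which is
# roughly what replacing the "modified" data stream would look like.
curl -H "apiKey: $NVD_API_KEY" \
  "https://services.nvd.nist.gov/rest/json/cves/2.0?lastModStartDate=2022-09-01T00:00:00.000&lastModEndDate=2022-09-08T00:00:00.000"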

The main open issue I see is bootstrapping the cveDB using the API, as I expect that will run into the API rate limit quite fast. The gzipped data streams by CVE year provided a clean solution for that (and could nicely be cached in the fenced environment for internal use to bootstrap the local cveDBs of developers); API-based retrieval of the same data volume feels like API abuse.

I wonder whether the NVD changes might be triggered by excessive data load from (cloud-hosted?) build environments that dispose of cached data and retrieve the entire vulnerability dataset from scratch on builds/scans.

jeremylong commented 2 years ago

I wonder whether the NVD changes might be triggered by excessive data load from (cloud-hosted?) build environments that dispose of cached data and retrieve the entire vulnerability dataset from scratch on builds/scans.

@aikebah I've often wondered how much of the load on the NVD has been caused by this project...

jeremylong commented 2 years ago

@aikebah see https://github.com/jeremylong/nvd-lib - I still need to add better error handling, etc. but the client for the NVD should be mostly stable.

jeremylong commented 2 years ago

I've been debating adding a "cache" mechanism to the library - so you could specify a directory and it would write the JSON to the directory with a properties file. You could then call the update again and the additional files would be added to the cache.

However, for offline use with ODC you could create the database while online and then copy the H2 database to the internal, disconnected system. Not sure it's worth building a caching mechanism.
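
A rough sketch of that approach (assuming the CLI's --data flag and the odc.mv.db H2 file name; adjust paths for your environment):

# Connected host: build/refresh the local vulnerability database only.
dependency-check.sh --updateonly --data ./dc-data
# Carry the H2 database file across the fence to the disconnected host.
scp ./dc-data/odc.mv.db offline-host:/opt/dc-data/
# Disconnected host: scan without attempting an update.
dependency-check.sh --noupdate --data /opt/dc-data --scan ./app --out ./report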

trathborne commented 2 years ago

@jeremylong

@aikebah I've often wondered how much of the load on the NVD has been caused by this project...

Probably most of it ☺! It's definitely a good idea to implement caching, and an easy way for things like dependency-check to work from the cache. Maybe that's the easy part — hopefully the API provides the same data as the previous interface.

I haven't understood the entire process yet, mainly because it is in flux, but I'm here because every build of $employer's projects pulls all of that NVD data into fresh build environments on many Jenkins executors. I was intending to mirror it daily into an NFS volume via https://jeremylong.github.io/DependencyCheck/data/cachenvd.html but then I found that https://github.com/stevespringett/nist-data-mirror/ has been EOL'd because of this move to APIs.

drjonnicholson commented 1 year ago

Like you @trathborne, I would love a caching mechanism. We've been doing a lot of work on our build processes recently, which has involved using this tool in our CI builds on PRs and pushing the results into SonarQube to promote visibility. As we increase adoption across our projects, we now need to look at how to implement caching, as the update is easily ~5 minutes of build time. We're considering the database approach, but that comes with some level of maintenance and infrastructure. A potential middle ground I was considering today is whether the data folder could be cached within the pipeline (e.g. using a pipeline cache), though we would need to identify a good file to generate a cache key from (I welcome anyone's thoughts on this; a rough sketch follows). There would still be an initial overhead per branch.
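
One crude sketch of the cache-key idea (the data path is an assumption based on the Maven plugin's default data directory in the local repository; purely illustrative):

# Crude sketch: a date-based cache key, so the first build of the day
# refreshes the NVD data and later builds reuse it.
CACHE_KEY="odc-data-$(date +%F)"   # e.g. odc-data-2023-02-10
# Assumed default data directory for the Maven plugin:
CACHE_DIR="$HOME/.m2/repository/org/owasp/dependency-check-data"
echo "restore/save $CACHE_DIR under key $CACHE_KEY"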

To some extent I understand the NVD's shift toward APIs, but I share the same concerns as you all (great summary, @koen-serneels).

aikebah commented 1 year ago

@trathborne

I was intending to mirror it daily into an NFS volume via https://jeremylong.github.io/DependencyCheck/data/cachenvd.html but then I found that https://github.com/stevespringett/nist-data-mirror/ has been EOL'd because of this move to APIs.

Nevertheless, it might be a good idea to put nist-data-mirror in place for the last 9 months before the data streams are retired. It will save load for the NVD, save some bandwidth on your DC, and allow you to speed up the downloads by lowering the in-between-download times without the risk of hitting NVD rate limiting (your internal mirror would allow you to zero out the cveWaitTime).
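
For example (a rough sketch; these configuration options appear in the Maven plugin's <configuration> block as shown further down this thread, and whether they are also exposed as -D user properties may depend on the plugin version):

# Rough sketch: point the Maven plugin at an internal nist-data-mirror host.
mvn org.owasp:dependency-check-maven:8.2.1:check \
  -DcveUrlModified=http://mirror.internal/nvdcve-1.1-modified.json.gz \
  -DcveUrlBase=http://mirror.internal/nvdcve-1.1-%d.json.gz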

jeremylong commented 1 year ago

The ODC project will be moving to use the API later this year. The nist-data-mirror is EOL, but it will work until the data feeds are removed later this year.

We may have a way to mirror the datafeeds - but only for purposes of downloading the entire dataset. I am working on the client here: https://github.com/jeremylong/vuln-tools

marcelstoer commented 1 year ago

Why is it not feasible or desirable to pull all CVEs off their API a couple of times per day (100+ requests each time) and host them in a feed-like structure here on GitHub? The ODC project could then rely on those files just as it uses the hosted suppression files here. Of course, the API-leecher script could also be used by ODC users to create and host their own offline CVE source from which to feed ODC.
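
Back of the envelope: with 200k+ CVEs and at most 2000 results per page, a full pull is on the order of 100+ requests. A hedged sketch of such a leecher (jq assumed available; rate limits per the NVD documentation):

# Page through the v2 API until totalResults is exhausted.
start=0
while :; do
  resp=$(curl -s -H "apiKey: $NVD_API_KEY" \
    "https://services.nvd.nist.gov/rest/json/cves/2.0?startIndex=$start&resultsPerPage=2000")
  printf '%s' "$resp" > "cves-$start.json"
  total=$(printf '%s' "$resp" | jq .totalResults)
  start=$((start + 2000))
  [ "$start" -ge "$total" ] && break
  sleep 6   # stay well under the keyed limit of ~50 requests per 30s
done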

trathborne commented 1 year ago

I would rather pull the data from a public readonly rsync server and then run my own API server, or use a tool like the existing one which works directly with local files.

Sending every scan through a public API server is a serious single point of failure, and likely more fragile than the existing download servers. What is NIST thinking?

wilx commented 1 year ago

Is it even feasible for DependencyCheck to work reasonably fast after this change? I have large projects with thousands of dependencies that I check using the Maven plugin.

jeremylong commented 1 year ago

The API is simply replacing the data feed. Instead of downloading a bunch of static JSON files, we will utilize the nvd-lib to download the JSON from an API. It will work very similarly to what we have today.

remycx commented 1 year ago

Regarding the offline-user approach, would it be possible either to keep the data-feed capability, or to have a way to customize the URL so it hits another site that provides the API data using the same specs?

binareio commented 1 year ago

For users/contributors interested in CVE (and more) search based on NVD API v2, welcome to check (and contribute to): https://github.com/binareio/FastCVE

Hildebrand-Ritense commented 1 year ago

For users/contributors interested in CVE (and more) search based on NVD API v2, welcome to check (and contribute to): https://github.com/binareio/FastCVE

Looks very promising. Could you maybe confirm compatibility of the ODC 'central database' solution with the DB schema created by the FastCVE tool?

binareio commented 1 year ago

FastCVE is a docker container with a Postgres DB that can load (from the NVD using the v2 REST API) and update the CVE/CPE data in a local DB instance. Data search queries can be run against the docker container (CLI commands). Since the solution also exposes the same search capabilities through REST APIs, clients could use the FastCVE instance as one centralized DB for searches. Just schedule the following to run as often as needed to update the DB (CLI command): docker exec fastcve load -d cve cpe
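
For example, a hypothetical cron entry for the refresh:

# Hypothetical schedule: refresh the FastCVE data every night at 02:00.
0 2 * * * docker exec fastcve load -d cve cpe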

aikebah commented 1 year ago

@Hildebrand-Ritense The two tools are separate, so there is no compatibility whatsoever. As Jeremy already indicated in this ticket, ODC has the usage of the NIST API in the pipeline, to be completed before the NVD data streams are discontinued.

renatoaquino commented 1 year ago

Hi, @jeremylong. That hiccup with the NIST service made me aware of these NVD API changes. I understand that you're planning changes to DependencyCheck to use vuln-tools to download the database through the API. Besides the probable addition of the NVD token configuration, do you think anything more will change?

Also, do you intend to share issues that the community could help solve in this migration?

jeremylong commented 1 year ago

In addition to using the NVD API, I am working on incorporating data from GitHub Security Advisories (this will be optional and will require a Personal Access Token). The NVD API token will be optional as well, but updates will be faster with one.
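
For the curious, a rough sketch of what pulling advisories looks like (field names from GitHub's public GraphQL schema; the PAT goes in the Authorization header):

# Rough sketch: query GitHub Security Advisories over the GraphQL API.
curl -s -X POST https://api.github.com/graphql \
  -H "Authorization: bearer $GITHUB_TOKEN" \
  -d '{"query":"{ securityAdvisories(first: 5) { nodes { ghsaId summary publishedAt } } }"}'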

As to issues that could be solved - I'm trying to figure out what is possible once we have the data from both sources.

jeremylong commented 1 year ago

The vuln-tools project has been renamed https://github.com/jeremylong/Open-Vulnerability-Project. With the 3.0.0 release we are getting close to starting the migration.

TobiX commented 1 year ago

It seems the full feed will still be available, but as a Git repository on GitHub

This might be a recent development?!? (last 3 months)

aikebah commented 1 year ago

@TobiX That's a different dataset: the raw list of CVE records.

The NIST NVD contains CVE data enriched with more metadata such as the product coordinates of affected software (encoded as a CPE - common platform enumeration)

rcooper85 commented 1 year ago

Question: Once DependencyCheck has been updated and released to use the new v2 API, in order to allow this to work offline, would we need to host our own, offline API (as ODC will no longer be using 'feeds' when updating)? Or, will there be a configuration option we could specify that would allow us to continue using the 'feeds' to update the database (to match existing behaviour)?

jeremylong commented 1 year ago

The plan is to continue to support a data feed - which can be created using the open-vulnerability-project/vulnz CLI.
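
For example (a sketch based on the Open-Vulnerability-Project README at the time of writing; the exact flags may still change):

# Sketch: build a local, feed-style cache of the NVD data with the vulnz CLI.
# An NVD API key is optional but speeds up the download considerably.
NVD_API_KEY=xxxxxxxx vulnz cve --cache --directory ./nvd-cache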

rcooper85 commented 1 year ago

The plan is to continue to support a data feed - which can be created using the open-vulnerability-project/vulnz CLI.

Thanks Jeremy! Will you have to release a new version of Dependency Check in order to support the new feeds? I noticed that no .meta files are downloaded by the new vulnz CLI tool, so update jobs would currently fail with current versions of Dependency Check. Will the Dependency Check database schema be updated too? I'm just trying to understand the impact the v2 API will have on my organisation, which uses this tool.

jeremylong commented 1 year ago

Yes - the project will move to use the API. I do not believe it will be a lot of work - but I cannot start on this until mid August due to other obligations.

ankurga commented 1 year ago

@jeremylong So, if I understand correctly, we will just need to upgrade the dependency-check-maven plugin version, and it will then use the API instead of the data feeds?

As of now, we are using it like this in our common test scripts:

mvn --batch-mode clean org.owasp:dependency-check-maven:7.4.4:check verify

BTW, thanks for the awesome work!!

aikebah commented 1 year ago

@ankurga It is strongly recommended to upgrade to the 8.x series if your build infra has internet connectivity: since 8.x there is the hosted suppression file, which speeds up the inclusion of mitigations (suppressions) for false positives reported to this project.

ankurga commented 1 year ago

@aikebah Thanks, we have updated our scripts with 8.x version.

fullstack1981 commented 1 year ago

Dear @binareio

I've successfully set up the FastCVE project for local testing and have fully initialized its database. However, after several attempts, I consistently encounter a 404 error, as shown in the following code block:

Starting cve fetch
cve fetch            0/None |          | (00:00/?)
cve db insert        0/None |          | (00:00/?)
Request failed with status code 404
Starting cpe fetch
cpe fetch            0/None |          | (00:00/?)
cpe db insert        0/None |          | (00:00/?)
Request failed with status code 404

Is there a log that could potentially assist me with diagnosing this issue?

Secondly, I've been using the dependency-check-maven plugin, which still integrates seamlessly with nist-data-mirror in its current version. When or how can I transition to FastCVE? Below is a sample of how I've been using it with nist-data-mirror:

<!-- OWASP report -->
<plugin>
    <groupId>org.owasp</groupId>
    <artifactId>dependency-check-maven</artifactId>
    <configuration>
        <outputDirectory>${env.BASEDIR}\bin\reports</outputDirectory>
        <autoUpdate>false</autoUpdate>
        <cveUrlModified>http://MYSERVER/nist-data-mirror/nvdcve-1.0-modified.json.gz</cveUrlModified>
        <cveUrlBase>http://MYSERVER/nist-data-mirror/nvdcve-1.0-%d.json.gz</cveUrlBase>
        <suppressionFile>owasp-suppressions.xml</suppressionFile>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>ch.lw</groupId>
            <artifactId>owasp-suppressions</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</plugin>

How can I configure it for FastCVE, or when will this be possible?

Kind regards, Chris

supermaurio commented 1 year ago

Just a comment - it took me a short while to figure it all out: NVD has published a date to switch off the classic feeds: 2023-12-15. The vulnz CLI (https://github.com/jeremylong/Open-Vulnerability-Project) can use the new NVD API to create a classic feed cache, but it's not complete yet (e.g. the file nvdcve-modified.meta is missing, and the format of the created JSON files differs from the ones in depcheck-dir/data/nvdcache).

@jeremylong thanks for the effort you put into this great tool

ArjenKorevaar commented 1 year ago

The meta files are easily created, but the format of the JSON files is indeed quite different from what Dependency Check expects.

Any updates on the support for NVD API v2?

rcooper85 commented 1 year ago

Yes - the project will move to use the API. I do not believe it will be a lot of work - but I cannot start on this until mid August due to other obligations.

Hi Jeremy, did you manage to start work on this last month? Any idea when it will be ready? Many thanks.

jeremylong commented 1 year ago

Yes, the work has been started - but it is a fairly large change and requires a lot of testing.

dominikdesmit commented 1 year ago

Hi @jeremylong,

Thanks for all the hard work! What can we do in order to help you to speed up the process? :)

jeremylong commented 1 year ago

Sponsorship would help: https://github.com/sponsors/jeremylong ;)

Initial implementation: https://github.com/jeremylong/DependencyCheck/compare/scratch/nvdapi

Yet to be completed:

s3nl commented 1 year ago

Hi there Jeremy. First of all, thank you for all the work you're putting in to get this ready. Has there been any progress since your last report a month ago? Do you think it will be ready in time for offline use?

jeremylong commented 1 year ago

Yes, I should be able to have this completed sometime in November. I do have several other commitments - but I should be able to get this completed.

MysticalMount commented 1 day ago

Hi - I have found a maintained, GitHub-hosted mirror of all NVD CVEs (presumably in API v2 format): https://github.com/vulsio/vuls-data-raw-nvd-api-cve. Can OWASP dependency-check support this newer format, or does it still require the legacy data-feed formats for offline usage?