Open kurtseifried opened 1 year ago
Do we have examples of users who are producing EPSS today?
I'd like to add EPSS (https://www.first.org/epss/) to the severity field, which is a form of severity (how likely is it going to be exploited). [...] This change is simple and I've submitted a PR.
A step further, I would suggest, like first.org does, to also compute and use the localized percentile, for the subset of vulnerabilities considered. Quoting https://www.first.org/epss/articles/prob_percentile_bins:
Another consideration when working with percentiles is that they are based on every published CVE, and it is unlikely that any organization is dealing with every CVE. Therefore, percentile values may change for a given subset of vulnerabilities. For example, when a user considers only those vulnerabilities relevant to her network environment, the percentile values will change -- because the sample of total vulnerabilities will change. The EPSS probability will not change, but the relative position (ranking) of one vulnerability to another will very likely change.
This is more complicated, since it requires grabbing the global EPSS score and percentile first, and then recomputing local percentiles per project or use case from the locally relevant EPSS scores. Maybe this does not belong in the schema, since it would be computed by a tool, but in the end this information should have its place in an OSV document.
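To make the local percentile idea concrete, here is a minimal sketch. It assumes the percentile definition quoted elsewhere in this thread (the proportion of scored vulnerabilities with the same or a lower EPSS score) and applies it to a subset only. The function name `local_percentiles`, the CVE IDs, and the scores are made up for illustration.

```python
from bisect import bisect_right


def local_percentiles(epss_by_cve):
    """Map each CVE to the share of the subset with the same or a lower EPSS score."""
    ordered = sorted(epss_by_cve.values())
    total = len(ordered)
    return {
        # bisect_right counts how many scores in the subset are <= this score.
        cve: bisect_right(ordered, score) / total
        for cve, score in epss_by_cve.items()
    }


# Hypothetical subset of CVEs relevant to one project, with made-up EPSS probabilities.
subset = {
    "CVE-0000-0001": 0.00043,
    "CVE-0000-0002": 0.02,
    "CVE-0000-0003": 0.02,
    "CVE-0000-0004": 0.91,
}
ranks = local_percentiles(subset)
```

The EPSS probabilities themselves are untouched; only the ranking within the subset is recomputed, which matches the FIRST article quoted above.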
Yes FIRST is currently producing data:
https://www.first.org/epss/data_stats
which I would like to include in the machine readable data provided by GSD. We're also looking at EPSS for non CVE data.
We're also looking at EPSS for non CVE data.
I am very interested to see how it goes. Unless I misunderstood the basis of EPSS, and/or there has been a fundamental change in version 3 of the model recently released (with not many details about the changes):
Thus EPSS for non CVE data would be both an exciting and unexpected development! Maybe you have any insights and/or shareable info?
One comment: Chicken and Egg. Why do people use CVE? It exists. Why don't they use X? Apart from GSD/OSV efforts there isn't another source. I suspect once we support EPSS and have it in all the CVE data for example, people may begin to ask a) can you do this for other public data (like GSD) and b) can we do it, e.g. open up the model and/or c) let's make an open model and tweak it and see if we can do this...
So a good step forwards would be having EPSS available and machine readable in OSV.
Related, as I work on adding the CVE CVSS data from records I'm converting to OSV for https://github.com/google/osv.dev/issues/783, I've wondered how many native OSV records are including this (I haven't done any research, just mentally flagged that I'd like to)
epss : the EPSS score representing the probability [0-1] of exploitation in the wild in the next 30 days (following score publication)
Would this not mean it changes every day? Or if not, is there somewhere reporting how often it usually changes?
/cc @jayjacobs
(one of the creators and co-chair of EPSS SIG here)
Did I miss anything? Happy to answer any other questions and apologies for the wall of text.
Quick edit: One of the concerns about EPSS is that it is volunteer driven and at least in theory could disappear. I am working on some EPSS things on the backend and some changes are coming, but I am 100% committed to keeping EPSS exactly what it is currently - EPSS scores are freely available and open for commercial use. I expect it will only be getting better and more reliable (and hopefully funded) over the next few years.
@jayjacobs thanks, the wall of text seemed like a pretty nice summary to me...
So is there any value in OSV records having an EPSS severity type?
It seems to me that in order to have an EPSS score for an OSV record, first EPSS would have to support calculating them for OSV records?
I could see a scenario where OSV.dev could, for CVEs converted from the NVD and vulnerability records otherwise aliasing a CVE ID, incorporate EPSS data as published?
I think the value of adding EPSS into the OSV record is that it removes a second step for the consumer. They wouldn't have to go look it up on their own by hitting the EPSS API or downloading the CSV. It's like adding CISA KEV information for the convenience IMO.
EPSS automatically scores all of the CVEs every day and it only works for vulnerabilities with CVEs since that's how disparate data sources are aggregated (currently). It isn't set up to calculate them on non-CVE data since it's nearly impossible to correlate other data sources for non-CVE data, and it's run off of a rather specific set of features that would be difficult to accurately duplicate outside of the existing data collection efforts.
If you wanted to incorporate EPSS scores, I would suggest the following generic set of steps:
I think that'd be it. Like I said in the previous post, most scores do not change day to day, so you could add some logic not to modify the score if it didn't change.
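The "don't modify the score if it didn't change" logic could look roughly like the sketch below. It assumes the `EPSS/<score>/<percentile>` string proposed in this issue, which is not an accepted part of the OSV schema. The helper name `apply_epss` and the five-decimal formatting are illustrative choices only.

```python
def apply_epss(record, epss, percentile, modified):
    """Update the hypothetical EPSS severity entry; return True only if it changed."""
    # Score string format is the one proposed in this thread, not an accepted schema type.
    new_score = f"EPSS/{epss:.5f}/{percentile:.5f}"
    severity = record.setdefault("severity", [])
    for entry in severity:
        if entry.get("type") == "EPSS":
            if entry.get("score") == new_score:
                return False  # unchanged: leave the record and its modified date alone
            entry["score"] = new_score
            break
    else:
        # No EPSS entry yet, so add one.
        severity.append({"type": "EPSS", "score": new_score})
    record["modified"] = modified
    return True


# Hypothetical record; the ID and timestamps are placeholders.
record = {"id": "EXAMPLE-1", "modified": "2023-01-01T00:00:00Z"}
first = apply_epss(record, 0.00043, 0.06996, "2023-01-02T00:00:00Z")
second = apply_epss(record, 0.00043, 0.06996, "2023-01-03T00:00:00Z")
```

Because the second call sees the same score, it returns `False` and the `modified` date stays at the value set by the first call.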
Hope that's helpful.
We had a discussion with some GitHub folks (@darakian @taladrane) earlier, and we came to the conclusion that it may not make sense to include EPSS in OSV.
This is because EPSS is keyed on CVE and produced by a single entity, while OSV takes a more federated approach and enables database owners to publish their own interpretation of vulnerabilities (which may or may not link back to a CVE). There is also going to be overlap between different databases for the same CVE (e.g. a Linux distro DB vs a language package DB), which adds to potential confusion here and potential mismatching EPSS scores between different sources.
A more minor point is that it may introduce a bit of churn for OSV records (with the modified date changing frequently).
It seems like the best way for users to consume EPSS while using OSV is to look up the relevant aliased CVE against the source of truth (https://www.first.org/epss/data_stats)? This does still have that second step for consumers, but I wonder if we can make it easier via an aggregator like https://osv.dev somehow.
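For what that consumer-side second step might look like, here is a rough sketch that joins an OSV record's CVE aliases against EPSS data that has already been downloaded. It assumes the EPSS CSV has `cve`, `epss`, and `percentile` columns and may start with a `#` comment line; please check that against the actual file. The function names, record, and sample rows are invented for the example.

```python
import csv
import io


def load_epss(csv_text):
    """Parse EPSS CSV text into {cve: (epss, percentile)}, skipping '#' comment lines."""
    rows = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    return {
        row["cve"]: (float(row["epss"]), float(row["percentile"]))
        for row in csv.DictReader(rows)
    }


def epss_for_record(record, epss_table):
    """Return EPSS data for every CVE ID among the record's id and aliases."""
    ids = [record.get("id", "")] + list(record.get("aliases", []))
    return {i: epss_table[i] for i in ids if i.startswith("CVE-") and i in epss_table}


# Made-up sample data; the column layout is an assumption about the EPSS CSV.
SAMPLE_CSV = (
    "#model_version:example,score_date:example\n"
    "cve,epss,percentile\n"
    "CVE-0000-0001,0.00043,0.06996\n"
    "CVE-0000-0002,0.5,0.97\n"
)
record = {"id": "EXAMPLE-2023-1", "aliases": ["CVE-0000-0001", "GHSA-xxxx-xxxx-xxxx"]}
scores = epss_for_record(record, load_epss(SAMPLE_CSV))
```

This keeps EPSS out of the OSV record itself, so the record's modified date does not churn, at the cost of the extra lookup on the consumer side.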
@jayjacobs do you have any record churn statistics you can share?
I guess we can decouple support in the schema for the severity type from OSV.dev doing anything with EPSS (either at import time or aggregation time) and that would still allow downstream consumers to merge the values themselves if they wanted to, @oliverchang ?
I'd like to add EPSS (https://www.first.org/epss/) to the severity field, which is a form of severity (how likely is it going to be exploited).
One wrinkle: EPSS scores include:
epss : the EPSS score representing the probability [0-1] of exploitation in the wild in the next 30 days (following score publication)
percentile : the percentile of the current score, the proportion of all scored vulnerabilities with the same or a lower EPSS score
The EPSS score should be included, and I think the percentile should be included too, e.g. like an Olympic score: if everything is 9.x then 9.9 and 9.8 are vastly different. So the format would be:
type: EPSS (it doesn't have a version currently AFAIK but it might in future, so no version specified currently)
score field: EPSS/0.00043/0.06996
so the EPSS score and the percentile of where that specific result currently lies
This change is simple and I've submitted a PR.
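As an illustration of how a consumer might read the proposed score field, here is a small sketch that parses a string like `EPSS/0.00043/0.06996` into its two numbers. The `EpssScore` type and `parse_epss` function are hypothetical names, and the range check simply assumes both values are fractions between 0 and 1, as the definitions above suggest.

```python
from typing import NamedTuple


class EpssScore(NamedTuple):
    epss: float        # probability of exploitation in the next 30 days
    percentile: float  # share of scored vulnerabilities with the same or a lower score


def parse_epss(score_field):
    """Parse the proposed 'EPSS/<score>/<percentile>' string into an EpssScore."""
    # A wrong number of '/'-separated parts raises ValueError on unpacking.
    prefix, epss, percentile = score_field.split("/")
    if prefix != "EPSS":
        raise ValueError(f"not an EPSS score field: {score_field!r}")
    score = EpssScore(float(epss), float(percentile))
    for value in score:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value out of range [0, 1]: {value}")
    return score


parsed = parse_epss("EPSS/0.00043/0.06996")
```

Malformed input raises `ValueError`, so a caller can skip the entry rather than store a bad value.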