google / osv.dev

Open source vulnerability DB and triage service.
https://osv.dev
Apache License 2.0
1.49k stars 182 forks source link

Ingest https://github.com/psf/advisory-database/tree/main/advisories #1552

Closed oliverchang closed 1 year ago

sethmlarson commented 1 year ago

@oliverchang Would be happy to help out implementing this. I saw in the contributing guide it mentioned creating an issue but not much else, is there anything I can help with?

oliverchang commented 1 year ago

Thanks for reaching out!

https://github.com/psf/advisory-database/tree/main/advisories appears to all be keyed on commits, so there's not much to do here other than for us to set up the infra config to start importing. We'll get to it this week :)

andrewpollock commented 1 year ago

https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-9 got on our radar because it's invalid to have an ECOSYSTEM range without an affected[].package field.

OSV.dev will attempt to enumerate affected[].versions for GIT ranges as part of the importing process, so there's no need to supply them.

oliverchang commented 1 year ago

https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-9 got on our radar because it's invalid to have an ECOSYSTEM range without an affected[].package field.

OSV.dev will attempt to enumerate affected[].versions for GIT ranges as part of the importing process, so there's no need to supply them.

+1 see the example linked:

image

These are dervied from git tags that match the provided git ranges.

sethmlarson commented 1 year ago

Got it, one hitch to that is it seems that historical git branches get deleted from the GitHub repository after they're no longer supported, so advisories affecting versions before 3.8.0 might not get populated?

sethmlarson commented 1 year ago

Also looks like the advisory https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-8 isn't rendering correctly, I'm assuming that I'm encoding the newlines incorrectly.

oliverchang commented 1 year ago

Got it, one hitch to that is it seems that historical git branches get deleted from the GitHub repository after they're no longer supported, so advisories affecting versions before 3.8.0 might not get populated?

Hmm, we managed to extract older versions (including 2.x) for https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-8. Perhaps that's a non-issue?

Also looks like the advisory https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-8 isn't rendering correctly, I'm assuming that I'm encoding the newlines incorrectly.

https://github.com/psf/advisory-database/blob/4d5ebbc62f10b00e10f10e165587aedfd85134da/advisories/python/PSF-2023-8.json#L10 looks to have double quoted newlines.

sethmlarson commented 1 year ago

Got it, one hitch to that is it seems that historical git branches get deleted from the GitHub repository after they're no longer supported, so advisories affecting versions before 3.8.0 might not get populated?

Hmm, we managed to extract older versions (including 2.x) for https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-8. Perhaps that's a non-issue?

That is missing a lot of tags for v2.x so it's not a complete import, the 2.7.x series went until v2.7.18.

oliverchang commented 1 year ago

Got it, one hitch to that is it seems that historical git branches get deleted from the GitHub repository after they're no longer supported, so advisories affecting versions before 3.8.0 might not get populated?

Hmm, we managed to extract older versions (including 2.x) for https://oss-vdb-test.wl.r.appspot.com/vulnerability/PSF-2023-8. Perhaps that's a non-issue?

That is missing a lot of tags for v2.x so it's not a complete import, the 2.7.x series went until v2.7.18.

Hmm, that's rather unfortunate. Without historical branches the analysis is purely on the main and available branches, which is probably why we're missing later versions in older CPython branches that branched off main. Is there any way to advocate for keeping these branches?

In any case, having ECOSYSTEM ranges is not valid when there is no defined ecosystem, and will break our import sadly.

andrewpollock commented 1 year ago

and will break our import sadly.

More to the point it currently doesn't break our import and we're winding up with vulnerabilities that don't meet certain assumptions and that causes all sorts of issues :-(

oliverchang commented 1 year ago

@sethmlarson any chance we can remove the ECOSYSTEM ranges for now to unblock our ingestion? As @andrewpollock correctly pointed out, technically it doesn't fully break our ingestion, but it does break other parts of our UI/infra in subtle ways.

Assuming that the branch heads to all the historical released versions of cpython still exist (e.g. via tags), there's likely a technical solution that will enable us to enumerate every version correctly (i.e. correctly enumerating all the way up to v2.7.18 per https://github.com/google/osv.dev/issues/1552#issuecomment-1692593964). The challenge for us is doing this efficiently, but I've filed #1590 to track this.

sethmlarson commented 1 year ago

I can remove the ecosystem version ranges, it would be good to somehow capture the correct affected versions though. Would listing out the tags in "affected[].versions" work for that purpose?

sethmlarson commented 1 year ago

I've updated the records to remove the ECOSYSTEM type: https://github.com/psf/advisory-database/pull/21

andrewpollock commented 1 year ago

Would listing out the tags in "affected[].versions" work for that purpose?

https://ossf.github.io/osv-schema/#affectedversions-field

OSV.dev, as part of the importing process will do its best to enumerate the versions between the introduced and fixed (and soon to be last_affected) GIT ranges. I'd have to go back and review the code involved to refresh my understanding of how it treats a pre-supplied affected[].versions field.

oliverchang commented 1 year ago

I can remove the ecosystem version ranges, it would be good to somehow capture the correct affected versions though. Would listing out the tags in "affected[].versions" work for that purpose?

This would work. Our own tag enumeration would only add new tags that we match that don't already exist to the original record. We don't ever delete things already in affected[].versions.

sethmlarson commented 1 year ago

@oliverchang I talked with Release Manager Łukasz and he mentioned that while branches are deleted all of the commits are kept in the repo, so in theory by cloning the entire repository all of the tags should be discoverable. Maybe this is causing an issue in the discovery process?

oliverchang commented 1 year ago

@oliverchang I talked with Release Manager Łukasz and he mentioned that while branches are deleted all of the commits are kept in the repo, so in theory by cloning the entire repository all of the tags should be discoverable. Maybe this is causing an issue in the discovery process?

Thanks for checking! Yep, this is what we will be trying as part of https://github.com/google/osv.dev/issues/1590

oliverchang commented 1 year ago

These are now ingested in our prod instance. e.g. https://osv.dev/vulnerability/PSF-2019-7

https://github.com/google/osv.dev/issues/1590 tracks the remaining git enumeration improvement

sethmlarson commented 1 year ago

Awesome, thanks @oliverchang and @andrewpollock! :muscle: