ossf / package-feeds

Feed parsing for language package manager updates
Apache License 2.0

Aggregation of distro and pkg data sets to create a searchable DB #203

Open TheFoxAtWork opened 2 years ago

TheFoxAtWork commented 2 years ago

Background: As more vulnerabilities continue to be discovered in packages and libraries present in various distributions, practitioners working across their organizations need a single place to query for a particular dependency, package, or other component and discover which distributions, and which versions of them, contain it (or vice versa).

Comparable Queries: The following tools and resources each provide some of the functionality of the desired search tool (in order of best match to the use case described in the background):

Proposal: Ideally there'd be a single system that supports both libraries.io and pkgs.org. pkgs.org API access requires a membership, which may be worth OpenSSF funding in order to query both APIs and bring them into a single tool (or some other financial offset to allow the pkgs.org API to be free). We should look to include as many distros as possible in this central tool.

Why this issue on package-feeds?
It is unclear which group is best placed to tackle this project. Given that package-feeds appears to have some initial functionality, this issue is being submitted here as the best-possible match for a home: the project could be created under package-feeds or as an extension to it.

For more information on the discussion that sparked this issue: https://openssf.slack.com/archives/C019M98JSHK/p1657119043352399

alilleybrinker commented 2 years ago

Commented in the thread, but repeating here:

I think a solution which incorporates existing systems, rather than building a new package-finding system from scratch, is definitely the ideal. It would also enable us to cast the widest net in supporting many different platforms (both OS and language package managers). That said, imagining a tool which queries multiple sources, we'd want to be clear in the UI about where information is sourced from. So if a result for a package comes from, say, libraries.io, the end-user should be informed.
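To make the source-attribution idea concrete, here is a minimal sketch of a fan-out search that tags every hit with the system it came from. Everything in it is an assumption for illustration: `PackageHit`, `search_all`, and the two `fake_*` backends are hypothetical names, and the backends return canned data rather than calling the real libraries.io or pkgs.org APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PackageHit:
    """One search result, always carrying the name of the source that produced it."""
    name: str
    version: str
    source: str


# Hypothetical backend shape: takes a query string, yields (name, version) pairs.
Backend = Callable[[str], Iterable[Tuple[str, str]]]


def search_all(query: str, backends: Dict[str, Backend]) -> List[PackageHit]:
    """Query every backend and label each hit with its source name."""
    hits: List[PackageHit] = []
    for source, backend in backends.items():
        try:
            results = list(backend(query))
        except Exception:
            # A failing source is skipped so the remaining sources still answer.
            continue
        for name, version in results:
            hits.append(PackageHit(name=name, version=version, source=source))
    return hits


# Stand-in backends with canned data (NOT real API clients).
def fake_libraries_io(query: str) -> List[Tuple[str, str]]:
    return [("openssl", "3.0.5")] if query == "openssl" else []


def fake_pkgs_org(query: str) -> List[Tuple[str, str]]:
    return [("openssl", "3.0.2-0ubuntu1")] if query == "openssl" else []


if __name__ == "__main__":
    backends = {"libraries.io": fake_libraries_io, "pkgs.org": fake_pkgs_org}
    for hit in search_all("openssl", backends):
        print(f"{hit.name} {hit.version} (from {hit.source})")
```

Keeping `source` on the result type itself, rather than adding it in the UI layer, means the attribution cannot be dropped by accident when results are merged or sorted.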

TheFoxAtWork commented 2 years ago

💯

bureado commented 2 years ago

Only tangentially related, https://github.com/ossf/wg-securing-critical-projects/issues/41. There is some overlap with the component/threat intelligence elements in certain commercial vendors/offerings, so it'd be interesting to ask members and commercial entities more broadly about this, too. Also https://ossindex.sonatype.org/, https://deps.dev/, and here's a list of by-hash links I collected a while ago over at https://github.com/bureado/awesome-software-supply-chain-security#dependency-intelligence:

It'd be good to model the query keys. Should we expect to pass a string, or a purl and it'll give us CPEs? Or a hash, or a filename and it gives us purls? Or will it help us normalize a partial search? See https://github.com/repology/repology-rules. And what kind of information about a package? For example, I don't think Repology would give us debtags or buildinfo files that we could bring in from dedicated Debian infrastructure, or even what the UDD does (I'm sure there are similar data sources for OBS, Koji, etc.)

Edit: forgot https://artifacthub.io/docs/topics/repositories/
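As one possible starting point for the query-key question raised above, the sketch below models a key as a (kind, value) pair and guesses the kind from the raw input. The `KeyKind`, `QueryKey`, and `classify` names are hypothetical, and the rules are assumed heuristics rather than an agreed design: purls are recognized by the `pkg:` scheme, CPEs by the `cpe:2.3:` or `cpe:/` prefix, hashes by hex length (MD5, SHA-1, SHA-256 only), and filenames by a path separator or a short list of extensions.

```python
import re
from dataclasses import dataclass
from enum import Enum


class KeyKind(Enum):
    PURL = "purl"
    CPE = "cpe"
    HASH = "hash"
    FILENAME = "filename"
    NAME = "name"  # fallback: free-text package name, possibly partial


@dataclass(frozen=True)
class QueryKey:
    kind: KeyKind
    value: str


# Hex strings of 32, 40, or 64 characters (MD5, SHA-1, SHA-256 lengths).
_HEX_DIGEST = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")

# Assumed, deliberately incomplete list of artifact extensions.
_FILE_SUFFIXES = (".so", ".deb", ".rpm", ".jar", ".whl")


def classify(raw: str) -> QueryKey:
    """Guess what kind of key the user passed; order of checks matters."""
    text = raw.strip()
    if text.startswith("pkg:"):
        return QueryKey(KeyKind.PURL, text)
    if text.startswith("cpe:2.3:") or text.startswith("cpe:/"):
        return QueryKey(KeyKind.CPE, text)
    if _HEX_DIGEST.match(text):
        # Digests are lowercased so equal hashes compare equal.
        return QueryKey(KeyKind.HASH, text.lower())
    if "/" in text or text.endswith(_FILE_SUFFIXES):
        return QueryKey(KeyKind.FILENAME, text)
    return QueryKey(KeyKind.NAME, text)


if __name__ == "__main__":
    for sample in ("pkg:deb/debian/curl@7.50.3-1", "/usr/lib/libssl.so.3", "openssl"):
        key = classify(sample)
        print(f"{sample!r} -> {key.kind.value}")
```

A typed key like this would let each backend declare which kinds it can answer, and anything that falls through to `NAME` is where normalization rules in the style of repology-rules would apply.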

scovetta commented 2 years ago

If helpful, you're welcome to leverage the logic (or implementation) we built into https://github.com/Microsoft/OSSGadget, which handles at least some of this abstraction.