christophetd opened this issue 3 months ago
Hi @christophetd that's super cool, thank you for considering contributing to OSV.dev. I'm interested in how your data set compares and contrasts with that of the OpenSSF's Malicious Packages project?
The onboarding process is a little bespoke and toilsome at the moment, but it's something we're continuously improving and streamlining with each new data source onboarded. I would like to get it to the point of being much more checklist/cookbook driven than it currently is. My detailed response here is an experiment in further process improvement and seeks to address some recent actionable feedback received from another data source onboarding. Your actionable feedback is also very welcome.
In a nutshell:
Known onboarding rough edges:
- the source{,_test}.yaml files (hopefully the example PRs plus other existing entries will make this reasonably self-evident). Specifically, FYI, the value for type corresponds with those defined at https://github.com/google/osv.dev/blob/381f459de12e181447731beee9ba4b06a513c586/osv/models.py#L783-L787

Hi @christophetd - I work on the Malicious Packages repository.
Your dataset could be included in the Malicious Packages dataset, and I would be happy to work with you on doing that.
One idea I had is that we could add a GitHub Action to your repository that walks the packages and transforms them into OSV records. We could then call that action from a workflow inside Malicious Packages to ingest the reports.
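For concreteness, a rough sketch of the per-package transformation such an action might perform. This is not an agreed design: the function, the field choices, and the placeholder MAL- ID scheme are assumptions (the Malicious Packages project uses MAL- prefixed IDs, but real ID allocation would be agreed during onboarding); only the `affected`/`package` structure comes from the OSV schema itself.

```python
import json
from datetime import datetime, timezone

def to_osv(ecosystem: str, name: str, versions: list[str]) -> dict:
    """Convert one malicious-package entry into a minimal OSV record.

    The ID below is a placeholder scheme, not the real MAL- allocation.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "schema_version": "1.6.0",
        "id": f"MAL-0000-{ecosystem.lower()}-{name}",  # placeholder ID scheme
        "modified": now,
        "summary": f"Malicious code in {name} ({ecosystem})",
        "affected": [
            {
                "package": {"ecosystem": ecosystem, "name": name},
                "versions": versions,
            }
        ],
    }

# Example: one hypothetical npm entry from the dataset.
record = to_osv("npm", "example-package", ["1.0.0"])
print(json.dumps(record, indent=2))
```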
Let me know what you think.
Caleb
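The ingestion side of Caleb's idea might look roughly like the workflow below. Every name here is an assumption: the action path, schedule, and steps are illustrative placeholders, not anything that exists in either repository.

```yaml
# Hypothetical workflow inside the Malicious Packages repo -- all names
# and paths are assumptions, sketching the "call the action" step only.
name: Ingest Datadog malicious package reports
on:
  schedule:
    - cron: "0 6 * * *" # daily
  workflow_dispatch: {}
jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Transform dataset entries into OSV reports
        # Hypothetical composite action exported by the dataset repo.
        uses: DataDog/malicious-software-packages-dataset/.github/actions/export-osv@main
      - name: Open a PR with the new reports
        run: echo "PR creation step omitted -- sketch only"
```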
@calebbrown @christophetd have you connected off-issue to determine the most appropriate integration point for these advisories?
Hi there!
I'm part of Datadog, where we publish and maintain a dataset of human-confirmed malicious npm and PyPI packages: https://github.com/DataDog/malicious-software-packages-dataset/
Eager to discuss what the best way to bring that into OSV.dev would be.