vipyrsec / dragonfly

A combined C2 and malware scanning service focused on the early identification, analysis, and reporting of malicious packages on the Python Package Index
MIT License

Metadata Tracking #3

Closed import-pandas-as-numpy closed 1 year ago

import-pandas-as-numpy commented 1 year ago

When a package is labeled as malicious, the name, author, and sha256/md5 checksums should be generated and written to a database, along with the time of the observation.
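A minimal sketch of what that record could look like, assuming the distribution file's bytes are already in hand. SQLite stands in for whatever database the project ends up using, and the table name, column names, and both function names are hypothetical, not part of dragonfly.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def file_checksums(data: bytes) -> dict[str, str]:
    """Return hex SHA-256 and MD5 digests of a distribution file's bytes."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
    }


def record_malicious_package(conn, name, author, data, observed_at=None):
    """Store one observation row. Schema is an assumption for illustration."""
    # Default to the current UTC time when no observation time is given.
    observed_at = observed_at or datetime.now(timezone.utc)
    sums = file_checksums(data)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS malicious_packages ("
        "name TEXT, author TEXT, sha256 TEXT, md5 TEXT, observed_at TEXT)"
    )
    conn.execute(
        "INSERT INTO malicious_packages VALUES (?, ?, ?, ?, ?)",
        (name, author, sums["sha256"], sums["md5"], observed_at.isoformat()),
    )
    conn.commit()
    return sums
```

Usage would be something like `record_malicious_package(sqlite3.connect("observations.db"), name, author, file_bytes)` at the point where a package is flagged.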

shenanigansd commented 1 year ago

We can dump all the details provided by https://github.com/mantissecurity/dragonfly.mantissecurity.org/pull/29 in a JSONB column.

Isn't the full metadata still available on PyPI though? We don't want to duplicate too much data if it's all reproducible.

import-pandas-as-numpy commented 1 year ago

It would still be beneficial if we tracked certain key metadata components locally for comprehensive metrics on packages. What I don't want to see happening is that we update some sort of dashboard and fire off 1000+ JSON queries to hit the Warehouse API.

Furthermore, the more we integrate metadata metrics, the more we can take that into consideration for package evaluation. I'd like to start poking around to see how many releases a package has and take that into account in our decisions, but that's a 'down the road' discussion.
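As a rough illustration of the release-count idea, the sketch below assumes we already hold a parsed project-level JSON document with a `releases` mapping keyed by version string, which is the shape PyPI's JSON API returns for a project. The function name is hypothetical, and how the count would feed into scoring is left open.

```python
def release_count(project_json: dict) -> int:
    """Count the versions listed under the 'releases' key of a project document."""
    # A missing key is treated as zero releases rather than an error.
    releases = project_json.get("releases", {})
    return len(releases)
```

Storing this number alongside the other tracked metadata would let a dashboard read it locally instead of querying the API per package.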