Closed import-pandas-as-numpy closed 1 year ago
We can dump all the details provided by https://github.com/mantissecurity/dragonfly.mantissecurity.org/pull/29 in a JSONB column.
Isn't the full metadata still available on PyPI though? We don't want to duplicate too much data if it's all reproducible.
It would still be beneficial if we tracked certain key metadata components locally for comprehensive metrics on packages. What I don't want to see happening is that we update some sort of dashboard and fire off 1000+ JSON queries to hit the Warehouse API.
Furthermore, the more metadata metrics we integrate, the more we can take them into consideration for package evaluation. I'd like to start poking around to see how many releases a package has and take that into account in our decisions, but that's a 'down the road' discussion.
When a package is labeled as malicious, the name, author, and sha256/md5 checksums should be generated and written to a database, along with the time of the observation.
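To make the proposal concrete, here is a rough sketch of what recording an observation could look like. It is not the project's actual implementation: the table name `malicious_packages`, the function `record_malicious`, and the column layout are all hypothetical, and SQLite stands in for the real database so the example runs on its own. The `metadata` column holds JSON as text here; in PostgreSQL it could be a JSONB column, as suggested above.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; names and layout are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS malicious_packages (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    author      TEXT,
    sha256      TEXT NOT NULL,
    md5         TEXT NOT NULL,
    observed_at TEXT NOT NULL,  -- ISO 8601 timestamp in UTC
    metadata    TEXT            -- JSON text (JSONB in PostgreSQL)
)
"""


def record_malicious(conn, name, author, payload, metadata=None):
    """Hash the package bytes and store one observation row."""
    sha256 = hashlib.sha256(payload).hexdigest()
    md5 = hashlib.md5(payload).hexdigest()
    observed_at = datetime.now(timezone.utc).isoformat()

    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO malicious_packages "
        "(name, author, sha256, md5, observed_at, metadata) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, author, sha256, md5, observed_at, json.dumps(metadata or {})),
    )
    conn.commit()
    return sha256, md5, observed_at


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder package name, author, and bytes; not real data.
    record_malicious(conn, "example-pkg", "someone", b"fake archive bytes",
                     {"version": "0.0.1"})
    print(conn.execute(
        "SELECT name, sha256, observed_at FROM malicious_packages"
    ).fetchall())
```

Keeping the checksums and timestamp in their own columns, with the remaining metadata as a JSON blob, would let a dashboard query locally instead of hitting the Warehouse API for every package.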