[Spike] Scale OSSF Scorecards prescriptions out of GitHub for aggregation by revision

mayaCostantini commented 2 years ago

Is your feature request related to a problem? Please describe. We will start aggregating Scorecards prescriptions by project repository revision, as present in the new scorecards-v2 BigQuery dataset, and may also create those prescriptions for packages from other ecosystems, so we should think about a more scalable way to make this data available. The prescriptions dataset is currently ~500M in size; after these changes it will far exceed the recommended GitHub limit of 5GiB for a repository and cause storage and performance issues.

Describe the solution you'd like Set up a new database (possibly non-relational) or make new Scorecards prescriptions available in an S3 bucket accessed through a web service.
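
For the S3 option, one possible starting point is an object key layout that groups prescriptions by repository and revision. This is only a rough sketch for discussion: the `scorecards/` prefix, the `prescription.yaml` file name, and the `prescription_key` helper are all hypothetical and not an existing Thoth convention.

```python
from urllib.parse import quote


def prescription_key(ecosystem: str, repo_url: str, revision: str) -> str:
    """Build a hypothetical object key for one prescription file.

    The layout (ecosystem / repository / revision) is an assumption made
    for illustration, not an agreed-upon schema.
    """
    # Percent-encode the repository URL, including "/", so that it forms
    # a single path segment in the key.
    repo_segment = quote(repo_url, safe="")
    return f"scorecards/{ecosystem}/{repo_segment}/{revision}/prescription.yaml"
```

With a layout like this, a web service could list every revision of a repository by key prefix instead of cloning a large Git repository.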

Additional context Related to https://github.com/thoth-station/core/issues/440

mayaCostantini commented 2 years ago

/sig stack-guidance /priority important-soon

mayaCostantini commented 2 years ago

Needs https://github.com/thoth-station/prescriptions-refresh-job/issues/195
