thoth-station / thoth-application

Thoth-Station ArgoCD Applications
GNU General Public License v3.0

sprint production release for 2021.09.27 #1961

Closed harshad16 closed 2 years ago

harshad16 commented 2 years ago

Hello, Thoth-station!

This issue will be used for the current sprint cycle's production release. By the end of the sprint cycle, we will consolidate information on feature upgrades and fixes to thoth-station components in this issue.

harshad16 commented 2 years ago

/kind documentation
/area release-eng
/triage accepted
/milestone 2021.09.27

fridex commented 2 years ago

Optimizations in prescriptions loading in a deployment

As our open database about Python projects (thoth-station/prescriptions) grew, we observed a large overhead in handling it in raw form (YAML files) in a deployment. As of now we have roughly 50k+ YAML files amounting to roughly 190+ MiB, which were copied into the deployment so that the recommendation engine could use them. Moreover, the YAML text format introduced overhead in parsing the files and creating resolution pipeline units from them. With recent changes, we pre-load all the YAML files and construct a binary file that directly holds Python objects on each adviser component release. The binary file (pickle) is loaded directly into memory in the cloud-based resolver on each request. This significantly speeds up prescription loading in a deployment and allows the open database of observations about the Python ecosystem to grow even more. Originally, the whole prescription overhead (loading, parsing, initializing) was roughly 1 minute (plus container pull); now it is roughly 0.01 seconds.

Related: https://github.com/thoth-station/prescriptions/issues/50
Related: https://github.com/thoth-station/adviser/pull/2085
Related: https://github.com/thoth-station/thoth-application/pull/1950
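A minimal sketch of the build-once, load-per-request idea described above. This is not the adviser implementation: the function names, the file name, and the shape of the unit dictionaries are hypothetical, and the already-parsed `units` list stands in for the result of parsing the YAML files at release time.

```python
import pickle
import tempfile
from pathlib import Path


def build_prescription_pickle(units, target):
    """Serialize already-parsed units into one binary file (done once per release)."""
    with open(target, "wb") as f:
        pickle.dump(units, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_prescription_pickle(source):
    """Load the pre-built objects directly, with no YAML parsing at request time."""
    with open(source, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for units parsed from the YAML files at build time.
units = [
    {"name": "ExampleWrapUnit", "type": "wrap", "package": "flask"},
    {"name": "ExampleBootUnit", "type": "boot", "package": "tensorflow"},
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "prescriptions.pickle"
    build_prescription_pickle(units, target)
    loaded = load_prescription_pickle(target)

print(loaded == units)
```

Pickle files can execute arbitrary code when loaded, so a file like this should only be loaded when it was produced by a trusted build step.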

fridex commented 2 years ago

CVE update job rewritten to OSV 0.8 format

The PyPA upstream rewrote advisory-db to conform to the OSV 0.8 format. We adopted this format by updating the logic of our cve-update-job.

Related: https://github.com/thoth-station/cve-update-job/pull/424
Related: https://github.com/pypa/advisory-db/commit/7872b0a91b4d980f749e6d75a81f8cc1af32829f
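For illustration, a rough sketch of reading affected versions from an OSV-style record. It is not the cve-update-job code. It assumes the `affected[].package` and `affected[].versions` layout of the published OSV schema; the exact 0.8 layout may differ in details. The record, its id, and the helper name are made up for the example.

```python
def affected_versions(osv_record, package_name):
    """Collect explicitly listed affected versions of a PyPI package."""
    versions = set()
    for affected in osv_record.get("affected", []):
        package = affected.get("package", {})
        same_name = package.get("name", "").lower() == package_name.lower()
        if package.get("ecosystem") == "PyPI" and same_name:
            versions.update(affected.get("versions", []))
    # Plain string sort; a real job would compare versions semantically.
    return sorted(versions)


# Hypothetical record, not taken from advisory-db.
record = {
    "id": "EXAMPLE-0000-0",
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "example-package"},
            "ranges": [
                {
                    "type": "ECOSYSTEM",
                    "events": [{"introduced": "0"}, {"fixed": "1.0.2"}],
                }
            ],
            "versions": ["1.0.0", "1.0.1"],
        }
    ],
}

print(affected_versions(record, "example-package"))
```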

fridex commented 2 years ago

Open Source Security Foundation - Security Scorecard

Starting with this release, we provide information derived from OpenSSF Security Scorecards, as published by the Open Source Security Foundation. To support this, prescriptions-refresh-job queries the scorecards available in BigQuery and constructs prescriptions from them, which are committed directly to the thoth-station/prescriptions repository, the open database about Python open-source projects. As scorecards are specific to repositories, they are automatically mapped to Python packages under the hood, based on Thoth's knowledge. The prescriptions refresh job automatically updates prescriptions as scorecards get updated. See the relevant prescriptions for flask or tensorflow for examples.

pacospace commented 2 years ago

Thoth tutorial: create and use overlays for Elyra AI Pipelines steps.

This tutorial shows the concept of overlays, how they are applied to software stacks, what overlay builds are, and how the built images are used in AI Pipelines.

Related:

fridex commented 2 years ago

Warning produced if users use forked projects on GitHub

As part of the data aggregation done in prescriptions-refresh-job, we aggregate information on whether a given project is a fork. If so, users are warned about using a forked project. Prescriptions are automatically updated as the GitHub state changes.

Example: https://github.com/thoth-station/prescriptions/pull/17217
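A small sketch of how such a check could look, assuming repository metadata shaped like the GitHub REST API repository object (which carries a boolean `fork` field and a `full_name`). The function name and the returned dictionary are hypothetical and do not reproduce the actual prescription format.

```python
def fork_warning(package_name, repo_metadata):
    """Return a warning entry if the repository is a fork, otherwise None."""
    if not repo_metadata.get("fork", False):
        return None
    repo = repo_metadata.get("full_name", "unknown")
    return {
        "type": "WARNING",
        "message": f"Package {package_name!r} is developed in a forked repository ({repo})",
    }


# Hypothetical metadata for illustration only.
print(fork_warning("example", {"fork": True, "full_name": "someone/example"}))
print(fork_warning("example", {"fork": False, "full_name": "upstream/example"}))
```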

fridex commented 2 years ago

Warn if a package used was not updated in the last 365 days

As part of the data aggregation done in prescriptions-refresh-job, we aggregate information on whether a given project was updated on GitHub in the past 365 days (any commit to the default Git branch). If not, we warn users about this fact. Prescriptions are automatically updated as the GitHub state changes.

Example: https://github.com/thoth-station/prescriptions/pull/17394/files
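The 365-day rule can be sketched as a simple date comparison. This assumes the timestamp of the last commit to the default branch is already known; `is_stale` and its parameters are hypothetical names, not part of prescriptions-refresh-job.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_commit_at, now=None, max_age_days=365):
    """Return True if the last commit is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    return now - last_commit_at > timedelta(days=max_age_days)


# Fixed reference date so the example is reproducible.
release_day = datetime(2021, 9, 27, tzinfo=timezone.utc)
print(is_stale(datetime(2020, 1, 1, tzinfo=timezone.utc), release_day))  # True
print(is_stale(datetime(2021, 6, 1, tzinfo=timezone.utc), release_day))  # False
```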

fridex commented 2 years ago

Warn if the used project has fewer than 5 contributors on GitHub

As part of the data aggregation done in prescriptions-refresh-job, we aggregate information on the number of contributors. If the number of contributors is less than 3, we warn users about this fact. Prescriptions are automatically updated as the GitHub state changes.

fridex commented 2 years ago

Information about GitHub popularity

When users ask for advice, starting with this release we also provide information about the project's community on GitHub. If a project has a small community, we warn about its use; otherwise, we provide information on how big the community is. Prescriptions are automatically updated as the GitHub state changes.

Example (very high popularity): flask
Example (high popularity): wheel
Example (moderate popularity): configparser
Example (low popularity): untokenize

harshad16 commented 2 years ago

we have completed the release of 2021.09.27 :tada: :confetti_ball: :partying_face:

Features

Optimizations in prescriptions loading in a deployment

As our open database about Python projects (thoth-station/prescriptions) grew, we observed a large overhead in handling it in raw form (YAML files) in a deployment. As of now we have roughly 50k+ YAML files amounting to roughly 190+ MiB, which were copied into the deployment so that the recommendation engine could use them. Moreover, the YAML text format introduced overhead in parsing the files and creating resolution pipeline units from them. With recent changes, we pre-load all the YAML files and construct a binary file that directly holds Python objects on each adviser component release. The binary file (pickle) is loaded directly into memory in the cloud-based resolver on each request. This significantly speeds up prescription loading in a deployment and allows the open database of observations about the Python ecosystem to grow even more. Originally, the whole prescription overhead (loading, parsing, initializing) was roughly 1 minute (plus container pull); now it is roughly 0.01 seconds.

Related: https://github.com/thoth-station/prescriptions/issues/50
Related: https://github.com/thoth-station/adviser/pull/2085
Related: https://github.com/thoth-station/thoth-application/pull/1950

Open Source Security Foundation - Security Scorecard

Starting with this release, we provide information derived from OpenSSF Security Scorecards, as published by the Open Source Security Foundation. To support this, prescriptions-refresh-job queries the scorecards available in BigQuery and constructs prescriptions from them, which are committed directly to the thoth-station/prescriptions repository, the open database about Python open-source projects. As scorecards are specific to repositories, they are automatically mapped to Python packages under the hood, based on Thoth's knowledge. The prescriptions refresh job automatically updates prescriptions as scorecards get updated. See the relevant prescriptions for flask or tensorflow for examples.

Thoth tutorial: create and use overlays for Elyra AI Pipelines steps.

This tutorial shows the concept of overlays, how they are applied to software stacks, what overlay builds are, and how the built images are used in AI Pipelines.

Related:

Warning produced if users use forked projects on GitHub

As part of the data aggregation done in prescriptions-refresh-job, we aggregate information on whether a given project is a fork. If so, users are warned about using a forked project. Prescriptions are automatically updated as the GitHub state changes.

Example: https://github.com/thoth-station/prescriptions/pull/17217

Warn if a package used was not updated in the last 365 days

As part of the data aggregation done in prescriptions-refresh-job, we aggregate information on whether a given project was updated on GitHub in the past 365 days (any commit to the default Git branch). If not, we warn users about this fact. Prescriptions are automatically updated as the GitHub state changes.

Example: https://github.com/thoth-station/prescriptions/pull/17394/files

Warn if the used project has fewer than 5 contributors on GitHub

As part of the data aggregation done in prescriptions-refresh-job, we aggregate information on the number of contributors. If the number of contributors is less than 3, we warn users about this fact. Prescriptions are automatically updated as the GitHub state changes.

Information about GitHub popularity

When users ask for advice, starting with this release we also provide information about the project's community on GitHub. If a project has a small community, we warn about its use; otherwise, we provide information on how big the community is. Prescriptions are automatically updated as the GitHub state changes.

Example (very high popularity): flask
Example (high popularity): wheel
Example (moderate popularity): configparser
Example (low popularity): untokenize

Component Updates

Thanks for the amazing work everyone. :100: