thoth-station / core

Using Artificial Intelligence to analyse and recommend Software Stacks for Artificial Intelligence applications.
https://thoth-station.github.io/
GNU General Public License v3.0

[Spike] [MVP] Package maintenance predictive model #444

Open mayaCostantini opened 1 year ago

mayaCostantini commented 1 year ago

Problem statement

While most approaches focus on guaranteeing the provenance of software components, this is only one side of sustainable software development. Another side is the focus on the software components that are critical to the success of the whole software system, its development, and its delivery and operation.

cc @goern

As a Python developer, I would like to be able to predict whether some of my dependencies will become unmaintained over time.

The idea would be to develop a learning model able to predict when a given package will fall below an acceptable level of maintenance. That level could be defined by the user or set directly in the model, in an arbitrary way. A PoC for this model could use project maintenance data as provided by the OpenSSF Security Scorecards, provided that the upstream project implements Scorecard checks per package version instead of updating Scorecard checks based on the last commit SHA of the project repository.

Proposal description

  1. Provide a PoC of a model trained on the Scorecards dataset (with Scorecard checks per package version) capable of predicting from which version a package is likely to fall below a predefined level of maintenance. A good candidate for this task could be a Multiple Linear Regression (MLR), provided that the MLR assumptions (linear relationship between predictor and response variables, predictors not too strongly correlated, etc.) hold. Other supervised learning models could also be considered.
  1. Find relevant integrations for the model

Think about ways to provide this model as a service, and where in a Python project lifecycle it would be most relevant for developers to predict the maintenance duration of their dependencies.
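
As a rough illustration of item 1, here is a minimal sketch of a Multiple Linear Regression fitted by ordinary least squares in plain Python. Everything in it is an assumption made for illustration: the check names, the synthetic scores, the choice of the following version's aggregate score as the response variable, and the threshold value. None of it comes from the actual Scorecards dataset, and a real PoC would also need to validate the MLR assumptions before trusting the fit.

```python
def fit_ols(features, targets):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + [float(v) for v in r] for r in features]  # prepend intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coefs, feature_row):
    """Evaluate the fitted linear model on one row of predictors."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], feature_row))


# Synthetic, illustrative data only: one row per released version.
# Hypothetical per-version check scores on a 0-10 scale:
# (maintained, code_review, ci_tests)
CHECKS = [
    (10, 9, 10),
    (9, 9, 8),
    (9, 7, 8),
    (7, 8, 6),
    (6, 5, 7),
    (4, 6, 5),
    (3, 4, 3),
]
# Hypothetical aggregate maintenance score observed for the following version.
NEXT_SCORE = [9.2, 8.5, 7.6, 6.9, 5.4, 4.1, 3.0]

THRESHOLD = 4.0  # assumed user-defined acceptable maintenance level

coefs = fit_ols(CHECKS, NEXT_SCORE)
latest_checks = (2, 3, 2)  # hypothetical checks for the most recent version
predicted = predict(coefs, latest_checks)
will_fall_below = predicted < THRESHOLD

print(f"predicted next-version score: {predicted:.2f}")
print(f"expected to fall below threshold {THRESHOLD}: {will_fall_below}")
```

The pure-Python solver keeps the sketch dependency-free; a real PoC would more likely rely on an established statistics library, which would also make it easier to check residuals and predictor correlation.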

Acceptance Criteria

To be defined.

sesheta commented 1 year ago

@mayaCostantini: This issue is currently awaiting triage. If a refinement session determines this is a relevant issue, it will accept the issue by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

mayaCostantini commented 1 year ago

/priority important-longterm
/sig stack-guidance