oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Split package curation data according to its use #6187

Open sschuberth opened 1 year ago

sschuberth commented 1 year ago

The current PackageCurationData class allows curating semantically very different things at once:

  1. There are curation fields for almost every field of the Package class. These are technical curations, which are mostly needed to make the Downloader succeed.

  2. A small number of fields (concludedLicense, declaredLicenseMapping), however, have a different purpose as they have legal implications.
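To illustrate the distinction above, here is a minimal sketch of how a mixed curation could be partitioned into a technical and a legal part. It is written in Python for brevity and is not ORT's actual Kotlin code: the class is a simplified stand-in, and `split_curation`, `LEGAL_FIELDS`, and the snake_case field names `homepage_url`, `source_artifact_url` and `vcs_url` are assumptions chosen for illustration. Only the two legal fields are taken from the description above.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional, Tuple

# The two fields named above as having legal implications.
LEGAL_FIELDS = frozenset({"concluded_license", "declared_license_mapping"})


@dataclass
class PackageCurationData:
    """Simplified stand-in for the combined class (not ORT's real definition)."""

    # Technical fields (illustrative selection).
    homepage_url: Optional[str] = None
    source_artifact_url: Optional[str] = None
    vcs_url: Optional[str] = None
    # Legal fields.
    concluded_license: Optional[str] = None
    declared_license_mapping: Dict[str, str] = field(default_factory=dict)


def split_curation(data: PackageCurationData) -> Tuple[dict, dict]:
    """Return (technical, legal) dicts holding only the fields that are set."""
    technical, legal = {}, {}
    for f in fields(data):
        value = getattr(data, f.name)
        if value in (None, {}):
            continue  # skip fields the curation does not touch
        target = legal if f.name in LEGAL_FIELDS else technical
        target[f.name] = value
    return technical, legal


mixed = PackageCurationData(
    vcs_url="https://example.com/repo.git",
    concluded_license="Apache-2.0",
)
technical, legal = split_curation(mixed)
```

With such a partition, each part could live in its own file and be reviewed by the respective group of users.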

Technical vs. legal curations are typically performed by different types of users. So having both in the same file is confusing and hampers a workflow with strict separation of responsibilities (for example due to merge conflicts when managing curation files in Git for review).

This mingling of concerns is also an issue when trying to reduce turn-around times for testing new curations, i.e. when rerunning as few tools as possible and in particular not the Analyzer: for isolated technical curations only the Downloader would need to rerun, and for isolated legal curations only the Evaluator.
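The rerun logic described here can be sketched as a small lookup. This is a hypothetical Python illustration, not existing ORT functionality: `tools_to_rerun`, the lowercase tool names and the snake_case field names are assumptions, and the sketch only names the two tools mentioned above (in a real pipeline, stages downstream of the Downloader would presumably have to rerun as well).

```python
# Fields with legal implications, as named in this issue.
LEGAL_FIELDS = frozenset({"concluded_license", "declared_license_mapping"})


def tools_to_rerun(curated_fields):
    """Map the names of the curated fields to the tools that need to rerun.

    Any non-legal field counts as a technical curation.
    """
    curated = set(curated_fields)
    tools = []
    if curated - LEGAL_FIELDS:
        tools.append("downloader")  # technical curations affect downloading
    if curated & LEGAL_FIELDS:
        tools.append("evaluator")  # legal curations affect rule evaluation
    return tools


technical_only = tools_to_rerun(["vcs_url"])
legal_only = tools_to_rerun(["concluded_license"])
mixed = tools_to_rerun(["vcs_url", "concluded_license"])
```

A curation that mixes both kinds of fields requires both tools to rerun, which is the turn-around cost the proposed split would avoid.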

sschuberth commented 1 year ago

Having this split would be the first step toward untangling another unfortunate fusion of semantics:

Legal package curations overlap semantically with license finding curations from package configurations, which allow concluding licenses on a per-finding basis and thus, in sum, imply a concluded license for the package.

Ideally, all ways to conclude a license for a package should be configured in the same place, no matter whether this is done explicitly by setting a concluded license for the package, or implicitly by concluding licenses for findings and / or eliminating false positives.