oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Consolidate Scan Storages #8721

Open mnonnenmacher opened 5 months ago

mnonnenmacher commented 5 months ago

The scanner supports storing multiple types of data in storages for reuse in subsequent runs or by other tools: scan results, provenance results, file archives, and file lists.

Currently the storage backends can be configured separately for each of those four data types. While this is very flexible, in practice it provides little value. For example, if scan results are stored in a Postgres database, there is little reason to store provenance results in a different place. Or if file archives are stored in S3, there is little reason to store file lists in a different place.

This flexibility makes the configuration complex. By default, all data is stored in a local directory, which is usually not desired in a production setup, so storing the data remotely requires four storage backend configurations. This often confuses users, and it can cause performance issues for users who do not know how the scanner works internally: for example, forgetting to configure a provenance storage leads to unnecessary repetition of the provenance resolution.
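To illustrate the pitfall, here is a minimal sketch in Python. It is not ORT's configuration format: the data type keys, the backend names, and the `local-dir` fallback are hypothetical and only model the statement that anything left unconfigured ends up in the local default.

```python
# Hypothetical names for the four data types discussed in this issue.
DATA_TYPES = ("scan_results", "provenance_results", "file_archives", "file_lists")


def effective_backends(configured: dict, default: str = "local-dir") -> dict:
    """Return the backend used for each data type, falling back to the default."""
    return {data_type: configured.get(data_type, default) for data_type in DATA_TYPES}


def unconfigured(configured: dict) -> list:
    """List the data types that silently fall back to the default backend."""
    return [data_type for data_type in DATA_TYPES if data_type not in configured]


# A setup where three remote backends were configured but the provenance
# storage was forgotten (assumed example values).
config = {"scan_results": "postgres", "file_archives": "s3", "file_lists": "s3"}

backends = effective_backends(config)
missing = unconfigured(config)
print(backends)
print(missing)
```

With four independent entries, nothing forces them to be set together, so a gap like the one in `missing` is easy to overlook.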

To simplify the configuration, the proposal is to consolidate the configuration to just two types of data:

The implementation proposal is:

sschuberth commented 5 months ago

For reference, this also relates to the pending draft at https://github.com/oss-review-toolkit/ort/pull/5516 which tries to address the storage configuration complexity by centralizing and reusing it.

sschuberth commented 5 months ago

> To simplify the configuration, the proposal is to consolidate the configuration to just two types of data:

That distinction makes sense to me. And maybe the interfaces should be named accordingly: for binary data that's already the case, but maybe ScanStorage should become the more general StructuredStorage. Thinking about it, maybe BlobStorage would then be a better contrast to that than BinaryStorage.
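As a rough sketch of what the naming suggestion could look like, here are two storage interfaces in Python. Only the names StructuredStorage and BlobStorage come from the comment above; the method signatures, the in-memory implementations, the keys, and the assignment of data types to each interface are assumptions for illustration and do not reflect ORT's actual (Kotlin) API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class StructuredStorage(ABC):
    """Storage for structured records (assumed: scan and provenance results)."""

    @abstractmethod
    def read(self, key: str) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    def write(self, key: str, record: Dict[str, Any]) -> None: ...


class BlobStorage(ABC):
    """Storage for opaque binary data (assumed: file archives and file lists)."""

    @abstractmethod
    def read(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...


class InMemoryStructuredStorage(StructuredStorage):
    """Hypothetical in-memory implementation, for demonstration only."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def read(self, key: str) -> Optional[Dict[str, Any]]:
        return self._records.get(key)

    def write(self, key: str, record: Dict[str, Any]) -> None:
        self._records[key] = record


class InMemoryBlobStorage(BlobStorage):
    """Hypothetical in-memory implementation, for demonstration only."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def read(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data


structured = InMemoryStructuredStorage()
blobs = InMemoryBlobStorage()

# Keys and payloads below are made up for the example.
structured.write("scan-result:example-pkg", {"license": "Apache-2.0"})
blobs.write("file-archive:example-pkg", b"archive bytes")
```

With this split, a deployment would configure one backend per interface instead of one per data type.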