Open mnonnenmacher opened 5 months ago
For reference, this also relates to the pending draft at https://github.com/oss-review-toolkit/ort/pull/5516 which tries to address the storage configuration complexity by centralizing and reusing it.
> To simplify the configuration, the proposal is to consolidate the configuration to just two types of data:
That distinction makes sense to me. And maybe the interfaces should be named accordingly: For binary data that's already the case, but maybe `ScanStorage` should be more generally `StructuredStorage`. Thinking about it, maybe `BlobStorage` would then be a better contrast to that than `BinaryStorage`.
The scanner supports storing multiple types of data in storages for reuse in subsequent runs or other tools: scan results, provenance resolution results, file archives, and file lists.
Currently the storage backends can be configured separately for each of those four data types. While this is very flexible, in practice it provides little value. For example, if scan results are stored in a Postgres database, there is little reason to store provenance results in a different place. Or if file archives are stored in S3, there is little reason to store file lists in a different place.
This flexibility makes the configuration complex: The default settings store all data in a local directory, which is usually not desired in a production setup, so storing the data remotely requires four storage backend configurations. This often confuses users and can also cause performance issues for users who do not know how the scanner works internally, for example when they forget to configure a provenance storage, which leads to unnecessary repetition of the provenance resolution.
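To make the problem concrete, here is a minimal sketch of a configuration with four independent storage slots. All names (`StorageBackend`, `Remote`, `FourSlotConfig`, `slotsLeftLocal`) and the example URL are hypothetical and do not mirror ORT's actual configuration classes. The sketch only shows that any slot left unset silently falls back to the local default.

```kotlin
// Hypothetical model for illustration, not ORT's actual configuration classes.
sealed interface StorageBackend
object LocalDirectory : StorageBackend
data class Remote(val url: String) : StorageBackend

// Each data type has its own slot, and each slot defaults to local storage.
data class FourSlotConfig(
    val scanResults: StorageBackend = LocalDirectory,
    val provenance: StorageBackend = LocalDirectory,
    val fileArchives: StorageBackend = LocalDirectory,
    val fileLists: StorageBackend = LocalDirectory
)

// Returns the names of all slots that still use the local default.
fun slotsLeftLocal(config: FourSlotConfig): List<String> =
    listOf(
        "scanResults" to config.scanResults,
        "provenance" to config.provenance,
        "fileArchives" to config.fileArchives,
        "fileLists" to config.fileLists
    ).filter { (_, backend) -> backend is LocalDirectory }
        .map { (name, _) -> name }

fun main() {
    // Only the scan result storage is moved to a remote backend.
    val config = FourSlotConfig(scanResults = Remote("postgres://example.invalid/ort"))
    println(slotsLeftLocal(config)) // prints [provenance, fileArchives, fileLists]
}
```

In this model, configuring only the scan result storage leaves three slots on the local default, including the provenance slot whose omission the issue describes as a source of repeated provenance resolution.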
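To simplify the configuration, the proposal is to consolidate the configuration to just two types of data: scan data (scan results and provenance resolution results) and binary data (file archives and file lists).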
The implementation proposal is:
Rename `ScanStorage` and all related classes to `ScanResultStorage`. `ScanStorage` was chosen because `ScanResultStorage` was already taken, but this is not the case anymore.
`ScanStorage`
`BinaryStorage`
`PostgresScanStorage` uses `ProvenanceBasedPostgresStorage`, `PostgresPackageProvenanceStorage` and `PostgresNestedProvenanceStorage`.
`MariaDbScanStorage` should not only provide a way to store scan results, but also to store provenance resolution results.
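As a rough illustration of a storage that holds both scan results and provenance resolution results, the following sketch defines one interface with an in-memory implementation. Every name and signature here (`CombinedScanStorage`, `ProvenanceKey`, `StoredScanResult`, `InMemoryScanStorage`, the package ID string) is an assumption made for the example; these are simplified stand-ins, not ORT's real classes or the API this proposal would end up with.

```kotlin
// Hypothetical sketch; names and signatures are illustrative, not ORT's API.
data class ProvenanceKey(val url: String, val revision: String)
data class StoredScanResult(val provenance: ProvenanceKey, val scanner: String, val summary: String)

// One interface covering both scan results and provenance resolution results.
interface CombinedScanStorage {
    fun readScanResults(provenance: ProvenanceKey): List<StoredScanResult>
    fun writeScanResult(result: StoredScanResult)
    fun readResolvedProvenance(packageId: String): ProvenanceKey?
    fun writeResolvedProvenance(packageId: String, provenance: ProvenanceKey)
}

// A single backend instance serves both kinds of data, so configuring it once is enough.
class InMemoryScanStorage : CombinedScanStorage {
    private val results = mutableMapOf<ProvenanceKey, MutableList<StoredScanResult>>()
    private val provenances = mutableMapOf<String, ProvenanceKey>()

    override fun readScanResults(provenance: ProvenanceKey): List<StoredScanResult> =
        results[provenance].orEmpty().toList()

    override fun writeScanResult(result: StoredScanResult) {
        results.getOrPut(result.provenance) { mutableListOf() }.add(result)
    }

    override fun readResolvedProvenance(packageId: String): ProvenanceKey? =
        provenances[packageId]

    override fun writeResolvedProvenance(packageId: String, provenance: ProvenanceKey) {
        provenances[packageId] = provenance
    }
}

fun main() {
    val storage: CombinedScanStorage = InMemoryScanStorage()
    val provenance = ProvenanceKey("https://example.invalid/repo.git", "abc123")

    // Store the resolved provenance once, then a scan result keyed by it.
    storage.writeResolvedProvenance("Maven:com.example:lib:1.0", provenance)
    storage.writeScanResult(StoredScanResult(provenance, "ExampleScanner", "2 licenses found"))

    // A later run can reuse both without repeating the provenance resolution.
    val cached = storage.readResolvedProvenance("Maven:com.example:lib:1.0")
    println(cached?.let { storage.readScanResults(it).size })
}
```

With one interface, a backend such as a database only needs to be configured once to cover both kinds of data, which is the simplification the proposal aims for.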