RFC: analysis result caching

oss-review-toolkit / ort

A suite of tools to automate software compliance checks.

https://oss-review-toolkit.org

Apache License 2.0

RFC: analysis result caching #5186

Closed fviernau closed 4 months ago

fviernau commented 2 years ago

The result of the analysis can change for the following reasons:

1. A first-level dependency has been added / removed / changed.
2. The version constraints resolve differently:
   a) versions are not fixed, and a new release of a (transitive) dependency was published
   b) tooling update, i.e. a change in the heuristic used to resolve versions
3. The ordered list of artifact repositories has changed.

So, dependency trees may change between two analyzer runs for the exact same source tree. To speed up the average analysis duration (for CI/CD), the analysis result could be cached. It seems like 1. and 3. could be used as the cache key. Roughly speaking: if the first-level dependencies and the repositories didn't change, use the result from the cache, provided the entry's age doesn't exceed a configured maximum age.
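A minimal sketch of such a cache key, assuming the first-level dependencies and the ordered list of artifact repositories are available as plain strings. The helper name `analysis_cache_key` and its input shapes are hypothetical and not part of ORT. Direct dependencies are sorted because their declaration order should not matter, while repositories are hashed in the given order because reason 3 is about the *ordered* list.

```python
import hashlib
import json


def analysis_cache_key(direct_dependencies: list[str], repositories: list[str]) -> str:
    """Derive a cache key from reasons 1 and 3 only (hypothetical helper, not an ORT API).

    Reason 2 (how constraints resolve) is deliberately not part of the key, so a
    cache hit may still return a tree that would resolve differently today.
    """
    payload = {
        # The order of direct dependencies is irrelevant, so normalize it.
        "dependencies": sorted(set(direct_dependencies)),
        # The repository order can influence resolution, so keep it as given.
        "repositories": list(repositories),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Reordering the dependency declarations yields the same key, while swapping two repositories yields a different one.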

sschuberth commented 2 years ago

If first level dependencies and repositories didn't change, then use the result from the cache

How does that guard against your 2a) case?

Also, for analyzers that use CLI tools, a different version of that tool might have an effect on the version resolution.

fviernau commented 2 years ago

How does that guard against your 2a) case?

It doesn't. My thinking was: when reviewing compliance, you need to define how old your analyzer result may be at most. I guessed that one would say something like: "if the (direct) dependencies didn't change, then the analysis may be up to X old". I proposed translating X into the maximum cache age, and points #1 and #3 into the cache key. This idea might be too specific, and a more generic approach could be needed.
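A sketch of how X could become the maximum cache age, assuming a simple in-memory store that maps a cache key to a serialized analyzer result plus its creation time. `AnalysisResultCache`, `CachedEntry` and `max_age` are hypothetical names for illustration, not ORT APIs; how the key itself is derived is out of scope here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CachedEntry:
    result: str            # serialized analyzer result (placeholder type)
    created_at: datetime   # when the analyzer produced this result


@dataclass
class AnalysisResultCache:
    """Hypothetical in-memory cache with a maximum entry age; not an ORT API."""
    max_age: timedelta
    entries: dict[str, CachedEntry] = field(default_factory=dict)

    def put(self, key: str, result: str, now: Optional[datetime] = None) -> None:
        created = now if now is not None else datetime.now(timezone.utc)
        self.entries[key] = CachedEntry(result, created)

    def get(self, key: str, now: Optional[datetime] = None) -> Optional[str]:
        """Return the cached result only if it is at most max_age old."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        current = now if now is not None else datetime.now(timezone.utc)
        if current - entry.created_at > self.max_age:
            # Too old: the caller has to re-run the analyzer.
            return None
        return entry.result
```

With `max_age=timedelta(days=7)`, a result stored seven days ago is still served and one stored eight days ago is not. The age limit does not prevent the 2a) case; it only bounds how stale such a result can get.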

Also, for analyzers that use CLI tools, a different version of that tool might have an effect on the version resolution.

Right, would it fit into 2.b?

sschuberth commented 2 years ago

Also, for analyzers that use CLI tools, a different version of that tool might have an effect on the version resolution.

Right, would it fit into 2.b?

Yes.

sschuberth commented 2 years ago

I generally like the idea of analyzer result caching, but I wonder whether we should limit ourselves to simple cases first, e.g. cases where a lockfile is present, and simply use the hash of the lockfile as the key for cache lookup.
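A minimal sketch of the lockfile variant, assuming the path of the project's lockfile is already known (for example `package-lock.json`; detecting which file applies to which package manager is left out). `lockfile_cache_key` is a hypothetical helper, not an ORT API. Since a lockfile pins the resolved versions, including transitive ones, the 2a) case largely does not apply to projects that have one.

```python
import hashlib
from pathlib import Path


def lockfile_cache_key(lockfile: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a lockfile's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with lockfile.open("rb") as f:
        # Read in chunks so large lockfiles need not fit into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two runs with a byte-identical lockfile map to the same key, so the cached result can be reused; any change to the file, even a whitespace-only one, produces a new key.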

sschuberth commented 4 months ago

Another option could be to look in the direction of https://github.com/oss-review-toolkit/ort/issues/8361.

sschuberth commented 4 months ago

Closed as part of backlog grooming. Feel free to comment if you would like to contribute to this.