Open pombredanne opened 4 months ago
Hi @pombredanne
I was going through this feature and would like to work on it. Before that, I would like to know whether we have a version API to get the current version of the deployed service.
We can use `scancodeio_version` from the `run` table in SCIO. In purlDB we also need to use the purlDB version, stored in the purlDB database, and we should introduce a new process for reindexing, similar to `request_scans`.
For this we could have a function like the one below:

    def check_for_reindex(package, current_scancode_version, current_purldb_version, current_pipelines):
        # Reindex when any recorded value differs from the current deployment.
        if (package.scancode_version != current_scancode_version or
                package.purldb_version != current_purldb_version or
                package.pipelines_used != current_pipelines):
            print(f"Package {package.name} needs reindexing.")
        else:
            print(f"No reindexing needed for package {package.name}.")
@pombredanne let me know if we can implement it along the same lines.
We need to track the code version and pipelines used to index a package. Why? As we update purldb with new features, we may have indexed packages with various versions of the tools and of their whole dependency tree. We need to decide when to rerun an indexing scan.
There are several ways to solve this:
- Do nothing and just reindex/recollect single things or everything as needed, as is done today. This is possibly problematic at our scale.
- Track which versions of ScanCode.io and purldb and which pipelines have been used. This would then be used to decide whether a package needs reindexing given the current context (SCIO version, pipelines and so on). If the recorded values differ from the current ones, we would need reindexing/recollecting.
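To make the second option concrete, here is a rough sketch of how the tracked values could drive a batch selection. It is only an illustration under assumptions: `IndexingContext`, `stale_fields` and `select_for_reindex` are hypothetical names, not existing purldb or ScanCode.io code, and the real data would presumably live on the package model rather than in a plain dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexingContext:
    """Hypothetical record of the versions and pipelines behind one indexing run."""
    scancodeio_version: str
    purldb_version: str
    pipelines: frozenset  # pipeline names; a set, so ordering does not matter


def stale_fields(recorded, current):
    """Return the names of the fields that differ between recorded and current."""
    return [
        name
        for name in ("scancodeio_version", "purldb_version", "pipelines")
        if getattr(recorded, name) != getattr(current, name)
    ]


def select_for_reindex(recorded_by_purl, current):
    """Map each purl that needs reindexing to the list of fields that changed."""
    selected = {}
    for purl, recorded in recorded_by_purl.items():
        changed = stale_fields(recorded, current)
        if changed:
            selected[purl] = changed
    return selected
```

Returning the changed fields rather than a plain boolean would let us prioritize, for example reindexing on a pipeline change first and deferring packages where only the purldb version moved.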
See also https://github.com/nexB/purldb/issues/221