nexB / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/. Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

Track code version and pipelines used to index a package #347

Open pombredanne opened 4 months ago

pombredanne commented 4 months ago

We need to track the code version and pipelines used to index a package. Why? As we update the purldb with new features, we may have indexed packages with various versions of the tools and their whole dependency tree. We need to decide when to rerun an indexing scan.

We have many ways to solve this:

This would then be used to decide whether a package needs reindexing given the current context (SCIO version, pipelines and so on). If the context recorded for a package differs from the current one, the package would need reindexing/recollecting.

See also https://github.com/nexB/purldb/issues/221

404-geek commented 3 months ago

Hi @pombredanne

I was going through this feature and would like to work on it. Before that, I would like to know whether we have a version API to get the current version of the deployed service.

404-geek commented 2 months ago

We can use scancodeio_version from the run table in SCIO. In purlDB, we also need to use the purlDB version, stored in the purlDB database, and we should introduce a new process for reindexing, similar to request_scans.

In that process we could have a function like the one below:

def check_for_reindex(package, current_scancode_version, current_purldb_version, current_pipelines):
    """Return True if the package was indexed with a different tool version or pipeline set."""
    needs_reindex = (
        package.scancode_version != current_scancode_version
        or package.purldb_version != current_purldb_version
        or package.pipelines_used != current_pipelines
    )
    if needs_reindex:
        print(f"Package {package.name} needs reindexing.")
    else:
        print(f"No reindexing needed for package {package.name}.")
    return needs_reindex
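A possible variant, as a rough sketch only: group the tracked values into one record so the comparison can also report which value changed, and treat a package with no recorded context as needing reindexing. The names IndexingContext, changed_fields and needs_reindex are hypothetical and do not exist in purlDB or SCIO today; how the record would be stored on a package is left open.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IndexingContext:
    """Hypothetical record of the versions and pipelines behind one indexing run."""
    scancodeio_version: str
    purldb_version: str
    pipelines: Tuple[str, ...]  # pipeline names, in the order they were run


def changed_fields(stored: IndexingContext, current: IndexingContext) -> List[str]:
    """Return the names of the fields whose values differ between the two contexts."""
    return [
        f.name
        for f in fields(IndexingContext)
        if getattr(stored, f.name) != getattr(current, f.name)
    ]


def needs_reindex(stored: Optional[IndexingContext], current: IndexingContext) -> bool:
    """Decide whether a package should be reindexed under the current context."""
    # Assumption: a package without a recorded context was indexed before
    # tracking existed, so it is treated as stale.
    if stored is None:
        return True
    return bool(changed_fields(stored, current))
```

Returning the changed fields rather than printing would let the reindexing process log the reason or apply different policies, for example reindexing on a pipeline change but not on every purlDB version bump.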

@pombredanne let me know if we can implement it along the same lines.