oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Cache is Read Despite the Scanner Config Is Different #8135

Open dimitris-iliou opened 8 months ago

dimitris-iliou commented 8 months ago

Hello, I am facing a cache issue and would like your help. Our provenance_scan_results table contains some new duplicate scan results that share the same vcs_url and vcs_revision but have a different scanner_configuration and scan_summary.

ORT started failing with the following message:

17:45:25.573 [main] WARN  Exposed - Transaction attempt #2 failed: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "provenance_scan_results_artifact_url_artifact_hash_scanner_name"
  Detail: Key (artifact_url, artifact_hash, scanner_name, scanner_version, scanner_configuration)=(https://registry.npmjs.org/@types/webpack/-/webpack-4.41.32.tgz, a7bab03b72904070162b2f169415492209e94212, ScanCode, 32.0.6, --copyright --license --info --strip-root --timeout 300 --json-pp) already exists.. Statement(s): INSERT INTO provenance_scan_results (artifact_hash, artifact_url, scan_summary, scanner_configuration, scanner_name, scanner_version, vcs_revision, vcs_type, vcs_url) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
org.jetbrains.exposed.exceptions.ExposedSQLException: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "provenance_scan_results_artifact_url_artifact_hash_scanner_name"

So it looks like the cache is read even though the scanner configuration is different. Is this behavior intentional, or could it be considered a bug?

fviernau commented 8 months ago

@dimitris-iliou the above ERROR log message does not result from reading the cache but from writing to it. Anyhow, I suppose you haven't changed the database URL and schema in your config, so the scan results for different scanner configurations are stored in the same table. This is expected.

The constraint violation might be a bug, though. It would be interesting to know (1) the scan result that is being inserted and (2) the scan result that already exists and is in conflict. Are these identical, or is the part in which they differ not included in the unique constraint key?
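
To illustrate the second possibility, here is a minimal Python sketch of how two results that differ only outside the key columns would collide. It is an assumption-laden stand-in, not ORT code: the table is reduced to the five key columns named in the error message plus scan_summary, SQLite replaces PostgreSQL, and the scan_summary values are placeholders.

```python
import sqlite3

# Assumption: the column names and the five-column unique key are copied from
# the error message. This is not ORT's actual DDL, and SQLite stands in for
# PostgreSQL.
SCHEMA = """
CREATE TABLE provenance_scan_results (
    artifact_url TEXT,
    artifact_hash TEXT,
    scanner_name TEXT,
    scanner_version TEXT,
    scanner_configuration TEXT,
    scan_summary TEXT,
    UNIQUE (artifact_url, artifact_hash, scanner_name,
            scanner_version, scanner_configuration)
)
"""

INSERT = "INSERT INTO provenance_scan_results VALUES (?, ?, ?, ?, ?, ?)"

# Key values taken from the "Detail: Key (...)" line of the log above.
KEY = (
    "https://registry.npmjs.org/@types/webpack/-/webpack-4.41.32.tgz",
    "a7bab03b72904070162b2f169415492209e94212",
    "ScanCode",
    "32.0.6",
    "--copyright --license --info --strip-root --timeout 300 --json-pp",
)


def insert_twice_with_different_summaries() -> bool:
    """Return True if the second insert is rejected by the unique constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(INSERT, KEY + ('{"licenses": ["MIT"]}',))
    try:
        # Same key columns, different scan_summary (placeholder content).
        conn.execute(INSERT, KEY + ('{"licenses": []}',))
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False
```

In this sketch the second insert fails even though the summaries differ, because scan_summary is not part of the key.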

dimitris-iliou commented 8 months ago
 Detail: Key (artifact_url, artifact_hash, scanner_name, scanner_version, scanner_configuration)=(https://registry.npmjs.org/@types/scheduler/-/scheduler-0.16.2.tgz, 1a62f89525723dde24ba1b01b092bf5df8ad4d39, ScanCode, 32.0.6, --copyright --license --info --strip-root --timeout 300 --json-pp) already exists.. Statement(s): INSERT INTO provenance_scan_results (artifact_hash, artifact_url, scan_summary, scanner_configuration, scanner_name, scanner_version, vcs_revision, vcs_type, vcs_url) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
org.jetbrains.exposed.exceptions.ExposedSQLException: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "provenance_scan_results_artifact_url_artifact_hash_scanner_name"
  Detail: Key (artifact_url, artifact_hash, scanner_name, scanner_version, scanner_configuration)=(https://registry.npmjs.org/@types/scheduler/-/scheduler-0.16.2.tgz, 1a62f89525723dde24ba1b01b092bf5df8ad4d39, ScanCode, 32.0.6, --copyright --license --info --strip-root --timeout 300 --json-pp) already exists.
    at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed_core(Statement.kt:94)
    at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:209)
    at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:186)
    at org.jetbrains.exposed.sql.statements.Statement.execute(Statement.kt:55)
    at org.jetbrains.exposed.sql.QueriesKt.insert(Queries.kt:71)
    at org.ossreviewtoolkit.scanner.storages.ProvenanceBasedPostgresStorage$write$1.invoke(ProvenanceBasedPostgresStorage.kt:142)
    at org.ossreviewtoolkit.scanner.storages.ProvenanceBasedPostgresStorage$write$1.invoke(ProvenanceBasedPostgresStorage.kt:141)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction$run(ThreadLocalTransactionManager.kt:275)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.access$inTopLevelTransaction$run(ThreadLocalTransactionManager.kt:1)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt$inTopLevelTransaction$1.invoke(ThreadLocalTransactionManager.kt:322)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.keepAndRestoreTransactionRefAfterRun(ThreadLocalTransactionManager.kt:330)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:321)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt$transaction$1.invoke(ThreadLocalTransactionManager.kt:230)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.keepAndRestoreTransactionRefAfterRun(ThreadLocalTransactionManager.kt:330)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:200)
    at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:178)
    at org.ossreviewtoolkit.model.utils.DatabaseUtils.transaction(DatabaseUtils.kt:119)
    at org.ossreviewtoolkit.scanner.storages.ProvenanceBasedPostgresStorage.write(ProvenanceBasedPostgresStorage.kt:141)
    at org.ossreviewtoolkit.scanner.Scanner.storeProvenanceScanResult(Scanner.kt:631)
    at org.ossreviewtoolkit.scanner.Scanner.runPathScanners(Scanner.kt:454)
    at org.ossreviewtoolkit.scanner.Scanner.scan(Scanner.kt:178)
    at org.ossreviewtoolkit.scanner.Scanner$scan$3.invokeSuspend(Scanner.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:280)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at org.ossreviewtoolkit.plugins.commands.scanner.ScannerCommand.runScanners(ScannerCommand.kt:227)
    at org.ossreviewtoolkit.plugins.commands.scanner.ScannerCommand.run(ScannerCommand.kt:140)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:306)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:319)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:40)
    at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:458)
    at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:455)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:475)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:482)
    at org.ossreviewtoolkit.cli.OrtMainKt.main(OrtMain.kt:85)
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "provenance_scan_results_artifact_url_artifact_hash_scanner_name"

Here are the scan results from one of the packages that throw that error:

{
      "provenance" : {
        "source_artifact" : {
          "url" : "https://registry.npmjs.org/@types/scheduler/-/scheduler-0.16.2.tgz",
          "hash" : {
            "value" : "1a62f89525723dde24ba1b01b092bf5df8ad4d39",
            "algorithm" : "SHA-1"
          }
        }
      },
      "scanner" : {
        "name" : "ScanCode",
        "version" : "32.0.6",
        "configuration" : "--copyright --license --info --strip-root --timeout 300 --json-pp"
      },
      "summary" : {
        "start_time" : "2024-01-12T08:51:03.000278212Z",
        "end_time" : "2024-01-12T08:51:06.000004181Z",
        "licenses" : [ {
          "license" : "MIT",
          "location" : {
            "path" : "scheduler/LICENSE",
            "start_line" : 1,
            "end_line" : 1
          },
          "score" : 100.0
        }, {
          "license" : "MIT",
          "location" : {
            "path" : "scheduler/LICENSE",
            "start_line" : 5,
            "end_line" : 21
          },
          "score" : 100.0
        }, {
          "license" : "MIT",
          "location" : {
            "path" : "scheduler/package.json",
            "start_line" : 6,
            "end_line" : 6
          },
          "score" : 100.0
        } ],
        "copyrights" : [ {
          "statement" : "Copyright (c) Microsoft Corporation",
          "location" : {
            "path" : "scheduler/LICENSE",
            "start_line" : 3,
            "end_line" : 3
          }
        } ]
      }
}

And here is the data from our cache for the above package: existed_scan_results.csv

Let me know if that helps.
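
One possible way to check whether the inserted result and the cached one are identical or differ only outside the key is sketched below in Python. It assumes both results are available as dictionaries shaped like the JSON above; the function names are hypothetical and not part of ORT, and the cached row from the CSV would first need to be mapped into the same shape.

```python
def constraint_key(scan_result: dict) -> tuple:
    """Build the five values that the unique constraint in the error covers."""
    artifact = scan_result["provenance"]["source_artifact"]
    scanner = scan_result["scanner"]
    return (
        artifact["url"],
        artifact["hash"]["value"],
        scanner["name"],
        scanner["version"],
        scanner["configuration"],
    )


def compare(new: dict, existing: dict) -> str:
    """Classify how two scan results relate with respect to the key."""
    if constraint_key(new) != constraint_key(existing):
        return "different key"
    if new.get("summary") == existing.get("summary"):
        return "identical"
    return "same key, different summary"


# Example data: key fields copied from the scan result above, summaries
# shortened to the timestamps for brevity.
new_result = {
    "provenance": {
        "source_artifact": {
            "url": "https://registry.npmjs.org/@types/scheduler/-/scheduler-0.16.2.tgz",
            "hash": {
                "value": "1a62f89525723dde24ba1b01b092bf5df8ad4d39",
                "algorithm": "SHA-1",
            },
        }
    },
    "scanner": {
        "name": "ScanCode",
        "version": "32.0.6",
        "configuration": "--copyright --license --info --strip-root --timeout 300 --json-pp",
    },
    "summary": {"start_time": "2024-01-12T08:51:03.000278212Z"},
}

# Hypothetical cached result: same key, summary with a made-up timestamp.
cached_result = dict(new_result, summary={"start_time": "2023-01-01T00:00:00Z"})
```

Calling `compare(new_result, cached_result)` on this example data reports that the key matches while the summaries differ, which is the case where an insert would violate the constraint.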