oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Using ORT with the SPDX "package manager" only #4505

Open tardyp opened 3 years ago

tardyp commented 3 years ago

Here at Renault, most of our integration is done with Git repositories aggregated with Google repo or Git submodules.

We do put tags on our releases, but we would like to test our mainlines before that. We have started to use Conan a bit, but we can't wait for this process to finish before implementing full open-source reviews.

At the moment we are using scancode-toolkit to scan our code, and have some custom report aggregation in Google Data Studio. We are happy with that, but we are starting to hit the problems that ort.yml solves (excludes, curations, resolutions, license choices).

We are struggling to use the OSS review toolkit because it all starts with a package manager, and we don't have one.

We tried to generate an SPDX BOM from our repo list, or to generate the intermediate analyzer output, but then the downloader seems to insist on downloading a tag and will not accept a mainline branch as scan input.

So, if my analysis is right, we would need two features in ORT.

We are willing to help with the development of those features as needed, but will probably need some architecture help to make sure we are making the right changes.

Thanks for your inputs!

sschuberth commented 3 years ago

> We are struggling to use the OSS review toolkit because it all starts with a package manager, and we don't have one.
>
> We tried to generate an SPDX BOM from our repo list

Using the SPDX "package manager" as a fallback is actually the right approach in this case.

> the downloader seems to insist on downloading a tag and will not accept a mainline branch as scan input.

That should not be the case if you configure ORT to allow dynamic versions (which is disabled by default as it prevents deterministic scan results for a given revision).

> We are willing to help with the development of those features as needed, but will probably need some architecture help to make sure we are making the right changes.

Glad to hear you're willing to contribute! For architecture help you might want to join the bi-weekly Developer Meeting. Ping me on Slack for an invite.

tsteenbe commented 3 years ago

@tardyp At HERE Technologies we have several Google repo-based projects that we have been scanning with ORT for the past 5+ years. We call ORT's downloader to check out the git-repo project in a scan and then pass the directory to the analyzer, scanner, evaluator and reporter. ORT will then report the root package as Unmanaged, but then we use package.spdx.yml to define the included OSS packages.

Happy to give you a demo (ping me on ORT slack/email me to arrange a date) and then we can discuss any needed features in detail.

tardyp commented 3 years ago

Hello @sschuberth and @tsteenbe, thanks for your kind answers.

I am happy to learn that my use cases are actually supported already. I'd like to keep the SPDX input, as our repo sources are quite big and we already have some tooling to manage them which we would like to reuse.

However, I have still been struggling to generate an SPDX file that is accepted by the ORT analyzer. I eventually set up a simplified example for public discussion:

SPDXID: SPDXRef-DOCUMENT
creationInfo:
  created: '2020-07-23T18:30:22Z'
  creators:
  - oss-scanner
  licenseListVersion: '3.9'
dataLicense: Apache
documentDescribes:
- SPDXRef-Package-proj1
documentNamespace: http://spdx.org/spdxdocs/spdx-document-proj1
name: proj1
packages:
- SPDXID: SPDXRef-Package-proj1
  copyrightText: Copyright (C) 2020 Example Inc.
  description: proj1
  downloadLocation: git+https://github.com/tardyp/proj1@main
  filesAnalyzed: false
  licenseConcluded: NOASSERTION
  licenseDeclared: NOASSERTION
  name: proj1
  versionInfo: main
- SPDXID: SPDXRef-Package-proj2
  copyrightText: Copyright (C) 2020 Example Inc.
  description: proj2
  downloadLocation: git+https://github.com/tardyp/proj2@main
  filesAnalyzed: false
  licenseConcluded: NOASSERTION
  licenseDeclared: NOASSERTION
  name: proj2
  versionInfo: main
relationships:
- relatedSpdxElement: SPDXRef-Package-proj1
  relationshipType: DEPENDENCY_OF
  spdxElementId: SPDXRef-Package-proj2
spdxVersion: SPDX-2.2

A first trivial remark: copyrightText is mandatory. I don't have this info; that's why I need a scan run. This should be optional.

Then relationshipType has to be DEPENDENCY_OF. I wasn't able to find a relationship mode that actually translates the relationship to something that appears in analyzer-result.yml.

Lastly, the resulting analyzer output does not take into account the downloadLocation of the main package; the VCS info for that main package is empty:

repository:
  vcs:
    type: ""
    url: ""
    revision: ""
    path: ""
  vcs_processed:
    type: ""
    url: ""
    revision: ""
    path: ""
  config: {}
analyzer:
  start_time: "2021-09-30T19:11:44.896295Z"
  end_time: "2021-09-30T19:11:45.455184Z"
  environment:
    ort_version: "3ac5b46-dirty"
    java_version: "11.0.10"
    os: "Mac OS X"
    processors: 8
    max_memory: 2147483648
    variables:
      SHELL: "/bin/zsh"
      TERM: "xterm-256color"
      JAVA_HOME: "/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/"
      GOPATH: "/Users/<>/go"
    tool_versions: {}
  config:
    ignore_tool_versions: false
    allow_dynamic_versions: true
  result:
    projects:
    - id: "SpdxDocumentFile::proj1:main"
      definition_file_path: ""
      declared_licenses:
      - "NOASSERTION"
      declared_licenses_processed:
        spdx_expression: "NOASSERTION"
      vcs:
        type: ""
        url: ""
        revision: ""
        path: ""
      vcs_processed:
        type: ""
        url: ""
        revision: ""
        path: ""
      homepage_url: ""
      scope_names:
      - "default"
    packages:
    - package:
        id: "SpdxDocumentFile::proj2:main"
        purl: "pkg:spdxdocumentfile/proj2@main"
        declared_licenses:
        - "NOASSERTION"
        declared_licenses_processed:
          spdx_expression: "NOASSERTION"
        description: "proj2"
        homepage_url: ""
        binary_artifact:
          url: ""
          hash:
            value: ""
            algorithm: ""
        source_artifact:
          url: ""
          hash:
            value: ""
            algorithm: ""
        vcs:
          type: "Git"
          url: "https://github.com/tardyp/proj2"
          revision: "main"
          path: ""
        vcs_processed:
          type: "Git"
          url: "https://github.com/tardyp/proj2.git"
          revision: "main"
          path: ""
      curations: []
    dependency_graphs:
      SpdxDocumentFile:
        packages:
        - "SpdxDocumentFile::proj2:main"
        scopes:
          :proj1:main:default:
          - root: 0
        nodes:
        - {}
        edges: []
    has_issues: false
scanner: null
advisor: null
evaluator: null

The scanner then will fail to scan anything as it can't find the source of the main package.

thoughts?

sschuberth commented 3 years ago

> A first trivial remark: copyrightText is mandatory. I don't have this info; that's why I need a scan run. This should be optional.

That's not actually up to us. The SPDX spec documents this to be a required field; however, you can set it to NOASSERTION (or NONE).

> I wasn't able to find a relationship mode that actually translates the relationship to something that appears in analyzer-result.yml

DEPENDENCY_OF should work just fine. Very similarly, our test project declares a TEST_DEPENDENCY_OF like

https://github.com/oss-review-toolkit/ort/blob/57dc97a2a98f5a650923c36dcf5bba3f3ab577aa/analyzer/src/funTest/assets/projects/synthetic/spdx/project/project.spdx.yml#L70-L72

which results in

https://github.com/oss-review-toolkit/ort/blob/57dc97a2a98f5a650923c36dcf5bba3f3ab577aa/analyzer/src/funTest/assets/projects/synthetic/spdx-project-expected-output.yml#L28-L30

> Lastly, the resulting analyzer output does not take into account the downloadLocation of the main package; the VCS info for that main package is empty.

Indeed. So far our assumption / convention was that the SPDX file describing the project lies next to the project itself, so we were deducing the (processed) VCS information from the VCS working tree the SPDX file resides in. I'll look into what we can do to improve this.
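To make the missing piece concrete, here is a minimal Python sketch of how a downloadLocation such as `git+https://github.com/tardyp/proj1@main` could be split into the type / url / revision / path fields that show up under `vcs` in the analyzer result. This is not ORT's implementation: `parse_download_location` and the `VcsInfo` tuple are hypothetical names, and the sketch only assumes the `<vcs>+<url>[@<revision>][#<path>]` shape used in this thread.

```python
from typing import NamedTuple


class VcsInfo(NamedTuple):
    """Illustrative stand-in for a vcs block in analyzer-result.yml."""
    type: str
    url: str
    revision: str
    path: str


def parse_download_location(location: str) -> VcsInfo:
    """Split '<vcs>+<url>[@<revision>][#<path>]' into its parts.

    NOASSERTION and NONE carry no VCS information, so an empty VcsInfo
    is returned for them (and for anything without a '<vcs>+' prefix).
    """
    if location in ("NOASSERTION", "NONE") or "+" not in location:
        return VcsInfo("", "", "", "")

    vcs_type, rest = location.split("+", 1)
    rest, _, path = rest.partition("#")

    # Only accept an '@' that comes after the start of the URL path, so a
    # 'user@host' in the authority part is not mistaken for a revision.
    scheme_end = rest.find("://")
    host_start = scheme_end + 3 if scheme_end != -1 else 0
    path_start = rest.find("/", host_start)
    at = rest.rfind("@") if path_start != -1 else -1

    url, revision = rest, ""
    if at > path_start:
        url, revision = rest[:at], rest[at + 1:]

    return VcsInfo(vcs_type.capitalize(), url, revision, path)
```

Applied to the proj1 entry from the example document, this would yield type `Git`, the GitHub URL without the `@main` suffix, and revision `main`, which are the values one would expect to see for the main package instead of empty strings.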

sschuberth commented 3 years ago

> I'll look into what we can do to improve this.

See https://github.com/oss-review-toolkit/ort/pull/4547.

tardyp commented 3 years ago

Hi @sschuberth thanks for the patch! I was able to retrieve the master branch and got the expected analyser output.

Now I am stuck in the scan phase.

ort -P ort.analyzer.allowDynamicVersions=true --debug scan -i out/analyzer-result.yml -o out/scan

The downloader still insists on getting a tag to start the scan, as I understand from the following log:

10:21:12.271 [main] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download 'SpdxDocumentFile::proj2:main' sources to '/<tmp>/ort-ScanCode6080189846737807840/SpdxDocumentFile/unknown/proj2/main' from VCS...
10:21:12.272 [main] INFO  org.ossreviewtoolkit.downloader.Downloader - Using processed VcsInfo(type=Git, url=https://github.com/tardyp/proj2.git, revision=main, path=). Original was VcsInfo(type=Git, url=https://github.com/tardyp/proj2, revision=main, path=).
10:21:12.299 [main] INFO  org.ossreviewtoolkit.utils.OrtAuthenticator - Authenticator was successfully installed.
10:21:12.306 [main] INFO  org.ossreviewtoolkit.utils.OrtProxySelector - Proxy selector was successfully installed.
10:21:12.349 [main] INFO  org.ossreviewtoolkit.utils.OrtAuthenticator - Authenticator is already installed.
10:21:12.349 [main] INFO  org.ossreviewtoolkit.utils.OrtProxySelector - Proxy selector is already installed.
10:21:12.359 [main] INFO  org.ossreviewtoolkit.downloader.Downloader - Detected VCS type 'Git' from type name 'Git'.
10:21:12.385 [main] DEBUG org.eclipse.jgit.util.SystemReader - loading config FileBasedConfig[/<home>/.config/jgit/config]
10:21:12.387 [main] DEBUG org.eclipse.jgit.util.FS - readpipe [/opt/homebrew/bin/git, --version],/opt/homebrew/bin
10:21:12.394 [main] DEBUG org.eclipse.jgit.util.FS - readpipe may return 'git version 2.30.0'
10:21:12.394 [main] DEBUG org.eclipse.jgit.util.FS - remaining output:

10:21:12.395 [main] DEBUG org.eclipse.jgit.util.FS - readpipe [/opt/homebrew/bin/git, config, --system, --edit],/opt/homebrew/bin
10:21:12.400 [main] DEBUG org.eclipse.jgit.util.FS - readpipe may return '/opt/homebrew/etc/gitconfig'
10:21:12.400 [main] DEBUG org.eclipse.jgit.util.FS - remaining output:

10:21:12.401 [main] DEBUG org.eclipse.jgit.util.SystemReader - loading config FileBasedConfig[/opt/homebrew/etc/gitconfig]
10:21:12.401 [main] DEBUG org.eclipse.jgit.util.SystemReader - loading config FileBasedConfig[/<home>/.gitconfig]
10:21:12.421 [main] DEBUG org.eclipse.jgit.util.FS - Thread[main,5,main]: cannot measure timestamp resolution of unborn directory /<tmp>/ort-ScanCode6080189846737807840/SpdxDocumentFile/unknown/proj2/main/.git
10:21:12.433 [main] DEBUG org.eclipse.jgit.util.FS - Thread[main,5,main]: cannot measure timestamp resolution of unborn directory /<tmp>/ort-ScanCode6080189846737807840/SpdxDocumentFile/unknown/proj2/main/.git/refs/remotes
10:21:12.493 [main] WARN  org.eclipse.jgit.internal.transport.http.NetscapeCookieFile - Configured http.cookieFile '/<home>/.gitcookies' is missing
10:21:12.899 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< # service=git-upload-pack
10:21:12.899 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:12.920 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< version 2
10:21:12.922 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< agent=git/github-g20084f3c48a2
10:21:12.922 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< ls-refs
10:21:12.922 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< fetch=shallow filter
10:21:12.922 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< server-option
10:21:12.922 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< object-format=sha1
10:21:12.922 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:12.925 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> command=ls-refs
10:21:12.925 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> agent=JGit/5.13.0.202109080827-r
10:21:12.925 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0001
10:21:12.925 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> peel
10:21:12.925 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> symrefs
10:21:12.926 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> ref-prefix refs/heads/
10:21:12.926 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0000
10:21:13.080 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< fa42ed60a1dd8658d7c82390726e28d4481b977b refs/heads/main
10:21:13.081 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.081 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0000
10:21:13.082 [main] WARN  org.eclipse.jgit.internal.transport.http.NetscapeCookieFile - Configured http.cookieFile '/<home>/.gitcookies' is missing
10:21:13.318 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< # service=git-upload-pack
10:21:13.319 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.320 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< version 2
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< agent=git/github-g20084f3c48a2
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< ls-refs
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< fetch=shallow filter
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< server-option
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< object-format=sha1
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> command=ls-refs
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> agent=JGit/5.13.0.202109080827-r
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0001
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> peel
10:21:13.321 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> symrefs
10:21:13.322 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> ref-prefix refs/tags/
10:21:13.322 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0000
10:21:13.536 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.552 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0000
10:21:13.554 [main] WARN  org.eclipse.jgit.internal.transport.http.NetscapeCookieFile - Configured http.cookieFile '/<home>/.gitcookies' is missing
10:21:13.777 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< # service=git-upload-pack
10:21:13.777 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< version 2
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< agent=git/github-g20084f3c48a2
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< ls-refs
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< fetch=shallow filter
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< server-option
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< object-format=sha1
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> command=ls-refs
10:21:13.781 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> agent=JGit/5.13.0.202109080827-r
10:21:13.782 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0001
10:21:13.782 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> peel
10:21:13.782 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> symrefs
10:21:13.782 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> ref-prefix refs/tags/
10:21:13.782 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0000
10:21:13.949 [main] DEBUG org.eclipse.jgit.transport.PacketLineIn - git< 0000
10:21:13.950 [main] DEBUG org.eclipse.jgit.transport.PacketLineOut - git> 0000
10:21:13.966 [main] INFO  org.ossreviewtoolkit.downloader.vcs.Git - No Git revision for package 'proj2' and version 'main' found: IOException: No matching tag found for version 'main' among tags . Please create a tag whose name contains the version.
10:21:13.969 [main] DEBUG org.ossreviewtoolkit.downloader.Downloader - VCS download failed for 'SpdxDocumentFile::proj2:main': DownloadException: Unable to determine a revision to checkout.
Suppressed: IOException: No matching tag found for version 'main' among tags . Please create a tag whose name contains the version.
10:21:13.970 [main] PERFORMANCE org.ossreviewtoolkit.downloader.Downloader - Failed attempt to download source code for 'SpdxDocumentFile::proj2:main' from VcsInfo(type=Git, url=https://github.com/tardyp/proj2.git, revision=main, path=) took 1698ms.
10:21:13.974 [main] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download source artifact for 'SpdxDocumentFile::proj2:main' from ...
10:21:13.975 [main] DEBUG org.ossreviewtoolkit.downloader.Downloader - Source artifact download failed for 'SpdxDocumentFile::proj2:main': DownloadException: No source artifact URL provided for 'SpdxDocumentFile::proj2:main'.
10:21:13.976 [main] PERFORMANCE org.ossreviewtoolkit.downloader.Downloader - Failed attempt to download source code for 'SpdxDocumentFile::proj2:main' from RemoteArtifact(url=, hash=Hash(value=, algorithm=)) took 1ms.
10:21:13.977 [main] ERROR org.ossreviewtoolkit.scanner.scanners.scancode.ScanCode - Could not download 'SpdxDocumentFile::proj2:main': DownloadException: Download failed for 'SpdxDocumentFile::proj2:main'.
Suppressed: DownloadException: Unable to determine a revision to checkout.
Suppressed: DownloadException: No source artifact URL provided for 'SpdxDocumentFile::proj2:main'.

sschuberth commented 3 years ago

> The downloader still insists on getting a tag to start the scan, as I understand from the following log:

Correct, we disallow scanning branches by default as doing so leads to non-deterministic scan results (because branches might move). I was pretty sure we already exposed the downloader API's allowMovingRevisions (not to be confused with the analyzer's allowDynamicVersions) in the user-facing configuration, but it looks like I was mistaken. Let me see what I can do.
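To illustrate why a branch is a "moving" revision, the following Python sketch resolves a branch name to the commit it currently points to, using the output format of `git ls-remote` (one `<sha> <ref>` pair per line, as also visible in the `refs/heads/main` line of the debug log above). It is only a sketch: the function names are made up for this example, and ORT's downloader uses JGit rather than shelling out to Git.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str, branch: str) -> Optional[str]:
    """Return the commit SHA that refs/heads/<branch> points to, if listed."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == wanted:
            return parts[0]
    return None


def resolve_branch(url: str, branch: str) -> Optional[str]:
    """Ask the remote which commit the branch points to right now."""
    result = subprocess.run(
        ["git", "ls-remote", url, f"refs/heads/{branch}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_remote(result.stdout, branch)
```

Calling `resolve_branch` for the same URL and branch on two different days may return two different SHAs, which is exactly why a scan of `main` is not reproducible unless the resolved commit is recorded alongside the result.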

sschuberth commented 3 years ago

> Let me see what I can do.

See https://github.com/oss-review-toolkit/ort/pull/4553.

tardyp commented 3 years ago

Thanks! We can now get a little bit further...

The scans are going well on both projects, but after that, I got the following exception.

22:45:16.488 [main] PERFORMANCE org.ossreviewtoolkit.model.OrtResult - Computing excluded projects...
Exception in thread "main" java.lang.IllegalArgumentException: The VcsInfo(type=Git, url=https://github.com/tardyp/proj1.git, revision=main, path=) of project 'SpdxDocumentFile::proj1:main' cannot be found in Repository(vcs=VcsInfo(type=, url=, revision=, path=), vcsProcessed=VcsInfo(type=, url=, revision=, path=), nestedRepositories={}, config=RepositoryConfiguration(excludes=Excludes(paths=[], scopes=[]), resolutions=Resolutions(issues=[], ruleViolations=[], vulnerabilities=[]), curations=Curations(packages=[], licenseFindings=[]), packageConfigurations=[], licenseChoices=LicenseChoices(repositoryLicenseChoices=[], packageLicenseChoices=[]))).
    at org.ossreviewtoolkit.model.OrtResult.getFilePathRelativeToAnalyzerRoot(OrtResult.kt:279)
    at org.ossreviewtoolkit.model.OrtResult.getDefinitionFilePathRelativeToAnalyzerRoot(OrtResult.kt:265)
    at org.ossreviewtoolkit.model.config.Excludes.findPathExcludes(Excludes.kt:52)
    at org.ossreviewtoolkit.model.OrtResult$projects$2.invoke(OrtResult.kt:116)
    at org.ossreviewtoolkit.model.OrtResult$projects$2.invoke(OrtResult.kt:109)
    at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
    at org.ossreviewtoolkit.model.OrtResult.getProjects(OrtResult.kt:109)
    at org.ossreviewtoolkit.model.OrtResult.getProject(OrtResult.kt:355)
    at org.ossreviewtoolkit.scanner.ScannerKt.scanOrtResult(Scanner.kt:142)
    at org.ossreviewtoolkit.cli.commands.ScannerCommand.run(ScannerCommand.kt:233)
    at org.ossreviewtoolkit.cli.commands.ScannerCommand.run(ScannerCommand.kt:184)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:204)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:213)
    at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:17)
    at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:396)
    at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:393)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:411)
    at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:436)
    at org.ossreviewtoolkit.cli.OrtMainKt.main(OrtMain.kt:108)

I get the same result if I hack analyzer-result.yml to have actual VCS info for the main repository.

Also, the scan is actually caching the scan results even though the branch is moving. We could probably still cache the scans, but only after finding out the commit SHA-1 associated with the branch.

sschuberth commented 3 years ago

> project 'SpdxDocumentFile::proj1:main' cannot be found in Repository

Ah, crap, that's a more tricky problem now. Not sure yet how to work around that.

> Also, the scan is actually caching the scan results even though the branch is moving.

Yes, that has been a long-standing complaint from my side as well. @mnonnenmacher can you remind me how the new provenance-based scanner will behave here by default?

tardyp commented 2 years ago

Picking this conversation up again after #4563: I was wondering whether we could instead put NOASSERTION (or whatever value meaningfully expresses non-existence) in downloadLocation for the superproject.

Based on this repro gist

we would change the spdx format to:

SPDXID: SPDXRef-DOCUMENT
creationInfo:
  created: '2020-07-23T18:30:22Z'
  creators:
  - oss-scanner
  licenseListVersion: '3.9'
dataLicense: Apache
documentDescribes:
- SPDXRef-Package-superproj
documentNamespace: http://spdx.org/spdxdocs/spdx-document-superproj
name: superproj
packages:
- SPDXID: SPDXRef-Package-superproj
  copyrightText: NOASSERTION
  description: superproj
  downloadLocation: NOASSERTION
  filesAnalyzed: false
  licenseConcluded: NOASSERTION
  licenseDeclared: NOASSERTION
  name: superproj
  versionInfo: NOASSERTION
- SPDXID: SPDXRef-Package-proj1
  copyrightText: Copyright (C) 2020 Example Inc.
  description: proj1
  downloadLocation: git+https://github.com/tardyp/proj1@main
  filesAnalyzed: false
  licenseConcluded: NOASSERTION
  licenseDeclared: NOASSERTION
  name: proj1
  versionInfo: main
- SPDXID: SPDXRef-Package-proj2
  copyrightText: Copyright (C) 2020 Example Inc.
  description: proj2
  downloadLocation: git+https://github.com/tardyp/proj2@main
  filesAnalyzed: false
  licenseConcluded: NOASSERTION
  licenseDeclared: NOASSERTION
  name: proj2
  versionInfo: main
relationships:
- relatedSpdxElement: SPDXRef-Package-superproj
  relationshipType: DEPENDENCY_OF
  spdxElementId: SPDXRef-Package-proj1
- relatedSpdxElement: SPDXRef-Package-superproj
  relationshipType: DEPENDENCY_OF
  spdxElementId: SPDXRef-Package-proj2
spdxVersion: SPDX-2.2

ORT scans the two dependencies and reports a scan error for the superproject, which we can currently just ignore.
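Since the document above follows a regular pattern, it can be generated from a repo list. The Python sketch below shows one way to do that, under a few assumptions that are not from the thread: the function name `build_superproject_document` is hypothetical, every package gets `NOASSERTION` as copyrightText, and the header uses `CC0-1.0` for dataLicense and a `Tool:` prefix for the creator, as the SPDX 2.2 spec prescribes, rather than the values in the example. Serializing the resulting dict to a `.spdx.yml` file is left to whatever YAML library is at hand.

```python
from typing import Dict, List

NOASSERTION = "NOASSERTION"


def build_superproject_document(
    name: str, repos: Dict[str, str], created: str, branch: str = "main"
) -> dict:
    """Return an SPDX 2.2 document (as a dict) with a placeholder root package.

    `repos` maps each package name to its Git URL. Every repo becomes a
    package tracked at `branch` and is declared a DEPENDENCY_OF the root.
    """
    root_id = f"SPDXRef-Package-{name}"

    def package(pkg_name: str, download_location: str, version: str) -> dict:
        return {
            "SPDXID": f"SPDXRef-Package-{pkg_name}",
            "copyrightText": NOASSERTION,  # required field, unknown before a scan
            "description": pkg_name,
            "downloadLocation": download_location,
            "filesAnalyzed": False,
            "licenseConcluded": NOASSERTION,
            "licenseDeclared": NOASSERTION,
            "name": pkg_name,
            "versionInfo": version,
        }

    # The root ("super") package has no source location of its own.
    packages: List[dict] = [package(name, NOASSERTION, NOASSERTION)]
    relationships: List[dict] = []
    for repo_name, url in repos.items():
        packages.append(package(repo_name, f"git+{url}@{branch}", branch))
        relationships.append({
            "spdxElementId": f"SPDXRef-Package-{repo_name}",
            "relationshipType": "DEPENDENCY_OF",
            "relatedSpdxElement": root_id,
        })

    return {
        "SPDXID": "SPDXRef-DOCUMENT",
        "spdxVersion": "SPDX-2.2",
        "creationInfo": {"created": created, "creators": ["Tool: oss-scanner"]},
        "dataLicense": "CC0-1.0",
        "name": name,
        "documentNamespace": f"http://spdx.org/spdxdocs/spdx-document-{name}",
        "documentDescribes": [root_id],
        "packages": packages,
        "relationships": relationships,
    }
```

For example, `build_superproject_document("superproj", {"proj1": "https://github.com/tardyp/proj1", "proj2": "https://github.com/tardyp/proj2"}, created="2020-07-23T18:30:22Z")` produces three packages and two relationships with the same shape as the hand-written document.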

mnonnenmacher commented 2 years ago

> > Also, the scan is actually caching the scan results even though the branch is moving.
>
> Yes, that has been a long-standing complaint from my side as well. @mnonnenmacher can you remind me how the new provenance-based scanner will behave here by default?

The scan results should always be stored with a resolved revision, so they will only be reused if the branch still points to the same revision. If you check the scan result in the storage, you should see that the vcs_info.revision field might contain the branch name, while the resolved_revision field has the SHA-1 of the commit. If this is not the case, please file a separate bug report with steps to reproduce.
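The intended behavior can be sketched in a few lines of Python. This is an illustration only, not ORT's storage API: `StoredScan` and `ScanStorage` are hypothetical names, and only the `revision` / `resolved_revision` distinction is taken from the description above. Results are keyed by the resolved commit, so a lookup with the branch's current SHA only hits if the branch has not moved.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StoredScan:
    revision: str           # what was requested, e.g. the branch name "main"
    resolved_revision: str  # commit SHA the branch pointed to at scan time
    summary: str            # placeholder for the actual scan findings


class ScanStorage:
    """Toy in-memory storage keyed by (repository URL, resolved commit)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], StoredScan] = {}

    def store(self, url: str, scan: StoredScan) -> None:
        self._results[(url, scan.resolved_revision)] = scan

    def lookup(self, url: str, current_sha: str) -> Optional[StoredScan]:
        """Return a stored result only if it was made for `current_sha`."""
        return self._results.get((url, current_sha))
```

The important design point is that the caller has to resolve the branch to a commit before calling `lookup`; looking up by branch name alone would return stale results once the branch moves.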

For the provenance-based scanner this behavior will stay the same, but it will bring massive performance improvements for git-repo projects, because it will not scan the whole source tree together but instead scan all Git repositories separately. This means that only those that have changed will have to be scanned again.

tardyp commented 2 years ago

Hi @mnonnenmacher, #4562 is the specific issue tracking this problem.

We can discuss it there, but even though the resolved_revision is stored, it is not checked in the case of a moving branch, so the cache is reused when it should not be.

mnonnenmacher commented 2 years ago

@sschuberth I have to correct myself, this will indeed change with the provenance-based scanner, because resolution of revisions is implemented differently there, and it will always resolve revisions before accessing the storage. Therefore this issue will not be relevant there anymore.