oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Trouble running the analyzer / scanner on a directory not under Version Control #2896

Open OctagonHex opened 4 years ago

OctagonHex commented 4 years ago

Hello, I successfully created a scan result with the following command (I used the scancode-toolkit examples):

cli\build\install\ort\bin\ort scan -p "C:\scancode-toolkit-3.1.1\samples" -o myOut

Output:

Using scanner 'ScanCode' with storage 'FileBasedStorage with XZCompressedLocalFileStorage backend'.
Local file storage has 0 scan results files.
Writing scan result to 'myOut\scan-result.yml'.

If I look at the .yml file, it looks good and contains many licenses.

Now I am trying to generate any kind of report. My goal is to generate an attribution notice, so I run the command:

cli\build\install\ort\bin\ort report -i myOut\scan-result.yml -o myOutReport -f Excel

but the output shows this error:

Creating the 'Excel' report...
15:15:23.583 [main] ERROR org.ossreviewtoolkit.commands.ReporterCommand - Could not create 'Excel' report: IllegalArgumentException: The provided ORT result does not contain an analyzer result.
Failed to create any report.

With -f NoticeSummary or -f NoticeByPackage, ORT seems to work at first glance:

Creating the 'NoticeSummary' report...
Successfully created the 'NoticeSummary' report at [myOutReport\NOTICE_SUMMARY] in 0.012422699s.
Successfully created 1 of 1 report(s).

But despite the many licenses in the .yml, the resulting report is empty, i.e. it says: This project neither contains or depends on any third-party software components.

What is the problem, or how can this be fixed?

I attached my scan result for easy reference. scan-result.yml.txt

sschuberth commented 4 years ago

While running a reporter on an ORT result with only a scan result is not forbidden, this is a use-case that is not well tested. The usual (and well tested) workflow is to first create an ORT result with an analyzer result, and then use that as the input for the scanner, which creates another ORT result file that combines the analyzer and scan results. Such "rich" ORT result files should work fine to create reports.
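The workflow described above can be sketched as three chained invocations. This is only an illustration: it reuses the subcommands and options that appear elsewhere in this thread (analyze -f/-i/-o, scan -i/-o, report -i/-o/-f), while the launcher name "ort", the helper name pipeline_commands and the output directory layout are assumptions made for the example.

```python
from pathlib import Path

# Assumption: the ORT launcher is reachable as "ort"; adjust to your install path.
ORT = "ort"


def pipeline_commands(project_dir, work_dir, report_format="NoticeSummary"):
    """Return the analyze -> scan -> report command lines, in order."""
    work = Path(work_dir)
    analyzer_out = work / "analyzer"
    scanner_out = work / "scanner"
    report_out = work / "report"
    return [
        # 1. Analyze the project; with "-f JSON" the result is analyzer-result.json.
        [ORT, "analyze", "-f", "JSON", "-i", str(project_dir), "-o", str(analyzer_out)],
        # 2. Scan with the analyzer result as input; the scanner writes scan-result.yml.
        [ORT, "scan", "-i", str(analyzer_out / "analyzer-result.json"), "-o", str(scanner_out)],
        # 3. Report on the combined result.
        [ORT, "report", "-i", str(scanner_out / "scan-result.yml"),
         "-o", str(report_out), "-f", report_format],
    ]
```

Each returned list could be passed to subprocess.run(cmd, check=True) so that a failing step stops the chain. Note that this only covers the well-tested path; it does not by itself solve the case of a directory that is not under version control.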

OctagonHex commented 4 years ago

Maybe you can give me a hint on how to accomplish my goal. For example, I am trying to analyze an unstructured directory of source code, using the samples from ScanCode-toolkit. I first analyze them, and the analyzer runs OK. As expected, the result is very short and does not contain any dependencies. The problem is that if I use this result as input, the scanner does not even scan the directory: the analyzer output does not even contain the source directory. The scanner result now mostly contains "No source artifact URL provided for 'Unmanaged::ScanCode-Samples:'." I also tried adding the project to a local Git repository (without a remote), so the warning about "non-cacheable results" is gone, but the scanner still can't find the source code.

Which parameters need to be set so that the analyzer records where the source code is located and the scanner can find it?

C:\oss-review-toolkit>cli\build\install\ort\bin\ort --info analyze -f JSON -i "C:\temp\ScanCode-Samples" -o analyzerOut
________ _____________________
\_____  \\______   \__    ___/ the OSS Review Toolkit, version 0.1.0-SNAPSHOT.
 /   |   \|       _/ |    |    Running 'analyze' under Java 14.0.1 on Windows 10 with
/    |    \    |   \ |    |    ORT_DATA_DIR = C:\Users\USER\.ort
\_______  /____|_  / |____|    OS = Windows_NT
        \/       \/
More environment variables:
COMSPEC = C:\WINDOWS\system32\cmd.exe
JAVA_HOME = C:\jdk-14.0.1+7

The following package managers are activated:
        Bower, Bundler, Cargo, Conan, DotNet, GoDep, GoMod, Gradle, Maven, NPM, NuGet, PhpComposer, PIP, Pipenv, Pub, SBT, Stack, Yarn
Analyzing project path:
        C:\temp\ScanCode-Samples
08:16:19.253 [main] INFO  org.ossreviewtoolkit.analyzer.Analyzer - Unmanaged projects found in:
08:16:19.255 [main] INFO  org.ossreviewtoolkit.analyzer.Analyzer -      .
08:16:19.298 [Analyzer-1] INFO  org.ossreviewtoolkit.analyzer.PackageManager - Resolving Unmanaged dependencies for 'C:\temp\ScanCode-Samples'...
08:16:19.358 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtAuthenticator - Authenticator is already installed.
08:16:19.359 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtProxySelector - Proxy selector is already installed.
08:16:19.490 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtAuthenticator - Authenticator is already installed.
08:16:19.491 [Analyzer-1] INFO  org.ossreviewtoolkit.utils.OrtProxySelector - Proxy selector is already installed.
08:16:20.440 [Analyzer-1] WARN  org.ossreviewtoolkit.analyzer.managers.Unmanaged - Analysis of local directory 'C:\temp\ScanCode-Samples' which is not under version control will produce non-cacheable results as no version for the cache key can be determined.
08:16:20.445 [Analyzer-1] INFO  org.ossreviewtoolkit.analyzer.PackageManager - Resolving Unmanaged dependencies for 'ScanCode-Samples' took 1.1431624s.
Found 1 project(s) in total.
Writing analyzer result to 'analyzerOut\analyzer-result.json'.
C:\oss-review-toolkit>cli\build\install\ort\bin\ort --info scan -i analyzerOut\analyzer-result.json -o myOut
________ _____________________
\_____  \\______   \__    ___/ the OSS Review Toolkit, version 0.1.0-SNAPSHOT.
 /   |   \|       _/ |    |    Running 'scan' under Java 14.0.1 on Windows 10 with
/    |    \    |   \ |    |    ORT_DATA_DIR = C:\Users\USER\.ort
\_______  /____|_  / |____|    OS = Windows_NT
        \/       \/
More environment variables:
COMSPEC = C:\WINDOWS\system32\cmd.exe
JAVA_HOME = C:\jdk-14.0.1+7

Using scanner 'ScanCode' with storage 'FileBasedStorage with XZCompressedLocalFileStorage backend'.
Local file storage has 0 scan results files.
08:21:00.843 [main] INFO  org.ossreviewtoolkit.scanner.LocalScanner - Bootstrapping scanner 'ScanCode' as required version 3.0.2 was not found in PATH.
08:21:00.846 [main] INFO  org.ossreviewtoolkit.scanner.scanners.ScanCode - Downloading ScanCode from https://github.com/nexB/scancode-toolkit/archive/v3.0.2.zip...
08:21:02.056 [main] INFO  org.ossreviewtoolkit.scanner.scanners.ScanCode - Retrieved ScanCode from local cache.
08:21:02.497 [main] INFO  org.ossreviewtoolkit.scanner.scanners.ScanCode - Unpacking 'C:\Users\USER\AppData\Local\Temp\ort9967510014256878867ScanCode-v3.0.2.zip' to 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2'...
08:21:49.381 [main] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2\scancode.bat --version' in 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2'...
08:22:47.472 [main] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2\scancode.bat --version' in 'C:\Users\USER\AppData\Local\Temp\ort15900786484210730018ScanCode-3.0.2\scancode-toolkit-3.0.2'...
08:22:49.353 [FileBasedStorage with XZCompressedLocalFileStorage backend-1] INFO  kotlinx.coroutines.CoroutineScope - Looking for stored scan results for Unmanaged::ScanCode-Samples: and ScannerDetails(name=ScanCode, version=3.0.2, configuration=--copyright --license --ignore *.ort.yml --info --strip-root --timeout 300 --ignore HERE_NOTICE --ignore META-INF/DEPENDENCIES --json-pp) (1/1).
08:22:49.370 [ScanCode-1] INFO  kotlinx.coroutines.CoroutineScope - No stored result found for Unmanaged::ScanCode-Samples: and ScannerDetails(name=ScanCode, version=3.0.2, configuration=--copyright --license --ignore *.ort.yml --info --strip-root --timeout 300 --ignore HERE_NOTICE --ignore META-INF/DEPENDENCIES --json-pp), scanning package in thread 'ScanCode-1' (1/1).
08:22:49.373 [ScanCode-1] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download source code for 'Unmanaged::ScanCode-Samples:'.
08:22:49.377 [ScanCode-1] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download 'Unmanaged::ScanCode-Samples:' sources to 'C:\oss-review-toolkit\myOut\downloads\Unmanaged\unknown\ScanCode-Samples\unknown' from VCS...
08:22:49.380 [ScanCode-1] INFO  org.ossreviewtoolkit.downloader.Downloader - Trying to download source artifact for 'Unmanaged::ScanCode-Samples:' from ...
08:22:49.384 [ScanCode-1] ERROR org.ossreviewtoolkit.scanner.LocalScanner - Could not download 'Unmanaged::ScanCode-Samples:': DownloadException: Download failed for 'Unmanaged::ScanCode-Samples:'.
Suppressed: DownloadException: No VCS URL provided for 'Unmanaged::ScanCode-Samples:'.,
Suppressed: DownloadException: No source artifact URL provided for 'Unmanaged::ScanCode-Samples:'.
08:22:49.385 [ScanCode-1] INFO  kotlinx.coroutines.CoroutineScope - Finished scanning Unmanaged::ScanCode-Samples: in thread 'ScanCode-1' (1/1).
08:22:49.388 [main] INFO  org.ossreviewtoolkit.model.OrtResult - Computing excluded projects which may take a while...
08:22:49.390 [main] INFO  org.ossreviewtoolkit.model.OrtResult - Computing excluded projects done.
Writing scan result to 'myOut\scan-result.yml'.
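
The log above suggests a fallback order: the downloader first tries VCS, then a source artifact, and fails once neither location is known. The following is a minimal sketch of that behavior as read from the log, not ORT's actual implementation; the Package fields and the download function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


class DownloadException(Exception):
    """Failure that carries the errors of every attempted source (illustrative)."""

    def __init__(self, message, suppressed=()):
        super().__init__(message)
        self.suppressed = list(suppressed)


@dataclass
class Package:
    # Hypothetical, simplified stand-in for the package metadata the scanner receives.
    id: str
    vcs_url: str = ""
    source_artifact_url: str = ""


def download(package):
    """Pick a source location, trying VCS first and the source artifact second."""
    errors = []
    if package.vcs_url:
        return ("vcs", package.vcs_url)
    errors.append(DownloadException(f"No VCS URL provided for '{package.id}'."))
    if package.source_artifact_url:
        return ("source artifact", package.source_artifact_url)
    errors.append(DownloadException(f"No source artifact URL provided for '{package.id}'."))
    # Neither location is known, so there is nothing the scanner could fetch.
    raise DownloadException(f"Download failed for '{package.id}'.", errors)
```

For an unmanaged project from a directory without version control, both fields are empty, which matches the two suppressed exceptions in the log: the analyzer result carries no location from which the scanner could obtain the sources.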

sschuberth commented 6 months ago

I'm trying to sum up the current status here: An OrtResult contains a Repository that in turn contains a VcsInfo. The latter cannot be set to anything meaningful if the analyzed directory is not under version control.

Instead of doing something hacky like setting it to VcsInfo.EMPTY, an idea is to replace the current Repository with something like a new AnalyzerInput class with a Provenance instead of strictly VCS-related classes. Maybe also NestedProvenance could be generalized a bit so AnalyzerInput could use it to also substitute Repository's nestedRepositories. When a directory that is not under version control is analyzed, the provenance would be set to UnknownProvenance.

In that context, RepositoryConfiguration could maybe also be renamed to something more general, like ProductConfiguration or so.
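
To make the AnalyzerInput idea more concrete, here is a rough sketch of its shape. ORT itself is written in Kotlin; Python dataclasses are used here only as a compact, language-neutral illustration. AnalyzerInput is the hypothetical class from the proposal, the VcsInfo fields and the RepositoryProvenance name are assumptions, and nested provenance is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VcsInfo:
    # Assumed, simplified fields; not a copy of ORT's class.
    type: str
    url: str
    revision: str
    path: str = ""


@dataclass(frozen=True)
class RepositoryProvenance:
    """Provenance of sources that live in a VCS repository (assumed name)."""
    vcs_info: VcsInfo


@dataclass(frozen=True)
class UnknownProvenance:
    """Provenance of sources whose origin cannot be determined."""


Provenance = Union[RepositoryProvenance, UnknownProvenance]


@dataclass(frozen=True)
class AnalyzerInput:
    """Hypothetical replacement for Repository that is not tied to VCS classes."""
    provenance: Provenance


def analyzer_input_for(vcs_info: Optional[VcsInfo]) -> AnalyzerInput:
    """Build the input description; fall back to UnknownProvenance without VCS."""
    if vcs_info is None:
        return AnalyzerInput(UnknownProvenance())
    return AnalyzerInput(RepositoryProvenance(vcs_info))
```

The point of the sketch is that a directory outside version control gets an explicit UnknownProvenance value instead of an empty VcsInfo placeholder.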

heliocastro commented 6 months ago

Easily reproducible (requires Git) by simulating a fake monorepo:

mkdir test
cd test 
git clone https://github.com/apple/swift-nio.git
git clone https://github.com/sw360/sw360python.git
ort analyze -i . -o output

pepper-jk commented 6 months ago

I'm having a look at this refactoring. Let me know if you have any more input.

pepper-jk commented 5 months ago

[...] Maybe also NestedProvenance could be generalized a bit so AnalyzerInput could use it to also substitute Repository's nestedRepositories. [...]

@sschuberth I noticed that NestedProvenance is located inside org.ossreviewtoolkit.scanner.provenance rather than org.ossreviewtoolkit.model. From what I understand of the code so far, however, most data structures, such as Provenance and Repository, are located inside the model.

Am I correct in assuming that NestedProvenance was only defined in the scanner, since it was only utilized there up until now and that it would generally make sense to move it into the model? In the case of AnalyzerInput, which should probably also be located in model, importing NestedProvenance inside AnalyzerInput seems to cause a circular dependency between model and scanner.

Could moving the NestedProvenance to model be a good first step (pull request) in preparation for the AnalyzerInput? Or am I missing something here?
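
The circular dependency concern can be illustrated with a tiny module graph. This is a sketch of the reasoning only: the two module names come from the comment above, the assumption that scanner already depends on model is implied by it, and has_cycle is a generic helper written for this example.

```python
def has_cycle(deps):
    """Return True if the module dependency graph contains a cycle."""
    visiting, done = set(), set()

    def visit(module):
        if module in done:
            return False
        if module in visiting:
            # Reached a module that is still on the current path: a cycle.
            return True
        visiting.add(module)
        found = any(visit(dep) for dep in deps.get(module, ()))
        visiting.discard(module)
        done.add(module)
        return found

    return any(visit(module) for module in deps)


# Assumed current state: scanner uses types from model, not the other way round.
today = {"scanner": {"model"}, "model": set()}

# AnalyzerInput in model importing NestedProvenance from scanner adds a back edge.
with_import = {"scanner": {"model"}, "model": {"scanner"}}

# Moving NestedProvenance into model first removes the need for that back edge.
after_move = {"scanner": {"model"}, "model": set()}
```

Here has_cycle(with_import) is true while has_cycle(today) and has_cycle(after_move) are false, which is why moving NestedProvenance would have to come before AnalyzerInput can reference it.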

sschuberth commented 5 months ago

Am I correct in assuming that NestedProvenance was only defined in the scanner, since it was only utilized there up until now and that it would generally make sense to move it into the model?

Maybe not "generally", but in the context of this refactoring, yes, if we agree that this refactoring makes sense. I'd esp. like to hear @mnonnenmacher's opinion here.

Could moving the NestedProvenance to model be a good first step (pull request) in preparation for the AnalyzerInput?

See above. I'd like to first have a consensus among the core devs that this refactoring is the way to go.

pepper-jk commented 5 months ago

@mnonnenmacher for an overview of changes, I opened a pull request https://github.com/oss-review-toolkit/ort/pull/8724.

pepper-jk commented 5 months ago

During today's ORT community meeting, we discussed possible solutions for allowing non-vcs projects to be analyzed and scanned.

Our use case at HELLA would be to scan non-vcs projects, not just analyze them. This distinction had not been mentioned explicitly up until now.

In light of that use case, @fviernau and @sschuberth advised abandoning the previously suggested course of allowing UnknownProvenance as an input for the analyzer. Instead, they put forward a new refactoring approach:

  1. Replace Repository's VcsInfo (and related variables) with KnownProvenance, making it less dependent on VcsInfo.
  2. Add a LocalProvenance as a new data class for KnownProvenance, which contains a local directory path.

This would allow the analyzer and scanner to handle non-vcs projects as a Provenance as long as both steps are done on the same machine with the same directory structure.
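
The two steps above could look roughly like the following sketch. It is an illustration under assumptions, not the agreed design: ORT is written in Kotlin, and the field names, the RepositoryProvenance name and the local_source_dir helper are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class VcsInfo:
    # Assumed, simplified fields.
    type: str
    url: str
    revision: str


@dataclass(frozen=True)
class RepositoryProvenance:
    """Sources that can be checked out from a VCS (assumed name)."""
    vcs_info: VcsInfo


@dataclass(frozen=True)
class LocalProvenance:
    """Step 2: sources that only exist in a local directory."""
    directory: Path


KnownProvenance = Union[RepositoryProvenance, LocalProvenance]


@dataclass(frozen=True)
class Repository:
    """Step 1: Repository holds a KnownProvenance instead of a bare VcsInfo."""
    provenance: KnownProvenance


def local_source_dir(repository: Repository) -> Optional[Path]:
    """Return the directory to scan in place, or None if a download is needed."""
    provenance = repository.provenance
    if isinstance(provenance, LocalProvenance):
        if not provenance.directory.is_dir():
            # Analyzer and scanner must see the same directory structure.
            raise FileNotFoundError(f"Local sources not found: {provenance.directory}")
        return provenance.directory
    return None
```

The existence check reflects the constraint stated above: a LocalProvenance is only meaningful if the scanner runs on the same machine and sees the same directory as the analyzer did.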

PR #8724 will be dropped in favor of this new approach. I will post any updates or findings here.

Further input and discussion on this topic is welcome.