oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Reasons to run ScanCode additionally to ORT to generate OpossumUI input #4577

Open maxhbr opened 2 years ago

maxhbr commented 2 years ago

As asked for in the dev call:

Here I want to list some reasons why OpossumUI, a tool that is agnostic of the compliance information provider, can be used without ORT, or with ORT plus a separate ScanCode run, instead of relying on ORT to run ScanCode.

OpossumUI is built to consume generic compliance information. ORT is one such provider, but there are also ScanCode, Dependency-Check, SCANOSS, and others. The fact that ORT itself can run ScanCode and provides a selected part of the ScanCode result (copyright and license findings in a simpler format) is not a sufficient reason to stop supporting ScanCode directly and leveraging its full potential.

The reasons:

All of these points are based on my personal knowledge of the ORT result format. I do not see them as feature requests or bug reports. I am not even sure whether everything mentioned here should be supported by ORT or whether these are orthogonal use cases. They are numbered for easier reference in any follow-up discussion.

(R1) Currently ORT uses an old pre-release of ScanCode

The ScanCode version used within ORT is a pre-release that is over one year old. Running ScanCode separately gives me the results of the current version (a difference of 2260 commits). There are probably other issues mentioning this, and there are also PRs:

(R2) For scanning the root project, ORT relies on version control

Scanning a project that is not under version control (e.g. one provided as an archive or as the content of Docker layers), or scanning only a subset of a (mono-)repository, cannot easily be done with ORT. Scanning the currently available source code directly makes this possible, but it still requires some hacks.

One idea might be that the analyzer step could already run the scan of the root project.

(R3) ScanCode has the --package option that labels binaries like DLLs accurately, and would find definition files in dependencies

Currently, ORT's scanner does not pass on the package information from ScanCode. E.g.

This is also helpful because ScanCode might support tools and ecosystems that are not yet fully integrated into ORT.
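As a rough illustration of what a consumer could pick up directly, here is a minimal Python sketch that collects the package URLs ScanCode reports per file when run with --package. It assumes the JSON layout of ScanCode output format 3 (ScanCode 32.0.0 and up), where each entry in "files" carries a "package_data" list whose items have a "purl" field. The sample data and the helper name packages_by_file are hypothetical.

```python
import json

# Hypothetical, hand-written excerpt of a ScanCode JSON result (output format 3
# assumed), e.g. as produced by: scancode --package --json-pp result.json <dir>
SAMPLE = json.loads("""
{
  "files": [
    {"path": "vendor/bar/package.json", "type": "file",
     "package_data": [{"type": "npm", "name": "bar", "version": "0.1.0",
                       "purl": "pkg:npm/bar@0.1.0"}]},
    {"path": "third_party/baz/pom.xml", "type": "file",
     "package_data": [{"type": "maven", "namespace": "org.example", "name": "baz",
                       "version": "2.0", "purl": "pkg:maven/org.example/baz@2.0"}]},
    {"path": "src/main.c", "type": "file", "package_data": []}
  ]
}
""")


def packages_by_file(result: dict) -> dict[str, list[str]]:
    """Map each scanned file path to the package URLs ScanCode detected in it."""
    mapping = {}
    for entry in result.get("files", []):
        # Keep only package data entries that actually carry a purl.
        purls = [p["purl"] for p in entry.get("package_data", []) if p.get("purl")]
        if purls:
            mapping[entry["path"]] = purls
    return mapping


if __name__ == "__main__":
    for path, purls in packages_by_file(SAMPLE).items():
        print(path, "->", ", ".join(purls))
```

Files without any package data (like src/main.c above) are simply left out, so the mapping only lists locations that identify a package.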

(R4) The ORT result just lists files with findings

For the UI it is helpful to show all files, not just the ones with actual findings. One can extract this information from the ScanCode result.
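A minimal Python sketch of that extraction, assuming the output format 3 JSON layout in which every scanned file and directory appears in "files" with a "type", and files carry "license_detections" and "copyrights" lists when the corresponding options were enabled. The helper name split_files and the sample data are hypothetical.

```python
import json

# Hypothetical ScanCode result excerpt (output format 3 assumed): every scanned
# file and directory appears in "files", whether or not anything was found in it.
SAMPLE = json.loads("""
{
  "files": [
    {"path": "src", "type": "directory"},
    {"path": "src/main.c", "type": "file",
     "license_detections": [{"license_expression": "mit"}], "copyrights": []},
    {"path": "src/util.h", "type": "file",
     "license_detections": [], "copyrights": []},
    {"path": "README", "type": "file",
     "license_detections": [], "copyrights": [{"copyright": "Copyright (c) Example"}]}
  ]
}
""")


def split_files(result: dict) -> tuple[list[str], list[str]]:
    """Return (all file paths, file paths with at least one license or copyright finding)."""
    all_files, with_findings = [], []
    for entry in result.get("files", []):
        if entry.get("type") != "file":
            continue  # skip directories; a UI tree can rebuild them from the paths
        all_files.append(entry["path"])
        if entry.get("license_detections") or entry.get("copyrights"):
            with_findings.append(entry["path"])
    return all_files, with_findings
```

Here src/util.h shows up in the first list but not in the second, which is exactly the kind of file a findings-only result would hide from the UI.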

(R5) The ScanCode report contains valuable information to understand the quality of the actual finding

E.g.

From that information one can deduce a "confidence" value that can be displayed in the UI.
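One possible way to deduce such a value, sketched in Python: treat a license detection as only as trustworthy as its weakest match. This assumes the per-match "score" and "match_coverage" fields (both on a 0 to 100 scale) found in ScanCode's JSON output; the heuristic itself, the helper name detection_confidence, and the sample values are hypothetical and not something ScanCode or ORT define.

```python
# Hypothetical matches belonging to one license detection (output format 3 assumed).
SAMPLE_MATCHES = [
    {"license_expression": "mit", "score": 100.0, "match_coverage": 100.0},
    {"license_expression": "mit", "score": 62.5, "match_coverage": 62.5},
]


def detection_confidence(matches: list[dict]) -> float:
    """Hypothetical heuristic: the weakest match determines the confidence.

    For each match, take the lower of "score" and "match_coverage" (0..100),
    then return the minimum over all matches scaled to 0.0..1.0.
    """
    if not matches:
        return 0.0
    weakest = min(
        min(m.get("score", 0.0), m.get("match_coverage", 0.0)) for m in matches
    )
    return round(weakest / 100.0, 3)


if __name__ == "__main__":
    print(detection_confidence(SAMPLE_MATCHES))
```

Taking the minimum is a deliberately conservative choice: a single partial match pulls the whole detection down, which makes it stand out for manual review in the UI.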

(R6) ScanCode can provide the actual license text that was matched (with the --license-text option)

With that option, one can extract the actual matched license text and show it in the UI or use it in notice generation. This can be especially helpful if ScanCode only matched other-permissive or something similar.

This option is currently not enabled in https://github.com/oss-review-toolkit/ort/blob/4b79fbd17a783b6456598b62a224bd7ef9d9523b/scanner/src/main/kotlin/scanners/scancode/ScanCode.kt#L81-L111
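A minimal Python sketch of pulling that text out of a result, assuming the output format 3 layout where each file has "license_detections", each detection has "matches", and each match carries "matched_text" only when ScanCode was run with --license-text. The sample entry and the helper name matched_texts are hypothetical.

```python
# Hypothetical file entry from a result produced with --license-text
# (output format 3 assumed); "matched_text" is absent without that option.
SAMPLE_FILE = {
    "path": "src/vendored/helper.js",
    "license_detections": [
        {"license_expression": "other-permissive",
         "matches": [{"license_expression": "other-permissive",
                      "start_line": 3, "end_line": 5,
                      "matched_text": "Permission is granted to use this file for any purpose."}]}
    ],
}


def matched_texts(file_entry: dict) -> list[tuple[str, str]]:
    """Collect (license expression, matched text) pairs for UI display or notice generation."""
    pairs = []
    for detection in file_entry.get("license_detections", []):
        for match in detection.get("matches", []):
            text = match.get("matched_text")
            if text:  # skipped when ScanCode ran without --license-text
                pairs.append((match["license_expression"], text))
    return pairs
```

For a generic key like other-permissive, the returned text is what a reviewer would actually need to read to decide what the license is.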

(R7) Sometimes ScanCode is good enough to get an understanding of a code base

In cases where no package management is expected, ScanCode alone is often good enough. So being able to generate the Opossum input without ORT adds flexibility, especially if NPM is involved and there is a deadline ;).

I know that there is now the possibility to run different scanners on the root project and on the dependencies.

(R8) The ORT call of ScanCode does not include extractcode to recursively extract files before scanning

Sometimes the source code contains archives that are opaque to ScanCode. For this, ScanCode provides extractcode, but it is not applied right now.
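A minimal Python sketch of such a two-step pipeline, which only builds the command lines. It assumes extractcode and scancode are on the PATH, that extractcode recursively extracts each archive into an "<archive>-extract" directory next to it (so a later scan of the same tree covers the extracted files), and that the listed scancode options exist in the installed version; check them against "scancode --help". The helper name build_commands is hypothetical.

```python
import shlex
from pathlib import Path


def build_commands(source_dir: str, result_file: str) -> list[list[str]]:
    """Build a hypothetical two-step pipeline: extract archives, then scan.

    extractcode is assumed to write the contents of each archive next to it
    in an "<archive>-extract" directory, so scanning source_dir afterwards
    also covers the extracted files.
    """
    source = str(Path(source_dir))  # normalizes e.g. "project/" to "project"
    return [
        ["extractcode", source],
        ["scancode", "--copyright", "--license", "--package", "--info",
         "--license-text", "--json-pp", result_file, source],
    ]


if __name__ == "__main__":
    for cmd in build_commands("project/", "scancode.json"):
        # Print a shell-quoted version; each list could be run with
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Keeping extraction as a separate step means the unextracted archives stay untouched in the tree, and the scan result shows both the archive and its extracted contents.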

(R9) For future improvements: parts that are not yet utilized

The ScanCode result also contains the following additional information that might be helpful in the future:

sschuberth commented 1 year ago

Thanks a lot for this thorough write-up, @maxhbr! I'll comment over time as ORT evolves.

sschuberth commented 1 year ago

To start with:

(R1) Currently ORT uses an old pre-release of ScanCode

I believe this has been resolved, as the ScanCode version to use is now configurable, and we recently added support for output format 3 / ScanCode 32.0.0 and up.