oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

In SPDX reports, include licenseInfoFromFiles and file-level information for the scanned project itself as well #8485

Open daniel-kr opened 6 months ago

daniel-kr commented 6 months ago

An ORT scan is applied to the downloaded source code of external dependencies and to the scanned project itself. The latter is necessary to also cover OSS code that has been copied into a project's code base. So far, so good. 👍

In an SPDX report, the information about the project itself is converted to an SPDX package entity, just as it is for external dependencies. However, this project entity contains neither the licenseInfoFromFiles attribute nor any file-level information, even though the scanner found licenses (which, e.g., the web app report does show). Only the detected copyright statements are included, in the copyrightText field.

I suggest including licenseInfoFromFiles and, if the property file.information.enabled is set, also file-level information for the scanned project. Looking at the code, this should not be too difficult to achieve.
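To make the proposal concrete, here is a minimal Python sketch of what a project package entry could look like in SPDX 2.x JSON once licenseInfoFromFiles and, optionally, file-level entries are added. It is not ORT's reporter code: the function `build_project_package`, the per-file findings layout and the `include_files` flag (standing in for the file.information.enabled property) are hypothetical, and other mandatory SPDX fields such as downloadLocation or filesAnalyzed are left out.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical input layout: scanner findings per file of the scanned project,
# e.g. {"src/a.c": {"licenses": {"MIT"}, "copyrights": {"Copyright (c) Foo"}}}.
FileFindings = Dict[str, Dict[str, Set[str]]]


def _copyright_text(statements: Set[str]) -> str:
    # SPDX uses NOASSERTION when no copyright text is asserted.
    return "\n".join(sorted(statements)) or "NOASSERTION"


def _license_list(licenses: Set[str]) -> List[str]:
    # SPDX uses NONE when no license information was found.
    return sorted(licenses) or ["NONE"]


def build_project_package(
    spdx_id: str,
    name: str,
    version: str,
    findings: FileFindings,
    include_files: bool,
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return the project package entry and (optionally) its file entries."""
    licenses: Set[str] = set()
    copyrights: Set[str] = set()
    for file_findings in findings.values():
        licenses |= file_findings.get("licenses", set())
        copyrights |= file_findings.get("copyrights", set())

    package: Dict[str, Any] = {
        "SPDXID": spdx_id,
        "name": name,
        "versionInfo": version,
        # Already filled for projects today.
        "copyrightText": _copyright_text(copyrights),
        # Proposed: aggregate the detected licenses, as is done for packages.
        "licenseInfoFromFiles": _license_list(licenses),
    }

    files: List[Dict[str, Any]] = []
    if include_files:
        for index, path in enumerate(sorted(findings)):
            file_findings = findings[path]
            files.append({
                "SPDXID": f"{spdx_id}-File-{index}",
                "fileName": f"./{path}",
                "licenseInfoInFiles": _license_list(file_findings.get("licenses", set())),
                "copyrightText": _copyright_text(file_findings.get("copyrights", set())),
            })
        # Link the file entries to the project package.
        package["hasFiles"] = [entry["SPDXID"] for entry in files]

    return package, files
```

The per-file SPDXID scheme is only illustrative; SPDX merely requires identifiers of the form `SPDXRef-` followed by letters, digits, `.` or `-`.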

I don't know if this is a bug or a missing feature. In any case, I would volunteer to implement this change. But before I start, I would like to know whether you consider this a good idea and whether my PR would have a chance of being merged.

sschuberth commented 6 months ago

> I don't know if this is a bug or a missing feature.

Maybe @fviernau could comment on that?

> In any case, I would volunteer to implement this change.

That would be highly appreciated, thank you for the willingness to help!

daniel-kr commented 6 months ago

Do you consider this to be a sensible change?

sschuberth commented 6 months ago

Personally, I believe it makes sense to treat projects and packages consistently here, yes.

Note though that SPDX does not explicitly distinguish between what ORT calls projects and packages, but just has (SPDX) packages and relations between them.
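For illustration, a minimal Python sketch of how the project/package split can be expressed solely through SPDX relationships: the document DESCRIBES the project package, which DEPENDS_ON the dependency packages. The helper names and SPDX IDs are hypothetical, and the sketch does not claim to mirror the exact relationships ORT writes.

```python
from typing import Dict, Iterable, List, Set

Relationship = Dict[str, str]


def link_packages(project_id: str, dependency_ids: Iterable[str]) -> List[Relationship]:
    """Build SPDX 2.x relationship entries for one project and its dependencies."""
    relationships: List[Relationship] = [{
        "spdxElementId": "SPDXRef-DOCUMENT",
        "relationshipType": "DESCRIBES",
        "relatedSpdxElement": project_id,
    }]
    for dependency_id in dependency_ids:
        relationships.append({
            "spdxElementId": project_id,
            "relationshipType": "DEPENDS_ON",
            "relatedSpdxElement": dependency_id,
        })
    return relationships


def described_package_ids(relationships: Iterable[Relationship]) -> Set[str]:
    """A consumer can only tell the 'project' apart by following DESCRIBES."""
    return {
        rel["relatedSpdxElement"]
        for rel in relationships
        if rel["spdxElementId"] == "SPDXRef-DOCUMENT"
        and rel["relationshipType"] == "DESCRIBES"
    }
```

Both entities are plain SPDX packages; only the relationships distinguish the project from its dependencies.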

fviernau commented 6 months ago

> I don't know if this is a bug or a missing feature. In any case, I would volunteer to implement this change.

What I recall (it was long ago, so I'm not certain this is still correct) is the following:

  1. To fulfill the requirement at that time, it was sufficient to implement this for packages, so skipping projects saved some time.
  2. The value of adding projects was, at least to me personally, questionable:
    • For projects, there may be different and possibly more fine-grained requirements on what information should / should not be exposed.
    • If the project is closed source, parts of the information, e.g. file paths or the submodule structure, can be meaningless, as they merely point to source code that cannot be looked up.
    • Exposing multiple projects:
      • the submodule structure is often fine-grained and can be considered an implementation detail
      • for proprietary software, I can imagine cases where one does not want to expose that structure
      • it can lead to duplication, e.g. of file findings
  3. There is a way to turn a project's directory into a dependency entry in the NOTICE_BY_PACKAGE report. (It would be nice if this also worked for SPDX reports; I'm not sure whether it does.)

Given that, I believe (without having thought it through again more deeply) that if something were implemented for projects,

  1. There should be a toggle for enabling / disabling it.
  2. I'd tend to report only a single (merged) project instead of the submodule structure.
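The merged-project idea could look roughly like the following Python sketch, which unions the per-file findings of several (sub)projects so that a file shared between them is reported only once, addressing the duplication concern above. It assumes, hypothetically, that all paths are already relative to the repository root; the function name and data layout are illustrative only.

```python
from typing import Dict, Set

# Hypothetical layout: path -> {"licenses": {...}, "copyrights": {...}}.
FileFindings = Dict[str, Dict[str, Set[str]]]


def merge_project_findings(projects: Dict[str, FileFindings]) -> FileFindings:
    """Merge the findings of all projects into one entry per repository path."""
    merged: FileFindings = {}
    for project_name in sorted(projects):
        for path, findings in projects[project_name].items():
            # Fresh sets per path, so the input findings are never mutated.
            entry = merged.setdefault(path, {"licenses": set(), "copyrights": set()})
            # Union instead of append, so overlapping projects do not duplicate findings.
            entry["licenses"] |= findings.get("licenses", set())
            entry["copyrights"] |= findings.get("copyrights", set())
    return merged
```

The merged findings could then be emitted as a single SPDX package instead of one package per submodule.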

@tsteenbe, do you perhaps remember anything else, or do you have thoughts on this?

daniel-kr commented 6 months ago

Thank you for the outline.

Making it configurable would be fine for me. However, I wonder what exactly should be configurable. Currently, the project entries contain the copyright statements found by the scanner, but not the license statements it found. This is inconsistent IMO, isn't it? Other report formats, like the PDF report, contain both for the project. So I lean towards all or nothing in that regard. The new configuration option could control whether project entities are created at all. On top of that, another option could control whether file-level details are provided for project entities. That is, the choices would be: do not include the project, include a project summary, or include the project with file-level details.
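The three proposed levels could be modeled as a single option, as in the minimal Python sketch below. The enum, its string values and `project_fields` are hypothetical names, not existing ORT configuration; the point is that the summary level always carries licenses and copyrights together, in line with the all-or-nothing argument.

```python
from enum import Enum
from typing import Any, Dict, Iterable, Optional, Set


class ProjectDetail(Enum):
    """Hypothetical option values for the three proposed levels."""
    EXCLUDE = "exclude"
    SUMMARY = "summary"
    FILE_LEVEL = "file-level"


def project_fields(
    detail: ProjectDetail,
    licenses: Set[str],
    copyrights: Set[str],
    file_ids: Iterable[str],
) -> Optional[Dict[str, Any]]:
    """Return the SPDX fields to emit for a project, or None to omit it."""
    if detail is ProjectDetail.EXCLUDE:
        return None
    # Licenses and copyrights are always emitted together (all or nothing).
    fields: Dict[str, Any] = {
        "licenseInfoFromFiles": sorted(licenses) or ["NONE"],
        "copyrightText": "\n".join(sorted(copyrights)) or "NOASSERTION",
    }
    if detail is ProjectDetail.FILE_LEVEL:
        fields["hasFiles"] = list(file_ids)
    return fields
```

A string value read from a configuration file could be mapped with `ProjectDetail("summary")`, which raises `ValueError` for unknown values.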

Having just one merged entry for the whole root project would be sufficient for me, although it would be a bit more difficult to implement, and questions would arise, such as what to put into the versionInfo attribute of the merged entry.