oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Allow users to specify which results can be excluded from report license findings #731

Closed tsteenbe closed 5 years ago

tsteenbe commented 6 years ago

A common use case is that not everything that is automatically detected is actually used, included in the build artifact, or correct. Therefore, we would like to provide our users with an easy way to specify which findings to exclude from the SPDX output.

Proposal: Allow ORT users to specify via .ort.yml file which detected projects, packages, scopes, errors or licenses are included in the distributed/delivered artifact (after building the project).

tsteenbe commented 6 years ago

Below is the proposed format for an .ort.yml file to mark specific projects, packages, scopes or licenses as excluded from the distributed artifact. The example is based on GRPC v.1.14.1.

Your feedback is welcome.

exclude:
  projects:
  - path: "ruby/Gemfile"
    # Exclude a package within above defined path
    packages:
    - id: "Bundler::third_party/protobuf/ruby/Gemfile:"
      reason: "OPTIONAL_COMPONENT_OF"
      comment: "Only using the C++ and Java code of GRPC"
  - path: "java/util/pom.xml"
    # Exclude scope 'test' within above defined path 
    scopes:
    - name: "test"
      reason: "TEST_TOOL_OF"
      comment: "Only used for testing GRPC"
  - path: "python/compatibility_tests/v2.5.0/setup.py"
    # Exclude errors within path defined above
    errors:
    - message: "IllegalStateException: projectVersion must not be null"
      reason: "BUILD_TOOL_ISSUE"
      comment: "Python issues which need to be fixed in GRPC"
  # Examples on how to exclude entire projects from tree
  - path: "Gemfile"
    reason: "OPTIONAL_COMPONENT_OF"
    comment: "Only using the C++ and Java code of GRPC"
  - path: "examples/android/helloworld/build.gradle"
    reason: "EXAMPLE_OF"
    comment: "Java examples of GRPC which only have been used to learn how to use GRPC"
  - path: "src/android/test/interop/app/build.gradle"
    reason: "TEST_TOOL_OF"
    comment: "App used to test GRPC"

# Globally exclude packages below
packages:
- id: "Maven:junit:junit:4.4" 
  reason: "TEST_TOOL_OF"
  comment: "JUnit is a unit testing framework for the Java programming language"
# Globally exclude scopes below
scopes:
- name: "test"
  reason: "TEST_TOOL_OF"
  comment: "Standard Maven scope for dependency only needed for test compilation and execution"
# Globally exclude errors below
errors:
- message: "ERROR: Timeout after 300 seconds while scanning file 'samples/SupportLeanbackJank/src/main/res/raw/bbb_sunflower_2160p_60fps.mp4'."
  reason: SCANNER_ISSUE
  comment: "ScanCode is known to time out on large files such as this video file"

# Possible values for the reason field are listed below. Values are based on SPDX relationship https://spdx.github.io/spdx-spec/7-relationships-between-SPDX-elements/ as SPDX will be ORT's format for data exchange

reason_exclude_project:
  - BUILD_TOOL_OF          # project only contains tools used for building source code which are not included in distributed build artifacts
  - DATA_FILE_OF           # project only contains data files such as fonts or images which are not included in distributed build artifacts
  - DOCUMENTATION_OF       # project only contains documentation which is not distributed in build artifacts
  - EXAMPLE_OF             # project only contains source code examples which are not included in distributed build artifacts
  - OPTIONAL_COMPONENT_OF  # project only contains optional components for the code that is built and not included in distributed build artifacts
  - TEST_TOOL_OF           # project only contains tools used for testing source code which are not included in distributed build artifacts

reason_exclude_package:
  - BUILD_TOOL_OF          # package only contains tools used for building source code which are not included in distributed build artifacts
  - DATA_FILE_OF           # package only contains data files such as fonts or images which are not included in distributed build artifacts
  - DOCUMENTATION_OF       # package only contains documentation which is not distributed in build artifacts
  - EXAMPLE_OF             # package only contains source code examples which are not included in distributed build artifacts
  - OPTIONAL_COMPONENT_OF  # package is an optional component in code that is built and not included in distributed build artifacts
  - TEST_TOOL_OF           # package only contains tools used for testing source code which are not included in distributed build artifacts

reason_exclude_scope:
  - BUILD_TOOL_OF          # scope only contains packages used for building source code which are not included in distributed build artifacts
  - PROVIDED_BY            # scope only contains packages that have to be provided by user of distributed build artifacts
  - TEST_CASE_OF           # scope only contains packages used for testing source code which are not included in distributed build artifacts

reason_exclude_error:
  - BUILD_TOOL_ISSUE       # error is due to the way a package is built
  - CANT_FIX_ISSUE         # error can not be fixed as it requires a fix to be made by 3rd party
  - SCANNER_ISSUE          # error is due to scanner issue such as time out on a large file

reason_exclude_license:
  - BUILD_TOOL_OF          # license applies to code that is used for building source code which are not included in distributed build artifacts
  - DOCUMENTATION_OF       # license applies to documentation which is not distributed in build artifacts
  - EXAMPLE_OF             # license applies only to source code examples which are not included in distributed build artifacts
  - FALSE_POSITIVE         # license match is a false positive match by the scanner
  - OPTIONAL_COMPONENT_OF  # license applies to code that is an optional component in code that is built and not included in distributed build artifacts
  - TEST_TOOL_OF           # license applies to code that is used for testing source code which are not included in distributed build artifacts
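One way to sanity-check a file in the proposed format is to compare each entry's reason against the list allowed for its kind. The following is only a minimal Python sketch under stated assumptions: the entries are assumed to be already parsed from YAML into plain dicts, and `ALLOWED_REASONS` and `invalid_reasons` are hypothetical names for illustration, not part of ORT.

```python
# Allowed reason codes per exclude kind, copied from the proposal's lists.
# ALLOWED_REASONS and invalid_reasons are hypothetical names, not ORT APIs.
_COMPONENT_REASONS = {
    "BUILD_TOOL_OF", "DATA_FILE_OF", "DOCUMENTATION_OF",
    "EXAMPLE_OF", "OPTIONAL_COMPONENT_OF", "TEST_TOOL_OF",
}

ALLOWED_REASONS = {
    "project": _COMPONENT_REASONS,
    "package": _COMPONENT_REASONS,
    "scope": {"BUILD_TOOL_OF", "PROVIDED_BY", "TEST_CASE_OF"},
    "error": {"BUILD_TOOL_ISSUE", "CANT_FIX_ISSUE", "SCANNER_ISSUE"},
    "license": {
        "BUILD_TOOL_OF", "DOCUMENTATION_OF", "EXAMPLE_OF",
        "FALSE_POSITIVE", "OPTIONAL_COMPONENT_OF", "TEST_TOOL_OF",
    },
}


def invalid_reasons(kind, entries):
    """Return the entries whose reason is not allowed for the given kind."""
    allowed = ALLOWED_REASONS[kind]
    return [entry for entry in entries if entry.get("reason") not in allowed]


# Entries mirroring the example configuration (assumed already parsed).
packages = [{"id": "Maven:junit:junit:4.4", "reason": "TEST_TOOL_OF"}]
scopes = [{"name": "test", "reason": "TEST_TOOL_OF"}]
errors = [{
    "message": "IllegalStateException: projectVersion must not be null",
    "reason": "BUILD_TOOL_ISSUE",
}]

for kind, entries in (("package", packages), ("scope", scopes), ("error", errors)):
    for entry in invalid_reasons(kind, entries):
        print(f"{kind} exclude uses unlisted reason: {entry['reason']}")
```

Run against the example entries, this check flags the scope exclude: the example uses TEST_TOOL_OF for scopes, while reason_exclude_scope lists TEST_CASE_OF, so one of the two would need to be aligned.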

pombredanne commented 6 years ago

@tsteenbe Interesting approach ... I like using reason codes :+1:

I wonder whether allowing the exclusion of any of projects, packages, scopes or licenses gives too much flexibility and sophistication. Would a path-only approach be simpler to handle, both in the data and in the code?

And how would you reconcile this with the simpler CD facet approach?

kestewart commented 6 years ago

@tsteenbe - I like the reason codes as well.

Just curious, did you also consider adding a way to exclude file types (https://spdx.github.io/spdx-spec/4-file-information/#43-file-type)? I can think of some use cases where it might be handy, so as not to inadvertently redistribute something (pictures used for creating backgrounds where the license rights aren't known, etc.)

tsteenbe commented 6 years ago

@pombredanne We chose the projects, packages, scopes or licenses approach as

"...And how would you reconcile this with the simpler CD facet approach?" I see an easy translation to the ClearlyDefined facet approach; for me, only core and possibly data need some further thought.

@kestewart I actually spent some time on other types of filters, including one on type, such as SPDX file types. We chose this approach as it will allow us to offer our users, who are mostly developers, a powerful yet simple and well-understood method of filtering.

sschuberth commented 5 years ago

@mnonnenmacher, could you please comment on the current state of this, what's still missing (if so), and / or whether we can close this?

mnonnenmacher commented 5 years ago

Exclusion of projects and scopes is implemented. Error and license excludes were instead implemented as resolutions. See the documentation here: https://github.com/heremaps/oss-review-toolkit/blob/master/docs/Configuration.md. Only package excludes are missing; if required, they should be covered in a separate ticket.