nexB / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and others generous sponsors!
https://github.com/nexB/scancode-toolkit/releases/

ignorable_copyrights are not ignored? #3813

Open bilbothebaggins opened 2 weeks ago

bilbothebaggins commented 2 weeks ago

I am scanning https://github.com/omniorb/omniorb/blob/main/LICENSE with

ScanCode version: 32.1.0
ScanCode Output Format version: 3.1.0
SPDX License list version: 3.23

scancode.exe -lci --license-text --ignore-binaries --only-findings -n 8 --json-pp

and the result contains:

"spdx_license_expression": "GPL-2.0-only"
...
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl-2.0_872.RULE",
...
      "copyrights": [
        {
          "copyright": "Copyright (c) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/>",
          "start_line": 4,
          "end_line": 4
        },
        {
          "copyright": "copyrighted by the Free Software Foundation",
          "start_line": 252,
          "end_line": 253
        }

But the rule above states:

---
license_expression: gpl-2.0
is_license_text: yes
relevance: 100
ignorable_copyrights:
    - Copyright (c) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/>
    - copyrighted by the Free Software Foundation
...

What is the point of the ignorable_copyrights if they are still listed in the result with exactly this rule?

Am I misunderstanding something about this attribute, or is this a bug?

Cheers.

pombredanne commented 2 weeks ago

ignorable_copyrights are only ignored when using the command line --filter-clues option.

For instance, without --filter-clues:


headers:
    -   tool_name: scancode-toolkit
        tool_version: v32.1.0-31-g1f94c9d103
        options:
            input:
                - LICENSE
            --copyright: yes
            --license: yes
            --yaml: '-'
.....
files:
    -   path: LICENSE
        type: file
        detected_license_expression: gpl-2.0
        detected_license_expression_spdx: GPL-2.0-only
........
        license_clues: []
        percentage_of_license_text: '100.0'
        copyrights:
            -   copyright: Copyright (c) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/>
                start_line: 4
                end_line: 4
            -   copyright: copyrighted by the Free Software Foundation
                start_line: 252
                end_line: 253
        holders:
            -   holder: Free Software Foundation, Inc.
                start_line: 4
                end_line: 4
            -   holder: the Free Software Foundation
                start_line: 252
                end_line: 253
        authors: []
        scan_errors: []

and with --filter-clues:

headers:
    -   tool_name: scancode-toolkit
        tool_version: v32.1.0-31-g1f94c9d103
        options:
            input:
                - LICENSE
            --copyright: yes
            --filter-clues: yes
            --license: yes
            --yaml: '-'

...
files:
    -   path: LICENSE
        type: file
        detected_license_expression: gpl-2.0
        detected_license_expression_spdx: GPL-2.0-only
...
        license_clues: []
        percentage_of_license_text: '100.0'
        copyrights: []
        holders: []
        authors: []
        scan_errors: []
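
The difference between the two outputs above can be pictured with a small sketch. This is only an illustration of the idea, not ScanCode's actual implementation: the function name `filter_ignorable_copyrights` is hypothetical, and it assumes a plain exact-string comparison between each detected copyright and the `ignorable_copyrights` listed in the matched rule.

```python
# Hypothetical sketch: drop detected copyrights that the matched license
# rule declares as ignorable. The exact-string match is an assumption made
# for illustration; it is not taken from the ScanCode source code.

def filter_ignorable_copyrights(copyrights, ignorable_copyrights):
    """Return the detections whose statement is not listed as ignorable."""
    ignorable = set(ignorable_copyrights)
    return [c for c in copyrights if c["copyright"] not in ignorable]


# Detections as reported for the LICENSE file in this issue.
detected = [
    {
        "copyright": "Copyright (c) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/>",
        "start_line": 4,
        "end_line": 4,
    },
    {
        "copyright": "copyrighted by the Free Software Foundation",
        "start_line": 252,
        "end_line": 253,
    },
]

# Values from the ignorable_copyrights attribute of gpl-2.0_872.RULE.
ignorable_copyrights = [
    "Copyright (c) 1989, 1991 Free Software Foundation, Inc., <http://fsf.org/>",
    "copyrighted by the Free Software Foundation",
]

filtered = filter_ignorable_copyrights(detected, ignorable_copyrights)
print(filtered)
```

With both statements listed as ignorable, the filtered list is empty, which corresponds to the `copyrights: []` shown in the second output.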

bilbothebaggins commented 2 weeks ago

The option --filter-clues does not work for me, sorry:

I run:

scancode -lci --license-text --filter-clues --ignore-binaries --only-findings -n 8 --json-pp test.json --include "*.txt" .\test_dir

and the option is also shown in the result file:

        "--filter-clues": true,

But the result is exactly the same as without this option, that is, the copyrights are still included.

Can you share the exact command line you used to scan the file? Thanks.
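
To check whether --filter-clues changed anything, two JSON result files can be compared with a short script. This is a minimal sketch: it assumes the JSON output has a top-level `files` list whose entries carry `path` and `copyrights` keys, as in the outputs quoted in this thread, and the file names in the usage note are hypothetical.

```python
import json


def copyrights_by_path(scan):
    """Map each scanned file path to its list of copyright statements."""
    return {
        f["path"]: [c["copyright"] for c in f.get("copyrights", [])]
        for f in scan.get("files", [])
    }


def removed_copyrights(before, after):
    """Return {path: statements present in `before` but absent in `after`}."""
    before_map = copyrights_by_path(before)
    after_map = copyrights_by_path(after)
    removed = {}
    for path, statements in before_map.items():
        remaining = set(after_map.get(path, []))
        gone = [s for s in statements if s not in remaining]
        if gone:
            removed[path] = gone
    return removed


def load_scan(filename):
    """Load a ScanCode JSON result file from disk."""
    with open(filename, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `removed_copyrights(load_scan("without.json"), load_scan("with.json"))` returns an empty dict when the option had no effect on the reported copyrights.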