nexB / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/nexB/scancode-toolkit/releases/

Copyright in LICENSE file not detected (email issue?) #3764

Open vw-anton opened 2 months ago

vw-anton commented 2 months ago

Description

Copyright in the following file is not detected by ScanCode: https://github.com/videojs/vhs-utils/blob/main/LICENSE

Detected:

{
  "path": "vhs-utils-main/LICENSE",
  "type": "file",
  "name": "LICENSE",
  "status": "application-package",
  "tag": "",
  "extension": "",
  "size": 1078,
  "md5": "b5e2dbf622c44f93baf779123b4c1cc7",
  "sha1": "cb191af7ec58c84aae40b49d4be2646239ab087d",
  "sha256": "83c604241478b9801530198a4dd46fca5fc422015d5a01eeacf96162401ab31a",
  "sha512": "",
  "mime_type": "text/plain",
  "file_type": "ASCII text",
  "programming_language": "",
  "is_binary": false,
  "is_text": true,
  "is_archive": false,
  "is_media": false,
  "is_key_file": true,
  "detected_license_expression": "mit",
  "detected_license_expression_spdx": "MIT",
  "license_detections": [
 ...
  ],
  "license_clues": [],
  "percentage_of_license_text": 96.41,
  "compliance_alert": "",
  "copyrights": [],
  "holders": [],
  "authors": [],
  "package_data": [],
  "for_packages": [
    "pkg:npm/%40videojs/vhs-utils@4.1.0?uuid=7803ef84-762b-4555-b5cf-0935ad59f1f5"
  ],
  "emails": [
    {
      "email": "brandonocasey@gmail.com",
      "end_line": 1,
      "start_line": 1
    }
  ],
  "urls": [],
  "extra_data": {}
}

Expected: brandonocasey <brandonocasey@gmail.com>, or at least brandonocasey, is returned as the copyright.
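
One way to spot similar misses in a larger scan is to look for file entries that have detected emails but an empty copyrights list, as in the result above. A minimal sketch, assuming the per-file JSON layout shown above; the function name `files_with_email_but_no_copyright` and the trimmed inline sample are hypothetical and only for illustration:

```python
import json


def files_with_email_but_no_copyright(files):
    """Return paths of file entries that have emails but no copyrights."""
    return [
        entry["path"]
        for entry in files
        if entry.get("emails") and not entry.get("copyrights")
    ]


# Trimmed copy of the file entry from the report above (most keys omitted).
sample = json.loads("""
[
  {
    "path": "vhs-utils-main/LICENSE",
    "copyrights": [],
    "holders": [],
    "emails": [
      {"email": "brandonocasey@gmail.com", "start_line": 1, "end_line": 1}
    ]
  }
]
""")

print(files_with_email_but_no_copyright(sample))
```

An email without a copyright is only a hint, not proof of a missed detection, so any file flagged this way still needs a manual look.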

How To Reproduce

scancode.io config below

System configuration

  "tool_name": "scanpipe",
  "tool_version": "34.2.0",
  "other_tools": [
    "pkg:pypi/scancode-toolkit@32.1.0"

pombredanne commented 2 months ago

@vw-anton Thanks for the report! See the fix in https://github.com/nexB/scancode-toolkit/commit/ab6699fb57508605963c1c87637eacd68dd65278 ... the issue was not about the email, but rather about the lack of a year in this copyright form, which uses a (somewhat uncommon) all-lowercase, made-up name.
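
To illustrate the kind of gap described in this comment, the toy sketch below contrasts a pattern that requires a year with one that makes the year optional. This is not ScanCode's detection code (the actual fix is in the linked commit); the two regexes, the function name `find_holder`, and the sample line are assumptions made up for illustration.

```python
import re

# Toy pattern (assumption): requires a four-digit year after "Copyright (c)".
WITH_YEAR = re.compile(
    r"Copyright\s+\(c\)\s+\d{4}\s+(?P<holder>[^<\n]+)", re.IGNORECASE
)

# Toy pattern (assumption): same as above, but the year is optional.
YEAR_OPTIONAL = re.compile(
    r"Copyright\s+\(c\)\s+(?:\d{4}\s+)?(?P<holder>[^<\n]+)", re.IGNORECASE
)


def find_holder(pattern, line):
    """Return the holder name matched by pattern, or None if nothing matches."""
    match = pattern.search(line)
    return match.group("holder").strip() if match else None


# Hypothetical sample line modeled on the report: no year, lowercase name.
line = "Copyright (c) brandonocasey <brandonocasey@gmail.com>"

print(find_holder(WITH_YEAR, line))
print(find_holder(YEAR_OPTIONAL, line))
```

The year-requiring pattern finds nothing on a line like this, while the year-optional one recovers the name, which mirrors the explanation that the missing year, not the email, was the cause.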