pilosus / pip-license-checker

Check license types for third-party dependencies: permissive, copyleft, proprietary, etc.
https://blog.pilosus.org/posts/2021/09/07/pip-license-checker/

Fallback to the GitHub API to detect a Python dep's license name should be visible to a user #89

Closed pilosus closed 1 year ago

pilosus commented 3 years ago

For now, we try to detect a Python dep's license name from these sources, in order:

  1. Metadata's trove classifier (trove classifiers are recommended for OSI-approved FLOSS licenses)
  2. Metadata's license field (recommended for licenses not available for trove classifiers, e.g. FLOSS license with exceptions or EULA)
  3. GitHub repo license
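The lookup order above can be sketched as a small fallback chain. This is an illustrative Python sketch only, not the project's actual code: the `detect_license` function, the shape of the `metadata` dict, and the `github_lookup` callable are all hypothetical, and the source labels are borrowed from the report proposal further down in this issue.

```python
from typing import Callable, Optional, Tuple


def detect_license(
    metadata: dict,
    github_lookup: Callable[[], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (license_name, source) using the three-step fallback.

    `metadata` is a hypothetical dict with optional "classifiers" and
    "license" keys; `github_lookup` is a hypothetical callable that asks
    the GitHub API and returns a license name or None.
    """
    # 1. Trove classifiers, e.g. "License :: OSI Approved :: MIT License"
    for classifier in metadata.get("classifiers") or []:
        parts = [part.strip() for part in classifier.split("::")]
        if parts[0] == "License" and len(parts) > 1:
            return parts[-1], "PythonMetaClassifiers"

    # 2. Free-form license field from the package metadata
    license_field = (metadata.get("license") or "").strip()
    if license_field:
        return license_field, "PythonMetaLicense"

    # 3. Last resort: GitHub repo license (HEAD-specific, not version-specific)
    return github_lookup(), "PythonGitHub"
```

Returning the source next to the name is what lets a report or a stricter check tell a metadata-based verdict apart from a GitHub-based one.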

The problem with the GitHub API response for the license name is that it is HEAD-specific rather than version-specific. If we want to detect the license name for package:0.1.2 but HEAD points to package:1.0.0, we can easily end up with the wrong verdict if the package has changed its license since version 0.1.2.

What to do?

  1. Try to implement more sophisticated heuristics (e.g. check out the code at the version's branch/tag, trying both v0.1.2 and 0.1.2, then try to parse LICENSE or COPYING)
  2. Use the GitHub API as we do now, add an additional column to the report:
| Package           | License Name                               | License ID                | License Type   | License Source        |
|-------------------|--------------------------------------------|---------------------------|----------------|-----------------------|
| package1:0.1.2    | Apache 2.0 License                         | Apache-2.0                | Permissive     | External              |
| package2:3.141592 | GNU General Public License v2 or any later | GPL-2.0-or-later          | StrongCopyleft | External              |
| package3:21.09    | Other/Proprietary License (EULA)           | NA                        | Other          | PythonMetaClassifiers |
| package4          | GPL-3.0 Linking Exception                  | GPL-3.0-linking-exception | WeakCopyleft   | PythonMetaLicense     |
| package5:2.19.2   | null                                       | NA                        | Error          | PythonGitHub          |
  3. Introduce a flag option --fail-license-source SOURCE_NAME, so that a user who needs stricter checks is always notified when the GitHub API fallback, with its known disadvantages, is triggered.
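To make the proposed flag concrete, here is a minimal Python sketch of how it could behave. It is an assumption-laden illustration, not the tool's implementation: only the flag name `--fail-license-source` comes from this issue, while `parse_args`, `exit_code`, the repeatable-flag behavior, and the row layout are hypothetical.

```python
import argparse
from typing import Iterable, List


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse the hypothetical CLI; the flag may be repeated."""
    parser = argparse.ArgumentParser(description="license check sketch")
    parser.add_argument(
        "--fail-license-source",
        action="append",
        default=[],
        metavar="SOURCE_NAME",
        help="fail if any license was detected via this source",
    )
    return parser.parse_args(argv)


def exit_code(rows: Iterable[dict], fail_sources: Iterable[str]) -> int:
    """Return 1 if any report row came from a source the user rejects."""
    rejected = set(fail_sources)
    return 1 if any(row["source"] in rejected for row in rows) else 0


# Hypothetical report rows; the "source" values mirror the License Source column.
report = [
    {"package": "package4", "source": "PythonMetaLicense"},
    {"package": "package5:2.19.2", "source": "PythonGitHub"},
]
args = parse_args(["--fail-license-source", "PythonGitHub"])
status = exit_code(report, args.fail_license_source)
```

With this shape, users who accept the GitHub fallback simply omit the flag, and the extra report column still shows them where each verdict came from.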

Step 1 is arguably laborious to implement and error-prone (a dep's version does not necessarily match a branch or tag name). It may also require adding GitHub API token support: unauthenticated requests are limited to 60 per hour, so multiple requests per dep can easily exceed the limit and produce 403 or 429 status codes, especially for checks with longish lists of deps.
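A rough back-of-the-envelope sketch shows why the rate limit bites. The assumptions here are mine, not measured: one API request per (ref, file) pair, two candidate refs per version, two candidate file names, and a budget of 60 unauthenticated requests per rate-limit window. The helper names are hypothetical.

```python
from typing import List

LICENSE_FILES = ("LICENSE", "COPYING")  # file names mentioned in step 1
REQUEST_BUDGET = 60  # assumed unauthenticated budget per rate-limit window


def candidate_refs(version: str) -> List[str]:
    """Tag/branch names to try for a version, e.g. v0.1.2 and 0.1.2."""
    return [f"v{version}", version]


def worst_case_requests(dep_count: int) -> int:
    """Requests needed if every (ref, file) pair must be probed per dep."""
    return dep_count * len(candidate_refs("0.0.0")) * len(LICENSE_FILES)


def max_deps_within_budget(budget: int = REQUEST_BUDGET) -> int:
    """Largest dep list that stays within the budget in the worst case."""
    return budget // worst_case_requests(1)
```

Under these assumptions each dep costs up to four requests, so a list of more than 15 deps that all fall through to the GitHub heuristics would already exhaust the budget.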

I'd go with steps 2 and 3 and not implement step 1.