licensee / licensee

A Ruby Gem to detect under what license a project is distributed.
https://licensee.github.io/licensee/
MIT License
800 stars, 267 forks

support matching `http|https` for links in license text (GPL-3.0) #632

Closed NoNameForMee closed 1 year ago

NoNameForMee commented 1 year ago

Describe the bug

If a repo contains the latest official GPL license text, which links to gnu.org and fsf.org with explicit https://, licensee does not appear to match the official license text (I have only tried with GPL-3.0).

Steps to reproduce the behavior

  1. Download the latest official GPL version 3 license text from either https://choosealicense.com/licenses/gpl-3.0/ or https://www.gnu.org/licenses/gpl-3.0.txt and place it in your repo as a file named LICENSE.
  2. Run licensee on the repo.
  3. No exact match is reported.

Expected behavior

licensee should match the official license text regardless of whether its links use the current explicit https:// or the previous http://.

Additional context

Possibly this can serve as inspiration: https://github.com/spdx/license-list-XML/issues/633

mlinksva commented 1 year ago

Could you point out such a repo?

I cannot reproduce, and would not expect to be able to -- licensee normalizes http[s]. But if you point out a concrete example, it's possible you've discovered a new bug, whether related to http[s] or something else.
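To illustrate what normalizing http[s] means in practice, here is a minimal Ruby sketch. It is not licensee's actual implementation; `normalize_scheme` is a hypothetical helper written only for this example, and it assumes that collapsing both schemes to `http://` before comparison is enough to make the two variants of the text equal.

```ruby
# Hypothetical helper (an assumption for illustration, not licensee's code):
# rewrite every http:// or https:// prefix to http:// so that two texts
# differing only in link scheme compare as equal.
def normalize_scheme(text)
  text.gsub(%r{\bhttps?://}i, 'http://')
end

# Two sample lines that differ only in the scheme of the link.
old_text = 'see <http://www.gnu.org/licenses/>.'
new_text = 'see <https://www.gnu.org/licenses/>.'

# Prints whether the two lines are equal after normalization.
puts normalize_scheme(old_text) == normalize_scheme(new_text)
```

With a step like this applied to both the candidate file and the reference license, a switch from http:// to https:// in the official text would not by itself prevent an exact match.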

NoNameForMee commented 1 year ago

Dear @mlinksva, thanks for your quick response.

I feel I may have been a bit too quick in creating this issue here. I did so after reading that this repo is what GitHub itself uses to identify licenses; one place where it is mentioned as the source for identification is https://docs.github.com/en/rest/licenses?apiVersion=2022-11-28#about-licenses.

One example of a repo with the old text (plain http:// links) that GitHub, using licensee, does identify as GPL 3.0 is https://github.com/zweifisch/ob-http/blob/master/LICENSE

while one repo that uses the new text (https:// links only) and that GitHub, using licensee, does not identify as GPL 3.0 is: https://github.com/abba23/spotify-adblock/blob/main/LICENSE
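For anyone who wants to check what GitHub currently reports for a repo, a small Ruby sketch against the REST API's repository license endpoint might look like this. The helper names `license_endpoint` and `detected_spdx_id` are made up for this example, and the sketch assumes an unauthenticated request is allowed and that the response carries the identifier under `license.spdx_id`.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: build the URL of the REST endpoint that returns
# the license GitHub has detected for a repository.
def license_endpoint(owner, repo)
  URI("https://api.github.com/repos/#{owner}/#{repo}/license")
end

# Hypothetical helper: pull the SPDX identifier out of a response body.
# Returns nil when the body has no "license" object (for example an
# error response).
def detected_spdx_id(body)
  JSON.parse(body).dig('license', 'spdx_id')
end

# Usage (needs network access, so it is left as a comment):
#   require 'net/http'
#   body = Net::HTTP.get(license_endpoint('abba23', 'spotify-adblock'))
#   puts detected_spdx_id(body)
```

This only shows what GitHub has stored for the repo; it does not re-run detection.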

mlinksva commented 1 year ago

With the current version of licensee, that is detected:

% bundle exec licensee diff https://github.com/abba23/spotify-adblock --license gpl-3.0 
Comparing to GNU General Public License v3.0:
Input Length:      31525
License length:    31525
Similarity:      100.00%
Exact match!

I'm not sure why it wouldn't have been detected at the time the LICENSE file was pushed (April 2021), but I don't think it's a current problem in licensee. GitHub support may be able to re-run detection on that repo if you ask.