google / go-licenses

A lightweight tool to report on the licenses used by a Go package and its dependencies. Highlight! Versioned external URL to licenses can be found at the same time.
Apache License 2.0

Detect license names like MIT-LICENSE.txt #142

Open silverwind opened 2 years ago

silverwind commented 2 years ago

https://github.com/mrjones/oauth has a MIT-LICENSE.txt in its root but it's currently not detected:

E0903 20:40:47.315979   75355 library.go:115] Failed to find license for github.com/mrjones/oauth: cannot find a known open source license for "/go/pkg/mod/github.com/mrjones/oauth@v0.0.0-20190623134757-126b35219450" whose name matches regexp ^(?i)((UN)?LICEN(S|C)E|COPYING|README|NOTICE).*$ and locates up until "/go/pkg/mod/github.com/mrjones/oauth@v0.0.0-20190623134757-126b35219450"

Maybe the ^ should be removed from the regex.
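To illustrate the effect, here is a minimal sketch comparing the pattern from the error message with a variant that drops the leading ^. The variable names and the sample file names are only assumptions for illustration; this is not code from go-licenses.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored is the pattern quoted in the error message above.
var anchored = regexp.MustCompile(`^(?i)((UN)?LICEN(S|C)E|COPYING|README|NOTICE).*$`)

// unanchored is a hypothetical variant: the same pattern without the leading ^.
var unanchored = regexp.MustCompile(`(?i)((UN)?LICEN(S|C)E|COPYING|README|NOTICE).*$`)

func main() {
	// Sample names chosen for illustration only.
	names := []string{"LICENSE", "MIT-LICENSE.txt", "relicense.go"}
	for _, name := range names {
		fmt.Printf("%-16s anchored=%-5v unanchored=%v\n",
			name, anchored.MatchString(name), unanchored.MatchString(name))
	}
}
```

Without the ^, MIT-LICENSE.txt matches, but so does any name that merely contains one of the keywords, such as relicense.go.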

Bobgy commented 2 years ago

We may need to split this up into multiple regex patterns. For some of them, it doesn't make sense to remove the ^.

PRs welcome!

silverwind commented 2 years ago

FWIW, GitHub does detect the license correctly on that repo. Maybe the code that does it is open source, not sure.

Edit: It is https://github.com/licensee/licensee. It uses a scoring system with multiple regexes.

Bobgy commented 2 years ago

Be careful about directly using another open source project's code; there may be license issues.

It's good to know someone else also uses multiple patterns.

To add to that, the patterns can have different confidence levels. If a file named LICENSE doesn't have a recognized license, we should raise an error. But if a file named abcd-licenser.txt doesn't have license content, we can log a debug message instead.

Current behavior: we check for all files matching the regex, but we don't log an error at all for individual files.
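A rough sketch of what tiered patterns could look like. Everything here is hypothetical: the Confidence type, the two patterns, and the classify and onMissingLicense helpers are made up for illustration and are not part of go-licenses.

```go
package main

import (
	"fmt"
	"regexp"
)

// Confidence says how strongly a file name suggests a license file.
type Confidence int

const (
	None Confidence = iota // name does not look license-related
	Low                    // name only contains a license-like keyword
	High                   // name is a conventional license file name
)

var (
	// highPattern (assumed): conventional names such as LICENSE, LICENSE.txt, COPYING.
	highPattern = regexp.MustCompile(`^(?i)((UN)?LICEN(S|C)E|COPYING)(\.[a-z]+)?$`)
	// lowPattern (assumed): the existing keywords, matched anywhere in the name.
	lowPattern = regexp.MustCompile(`(?i)((UN)?LICEN(S|C)E|COPYING|README|NOTICE)`)
)

// classify returns the highest confidence tier whose pattern matches name.
func classify(name string) Confidence {
	switch {
	case highPattern.MatchString(name):
		return High
	case lowPattern.MatchString(name):
		return Low
	default:
		return None
	}
}

// onMissingLicense picks how to report a candidate file that turned out
// to contain no recognized license text.
func onMissingLicense(c Confidence) string {
	switch c {
	case High:
		return "error"
	case Low:
		return "debug"
	default:
		return "ignore"
	}
}

func main() {
	for _, name := range []string{"LICENSE", "MIT-LICENSE.txt", "abcd-licenser.txt", "main.go"} {
		c := classify(name)
		fmt.Printf("%-18s confidence=%d onMissing=%s\n", name, c, onMissingLicense(c))
	}
}
```

With these assumed tiers, MIT-LICENSE.txt is still picked up as a low-confidence candidate, while only conventional names like LICENSE would produce an error when no license text is recognized.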

silverwind commented 1 year ago

Asked the author to rename in https://github.com/mrjones/oauth/issues/74.