hroncok / license

Python library that encapsulates free software licenses
https://pypi.python.org/pypi/license
Other

Detect under what license a project is distributed #2

Closed: s-celles closed this issue 6 years ago

s-celles commented 6 years ago

Hello,

I need to detect which license a project is distributed under, to help with https://github.com/arduino/Arduino/issues/6646 and https://github.com/scls19fr/arduino_libraries_search/issues/2

Ruby has an interesting gem for that, https://github.com/benbalter/licensee, but I'd prefer a Python library for this purpose.

My idea is to have a function that takes the string contents of a license file found in a project, plus some additional parameters such as the author's name, the year of publication, and so on.
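A minimal sketch of the "additional parameters" step, assuming the bundled license texts are templates with `$author` and `$year` placeholders. The placeholder names and the `fill_template` helper are hypothetical and not part of this library's API:

```python
from string import Template


def fill_template(template_text: str, **params: object) -> str:
    """Substitute project-specific values (author, year, ...) into a license template.

    Placeholders use string.Template syntax ($author, $year); this convention is
    an assumption for illustration. safe_substitute leaves unknown placeholders
    untouched instead of raising KeyError.
    """
    return Template(template_text).safe_substitute(**params)


mit_header = "Copyright (c) $year $author"
print(fill_template(mit_header, author="Jane Doe", year="2017"))
# Copyright (c) 2017 Jane Doe
```

Using `safe_substitute` rather than `substitute` means a caller who only knows the author, and not the year, still gets a usable text to compare.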

This text is then compared to every license available in this project (after applying the additional parameters). Both texts are uppercased before comparison. A string similarity metric is computed for each license, and a list of license names is returned, sorted by similarity with the nearest first.
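The comparison-and-ranking step could look roughly like the sketch below. It assumes the license texts are available as a plain `{name: text}` dict (a hypothetical structure, not this library's API) and uses the standard library's `difflib.SequenceMatcher.ratio()` as a stand-in metric. That is Ratcliff/Obershelp matching, not one of the algorithms listed further down.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def rank_licenses(project_text: str, templates: dict[str, str]) -> list[tuple[str, float]]:
    """Return (license_name, score) pairs sorted from most to least similar.

    Both texts are uppercased before comparison, as proposed above.
    The score is SequenceMatcher.ratio(), a value in [0, 1]; it stands in
    for whichever similarity metric is eventually chosen.
    """
    needle = project_text.upper()
    scores = [
        (name, SequenceMatcher(None, needle, text.upper()).ratio())
        for name, text in templates.items()
    ]
    # Nearest first: highest similarity score at the front of the list.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, truncated license snippets for illustration only.
templates = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "BSD-3-Clause": "Redistribution and use in source and binary forms, with or without modification, are permitted",
}
ranking = rank_licenses(
    "permission is hereby granted, free of charge, to any person obtaining a copy",
    templates,
)
print(ranking[0])  # ('MIT', 1.0)
```

`SequenceMatcher` is character-based and can be slow on full-length license texts, which is one reason token-set metrics such as the ones below are attractive.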

I'm not a specialist, but several algorithms seem to exist: Jaro-Winkler distance, the Jaccard index, Damerau-Levenshtein distance, and the Sørensen–Dice coefficient, which is used by licensee...
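Two of these metrics are short enough to sketch directly. The version below computes the Sørensen–Dice coefficient and the Jaccard index over the *sets of words* in each text. Tokenizing by words (rather than characters or bigrams) is an assumption of this sketch, and the helper names are hypothetical.

```python
from __future__ import annotations

import re


def word_set(text: str) -> set[str]:
    """Uppercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.upper()))


def dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient: 2|A ∩ B| / (|A| + |B|) over word sets."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0  # treat two empty texts as identical (a design choice)
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def jaccard(a: str, b: str) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0  # same convention as dice()
    return len(sa & sb) / len(sa | sb)


print(dice("the quick fox", "the lazy fox"))     # 0.666...
print(jaccard("the quick fox", "the lazy fox"))  # 0.5
```

Since Dice = 2J / (1 + J), the two metrics always rank candidate licenses in the same order; they differ only in scale. Jaro-Winkler and Damerau-Levenshtein, by contrast, are edit-based and work on character sequences.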

What is your opinion about such a feature?

Kind regards

hroncok commented 6 years ago

This one is Perl but works fine as well, and it has a command-line interface: http://search.cpan.org/dist/App-Licensecheck/