Hello,
I need to detect which license a project is distributed under, to help with https://github.com/arduino/Arduino/issues/6646 and https://github.com/scls19fr/arduino_libraries_search/issues/2
Ruby has an interesting gem for this (https://github.com/benbalter/licensee), but I'd prefer a Python library for that purpose.
My idea is to have a function that takes a string representation of a license file found in a project, plus some additional parameters such as the author's name, the year of publication, etc.
This text is then compared with each license available in this project (after applying the additional parameters). Both texts are upper-cased before the comparison. A string similarity metric is computed for each license, and a list of license names is returned, sorted by similarity (nearest first).
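A minimal sketch of that function, under some assumptions: the two template texts, the `[year]`/`[fullname]` placeholders and the name `rank_licenses` are hypothetical, and real templates would have to come from a proper license collection. It also collapses whitespace in addition to upper-casing. For the score it uses the standard library's `difflib.SequenceMatcher.ratio()` (a Ratcliff/Obershelp-style measure, which is not among the metrics listed in this issue), only because it needs no extra dependency.

```python
import difflib
import re
from typing import Dict, List, Tuple

# Hypothetical, truncated license templates. "[year]" and "[fullname]" are an
# assumed placeholder convention for this sketch.
TEMPLATES: Dict[str, str] = {
    "MIT": (
        "Copyright (c) [year] [fullname]\n"
        "Permission is hereby granted, free of charge, to any person "
        "obtaining a copy of this software"
    ),
    "BSD-2-Clause": (
        "Copyright (c) [year] [fullname]\n"
        "Redistribution and use in source and binary forms, with or "
        "without modification, are permitted"
    ),
}


def normalize(text: str) -> str:
    """Upper-case and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().upper()


def rank_licenses(license_text: str, author: str = "", year: str = "") -> List[Tuple[str, float]]:
    """Return (license name, similarity in [0, 1]) pairs, most similar first."""
    candidate = normalize(license_text)
    scores = []
    for name, template in TEMPLATES.items():
        # Apply the additional parameters to the template before comparing.
        filled = template.replace("[year]", year).replace("[fullname]", author)
        ratio = difflib.SequenceMatcher(None, candidate, normalize(filled)).ratio()
        scores.append((name, ratio))
    # Sort by similarity, highest (nearest) first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_licenses(text, author="Jane Doe", year="2017")` on a file whose text matches the filled-in MIT template puts `"MIT"` first with a score of `1.0`. Filling in the parameters first matters: without it, the author line alone would lower the score of every candidate.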
I'm not a specialist, but several algorithms seem to exist: Jaro-Winkler distance, the Jaccard index, Damerau-Levenshtein distance, and the Sørensen–Dice coefficient (which licensee uses).
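As an illustration of the two set-based measures in that list, here is a sketch that computes the Sørensen–Dice coefficient and the Jaccard index over sets of upper-cased words. Treating a license as a set of words is an assumption made for this sketch; it does not reproduce licensee's exact normalization, and the helper names (`word_set`, `dice`, `jaccard`) are hypothetical.

```python
import re
from typing import Set


def word_set(text: str) -> Set[str]:
    """Upper-case the text and keep runs of ASCII letters, digits and apostrophes."""
    return set(re.findall(r"[A-Z0-9']+", text.upper()))


def dice(a: str, b: str) -> float:
    """Sørensen–Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    x, y = word_set(a), word_set(b)
    if not x and not y:
        return 1.0  # two empty texts are treated as identical
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(a: str, b: str) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    x, y = word_set(a), word_set(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)
```

For `"the quick brown fox"` vs `"the quick red fox"`, 3 of the words are shared, so Dice gives 6/8 = 0.75 and Jaccard gives 3/5 = 0.6. Since Dice = 2J / (1 + J) is an increasing function of Jaccard, the two always produce the same ranking of licenses; they only differ in the absolute scores. Jaro-Winkler and Damerau-Levenshtein, by contrast, work on character order and are more sensitive to reworded or reordered paragraphs.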
What is your opinion of such a feature?
Kind regards