holtwick / what-license.com

Quickly check what license text you are facing
https://holtwick.github.io/what-license.com/

Computation of match percentage #4

Open rpavlik opened 9 years ago

rpavlik commented 9 years ago

Related to https://github.com/holtwick/what-license.com/issues/3: it looks like the match percentage is the percentage of the input license that is found in a given known license. Combined with the partial-word effect, this skews results when the input license is really short (BSD/MIT-style) and the known license is really long (GPL). I see higher-than-expected matches with (for instance) the GPL on almost every open source license, and this license still gets a 55% match even though it's short and proprietary.

Perhaps compute the percentage in both directions and take the minimum? If I guessed the inner workings correctly, I'd imagine that would drop the GPL out of contention when it's not actually the license being sought. (The caveat is that a minimum-based score would no longer match the header boilerplate that a license contains when that boilerplate is pasted on its own; handling that might require storing the boilerplates separately in the database for the licenses that have them.)
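
To make the suggestion concrete, here is a rough sketch of the two-direction idea. It is not the site's actual code: the word-set comparison, the function names (`tokens`, `containment`, `symmetric_match`) and the toy texts are all assumptions made for illustration, since the real matching logic is only guessed at above.

```python
import re

# Toy texts, invented for illustration only (not real license wording).
SHORT = "permission is granted to use this software"
LONG = (
    "this software is free you may use copy modify and redistribute it "
    "under the terms of this license as published by the foundation "
    "permission to distribute is granted without warranty of any kind"
)

def tokens(text):
    """Set of lowercased whole words; an assumed stand-in for the real matching unit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def containment(a, b):
    """Fraction of a's distinct words that also occur in b (one direction only)."""
    words_a, words_b = tokens(a), tokens(b)
    if not words_a:
        return 0.0
    return len(words_a & words_b) / len(words_a)

def symmetric_match(input_text, known_text):
    """Score both directions and keep the smaller, as proposed in this issue."""
    return min(
        containment(input_text, known_text),
        containment(known_text, input_text),
    )

if __name__ == "__main__":
    print("input found in known:", containment(SHORT, LONG))
    print("known found in input:", containment(LONG, SHORT))
    print("minimum of both:     ", symmetric_match(SHORT, LONG))
```

With these toy texts, every word of the short input appears somewhere in the long text, so the one-directional score is perfect, while only a small fraction of the long text's words appear in the short input, so the minimum is low. The same property produces the boilerplate caveat: a header pasted on its own would score low against the full license unless the header is stored as its own entry.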