jslicense / licensee.js

check dependency licenses against rules
https://www.npmjs.com/package/licensee
Apache License 2.0

Detect licenses in more locations #57

Closed: ronkorving closed this issue 4 years ago

ronkorving commented 4 years ago

It would be awesome if the license was found not just in package.json, but also in other files like LICENSE and the readme.

This package does exactly that: https://www.npmjs.com/package/license-checker#how-licenses-are-found

This is their implementation of the string matching on those files: https://github.com/davglass/license-checker/blob/master/lib/license.js

Thoughts?

kemitchell commented 4 years ago

@ronkorving There are indeed still npm packages without proper license properties in package.json. But automatically analyzing messy, non-schema data for license conclusions gets pretty iffy. You end up doing NLP, and setting a confidence threshold.
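
To make the confidence-threshold point concrete, here is a minimal sketch of text-similarity matching. It is not what licensee or license-checker does: the `TEMPLATES` strings are heavily abbreviated stand-ins rather than full license texts, and `guessLicense` and its `threshold` parameter are hypothetical names chosen for illustration.

```javascript
// Hypothetical, heavily abbreviated stand-ins for license templates.
// A real matcher would compare against complete license texts.
const TEMPLATES = {
  MIT: 'permission is hereby granted free of charge to any person obtaining a copy of this software',
  ISC: 'permission to use copy modify and or distribute this software for any purpose with or without fee is hereby granted'
};

// Split text into a set of lowercase alphanumeric words.
function words(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Sorensen-Dice coefficient over word sets: 2 * |A and B| / (|A| + |B|).
function dice(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const w of a) {
    if (b.has(w)) shared++;
  }
  return (2 * shared) / (a.size + b.size);
}

// Return the best-scoring template, but only name a license when the
// score reaches the caller's threshold; otherwise report id: null.
function guessLicense(text, threshold) {
  const candidate = words(text);
  let best = { id: null, score: 0 };
  for (const [id, template] of Object.entries(TEMPLATES)) {
    const score = dice(candidate, words(template));
    if (score > best.score) best = { id, score };
  }
  return best.score >= threshold ? best : { id: null, score: best.score };
}

const sample =
  'Copyright 2019 Jane Doe. Permission is hereby granted, free of charge, ' +
  'to any person obtaining a copy of this software';

console.log(guessLicense(sample, 0.8)); // names a license
console.log(guessLicense(sample, 0.9)); // same text, stricter threshold, no conclusion
```

The same input yields a conclusion at one threshold and none at another, which is the kind of judgment call a user would have to understand before relying on the output.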

I'm very interested in approaches like license-checker's as hints for manual reviewers. For example, licensee optionally applies third-party conclusions about the licenses of some well known packages from npm-license-corrections. We could potentially use automated hints to create more entries in that package, especially for popular or otherwise important packages.

ronkorving commented 4 years ago

Rather than NLP, one could hash a license file (although it often contains an author's name, which defeats the purpose) - so... yeah, maybe not.
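
A minimal sketch of the hashing idea and why it breaks down, using Node's built-in `crypto` module. The `normalize` and `fingerprint` helpers and the sample strings are hypothetical, and the license body is an abbreviated stand-in.

```javascript
const crypto = require('crypto');

// Lowercase and collapse whitespace so trivial formatting
// differences do not change the hash.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// SHA-256 fingerprint of the normalized text, as a hex string.
function fingerprint(text) {
  return crypto.createHash('sha256').update(normalize(text), 'utf8').digest('hex');
}

// Abbreviated stand-in for a license body.
const body = 'Permission is hereby granted, free of charge, to any person obtaining a copy.';

// Two LICENSE files with the same body but different copyright holders.
const fileA = 'Copyright (c) 2019 Alice\n\n' + body;
const fileB = 'Copyright (c) 2020 Bob\n\n' + body;

// Formatting noise alone is handled by normalization.
console.log(fingerprint(body) === fingerprint('  ' + body.toUpperCase() + '\n'));

// The copyright line is not: the fingerprints differ.
console.log(fingerprint(fileA) === fingerprint(fileB));
```

Stripping the copyright line before hashing would work around this, but deciding what counts as a copyright line is itself a heuristic.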

Funnily enough though, GitHub has a built-in detection feature, as you may have noticed before. It uses the Ruby gem Licensee, which also relies on some heuristics to detect the license (in a better way, I think, than the aforementioned license-checker). So it wouldn't be without precedent.

kemitchell commented 4 years ago

Every method for determining licenses algorithmically represents a trade-off. Relying on package.json isn't perfectly error-proof. licensee optionally supports relying on npm-license-corrections, which entails its own risks and rewards, too. It's very clear what the program does, and that helps users decide how to work with its output.

A problem with heuristics, even for the kind of simpler problem that spdx-correct addresses, is that they become inscrutable, fast. It isn't transparent how the conclusions get made, which makes it less clear how, or how much, to rely on them.

I'm definitely open to adding flags to licensee to support other heuristics. But that does get complex fast. In particular, when we start looking outside package.json's license property, we have to decide how to handle conflicts. What if the NLP README scanner disagrees with package.json?
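
One possible shape for that decision, as a sketch only: licensee has no such function, and `reconcile`, its arguments, and the status strings are all hypothetical. The policy assumed here is that a scanner's guess never overrides or substitutes for `package.json`, and that disagreements are surfaced for manual review rather than resolved automatically.

```javascript
// declared: the license string from package.json, or null if absent.
// detected: a scanner's guess from README or LICENSE, or null if none.
function reconcile(declared, detected) {
  if (declared && detected) {
    // Naive string comparison; real SPDX expressions would need parsing.
    return declared === detected
      ? { status: 'agree', license: declared }
      : { status: 'conflict', license: null, declared, detected };
  }
  if (declared) {
    return { status: 'declared-only', license: declared };
  }
  if (detected) {
    // A guess alone is only a hint for a reviewer, not a conclusion.
    return { status: 'hint', license: null, detected };
  }
  return { status: 'unknown', license: null };
}

console.log(reconcile('MIT', 'MIT'));
console.log(reconcile('MIT', 'ISC'));
console.log(reconcile(null, 'ISC'));
```

Even this small sketch embeds policy choices (which source wins, what counts as equal), and each one is something users would need to know before trusting the result.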

For now, I think licensee strikes a very transparent, and therefore usable, balance. If you see efforts to do more broad-ranging license detection, especially in the form of a reusable library, rather than another freestanding utility, I'd appreciate a note about it. But I don't want to mislead you: I don't feel personally motivated to make licensee the "home" for that new and different effort right now. It's not something I see a lot of demand for in my day-to-day. I advise clients on license compliance pretty regularly.

ronkorving commented 4 years ago

I appreciate everything you say here, and agree that solving this should not really be done in this package. If a good library solution exists (I haven't seen one), we could perhaps consider using it, depending on how strict it is. If not, I guess that should then be the end of the conversation :) I may build a very strict library myself some day, but for now we'll leave it here.

Thanks for your feedback.

kemitchell commented 4 years ago

Of course, @ronkorving! I'm grateful for the chance to write out my reasons, which only happened thanks to you.

ronkorving commented 4 years ago

@kemitchell cheers ;)