jpeddicord / askalono

A tool & library to detect open source licenses from texts
Apache License 2.0

Does not detect MPLv2 header in source file #41

Closed: phrohdoh closed this issue 5 years ago

phrohdoh commented 5 years ago

lib.rs

// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

struct X;

$ askalono --version
askalono 0.3.0

$ askalono id --optimize src/lib.rs
Error: Confidence threshold not high enough for any known license

The MPL 2.0 text includes the following exhibit, which confirms that the header is correct:

Exhibit A - Source Code Form License Notice
-------------------------------------------

  This Source Code Form is subject to the terms of the Mozilla Public
  License, v. 2.0. If a copy of the MPL was not distributed with this
  file, You can obtain one at http://mozilla.org/MPL/2.0/.

If it is not possible or desirable to put the notice in a particular
file, then You may include the notice in a location (such as a LICENSE
file in a relevant directory) where a recipient would be likely to look
for such a notice.

jpeddicord commented 5 years ago

Interesting, I'd have thought that MPL header would be in the SPDX dataset but it doesn't appear to be. This is something that'll need to get fixed on that end; I'm happy to get that going but acknowledge that I can be a bit slow on that point.

I think https://github.com/spdx/license-list-XML/blob/master/src/MPL-2.0.xml more or less needs a standardLicenseHeader block, and then the license-list-data repository needs to be regenerated.

bradleeedmondson commented 5 years ago

You're not wrong; we should add the license header text to the definition of that license. But I'd also ask you to consider using SPDX Short Identifiers in source code, for a more succinct and more machine-friendly application of any license on the SPDX License List to your sources: https://spdx.org/ids

Up to you and your project, of course, but I mention it because not a lot of people know about that option.
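
As a concrete illustration of the short-identifier approach, the sketch below shows a one-line `SPDX-License-Identifier` comment standing in for the three-line MPL notice, plus a small helper that reads the tag back. The helper name `spdx_id` is hypothetical and is not part of askalono or any SPDX tooling; it assumes the tag and its value sit on a single line and does not validate the identifier.

```rust
// SPDX-License-Identifier: MPL-2.0

/// Hypothetical helper (not an askalono or SPDX API): returns whatever
/// follows the first `SPDX-License-Identifier:` tag found in `source`.
fn spdx_id(source: &str) -> Option<&str> {
    const TAG: &str = "SPDX-License-Identifier:";
    source.lines().find_map(|line| {
        // Locate the tag anywhere in the line so any comment style works.
        let idx = line.find(TAG)?;
        let id = line[idx + TAG.len()..].trim();
        if id.is_empty() {
            None
        } else {
            Some(id)
        }
    })
}
```

Calling `spdx_id("// SPDX-License-Identifier: MPL-2.0\n\nstruct X;\n")` returns `Some("MPL-2.0")`, while a file without the tag yields `None`.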

jpeddicord commented 5 years ago

@bradleeedmondson Absolutely, SPDX short identifiers are the way to go. But in this case, askalono is something that tries to identify licenses from texts -- the issue here is that some of the text it should be identifying is missing from its dataset.
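
To make the failure mode concrete: a text matcher can only score an input against texts it already knows. The sketch below is a simplified illustration, not askalono's actual implementation. It assumes a Sørensen-Dice score over word bigrams, and the function names `normalize`, `bigrams`, and `dice` are hypothetical.

```rust
use std::collections::HashSet;

/// Lowercase the text, drop leading `/` comment markers on each line,
/// and split it into words.
fn normalize(text: &str) -> Vec<String> {
    text.lines()
        .map(|l| l.trim().trim_start_matches('/').trim())
        .flat_map(|l| l.split_whitespace())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Set of adjacent word pairs in the normalized text.
fn bigrams(words: &[String]) -> HashSet<(String, String)> {
    words
        .windows(2)
        .map(|w| (w[0].clone(), w[1].clone()))
        .collect()
}

/// Sørensen-Dice coefficient, 2|A ∩ B| / (|A| + |B|), over word bigrams.
/// Returns 0.0 when neither text has any bigrams.
fn dice(a: &str, b: &str) -> f64 {
    let a = bigrams(&normalize(a));
    let b = bigrams(&normalize(b));
    let total = a.len() + b.len();
    if total == 0 {
        return 0.0;
    }
    let shared = a.intersection(&b).count();
    2.0 * shared as f64 / total as f64
}
```

Under this sketch, the commented header from `lib.rs` scores 1.0 against the Exhibit A notice once that notice is available as a reference text. With only unrelated texts to compare against, every score stays low, which is the situation a confidence threshold rejects.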

phrohdoh commented 5 years ago

https://github.com/spdx/license-list-XML/issues/849 has been fixed.

Can we update askalono to include this header so this issue may be closed?

jpeddicord commented 5 years ago

Working on pulling in new SPDX data now. Interestingly, it's causing a unit test to fail (a self-test to ensure MIT is detected as MIT). I suspect the format may have changed slightly. Digging into that.

jpeddicord commented 5 years ago

Pulled in and verified:

❯❯❯ cat test.txt
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

struct X;

❯❯❯ just cli id --optimize ./test.txt
 ...
./target/release/askalono id --optimize ./test.txt
License: MPL-2.0 (license header)
Score: 0.972
Containing:
  License: MPL-2.0 (license header)
  Score: 1.000
  Lines: 0 - 4
  Aliases: MPL-2.0-no-copyleft-exception

This will go out in the next release, which I hope to prepare soon. :)