nexB / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/nexB/scancode-toolkit/releases/

freebsd package parser can't find Python or BSD license #1471

Open licodeli opened 5 years ago

licodeli commented 5 years ago

Description

When using the FreeBSD package parser to get the package from this file:

{"name":"py27-idna","origin":"dns/py-idna","version":"2.6","comment":"Internationalized Domain Names in Applications (IDNA)","maintainer":"koobs@FreeBSD.org","www":"https://github.com/kjd/idna","abi":"FreeBSD:10:*","arch":"freebsd:10:*","prefix":"/usr/local","flatsize":895585,"licenselogic":"and","licenses":["PSFL","BSD3CLAUSE"],"desc":"A library to support the Internationalised Domain Names in Applications\n(IDNA) protocol as specified in RFC 5891. This version of the protocol\nis often referred to as \"IDNA2008\" and can produce different res\nlts from the earlier standard from 2003.\n\nThe library is also intended to act as a suitable drop-in replacement\nfor the \"encodings.idna\" module that comes with the Python standard\nlibrary but currently only supports the older 2003 specification.\n\nWWW: https://github.com/kjd/idna","deps":{"py27-setuptools":{"origin":"devel/py-setuptools","version":"39.0.1"},"python27":{"origin":"lang/python27","version":"2.7.15"}},"categories":["python","dns"],"annotations":{"flavor":"py27"}}

It returns:

  "license_expression": "unknown",
  "declared_license": "PSFL AND BSD3CLAUSE",

The detected license should be the Python Software Foundation License and the BSD 3-Clause license (the declared "licenselogic" is "and"), not "unknown".

pombredanne commented 5 years ago

@linexb thank you. I wonder if, for this and #1471, we could have a mapping of the "keys" used in FreeBSD to the corresponding ScanCode license keys? We could then use it as a base for smarter detection of their declared licenses. Also, we likely want to keep the original declared license as-is, e.g.

{
  "licenselogic":"and",
  "licenses":["PSFL","BSD3CLAUSE"]
}

rather than trying to craft some fake expression just yet?
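
For illustration, here is a minimal sketch of what such a mapping could look like. It is not ScanCode's actual implementation: the function name `map_freebsd_licenses`, the table `FREEBSD_TO_SCANCODE`, the target keys `python` and `bsd-new`, and the set of accepted `licenselogic` values are all assumptions made for this example. The original declared fields are kept untouched, and the expression is only derived on the side.

```python
import json

# Hypothetical mapping from FreeBSD license identifiers to ScanCode license keys.
# Only the two identifiers from this report are listed; the target keys
# ("python", "bsd-new") are assumptions and should be checked against the
# ScanCode license index before use.
FREEBSD_TO_SCANCODE = {
    "PSFL": "python",
    "BSD3CLAUSE": "bsd-new",
}

# Assumed "licenselogic" values and the expression operator each one implies.
LOGIC_OPERATORS = {"and": "AND", "or": "OR", "single": None}


def map_freebsd_licenses(manifest):
    """Return the declared license data as-is plus a derived expression."""
    licenses = manifest.get("licenses") or []
    logic = (manifest.get("licenselogic") or "single").lower()

    # Keep the original declaration unchanged, as suggested in the thread.
    declared = {"licenselogic": manifest.get("licenselogic"), "licenses": licenses}

    # Unmapped identifiers fall back to "unknown" instead of being guessed.
    keys = [FREEBSD_TO_SCANCODE.get(name, "unknown") for name in licenses]
    operator = LOGIC_OPERATORS.get(logic)

    if not keys:
        expression = None
    elif len(keys) == 1:
        expression = keys[0]
    elif operator is None:
        # Several licenses but no usable logic: do not invent an operator.
        expression = "unknown"
    else:
        expression = f" {operator} ".join(keys)

    return {"declared_license": declared, "license_expression": expression}


manifest = json.loads('{"licenselogic": "and", "licenses": ["PSFL", "BSD3CLAUSE"]}')
result = map_freebsd_licenses(manifest)
print(result["license_expression"])  # python AND bsd-new
```

Keeping the lookup in a plain dictionary makes it easy to extend as more FreeBSD identifiers are reviewed, and any identifier that is not yet mapped stays visible as "unknown" rather than being silently mis-detected.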