spdx / yalm-python

Implement SPDX License Matching in Python. Project in CommunityBridge Linux Foundation 2020.
Apache License 2.0

Additional testing #14

Open goneall opened 3 years ago

goneall commented 3 years ago

I would like to use this code to replace the Java license matching used in the SPDX online tools.

Before making that change, I would like to test all of the SPDX listed licenses.

This can be done by downloading all of the license templates from the License List Data templates repo and the test text files from the License List XML test files repo.

If we run a license comparison on every file in the test files and each one matches its corresponding template from the templates directory, that would demonstrate we have no false negatives.
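A minimal sketch of the false-negative check follows. It assumes each test file is named `<license-id>.txt` and that `detect` is any callable returning the matched license ID or `None`. Both helper names are hypothetical, and `detect` is a stand-in for the matcher, not yalm's actual API.

```python
from pathlib import Path
from typing import Callable, Dict, List, Mapping, Optional, Tuple


def load_test_cases(directory: Path) -> Dict[str, str]:
    """Map each license ID (taken from the file name) to its test text."""
    return {p.stem: p.read_text(encoding="utf-8") for p in directory.glob("*.txt")}


def find_false_negatives(
    cases: Mapping[str, str],
    detect: Callable[[str], Optional[str]],
) -> List[Tuple[str, Optional[str]]]:
    """Return (expected_id, detected_id) for every case the matcher gets wrong."""
    failures = []
    for expected_id in sorted(cases):
        detected_id = detect(cases[expected_id])
        if detected_id != expected_id:
            failures.append((expected_id, detected_id))
    return failures
```

An empty result from `find_false_negatives(load_test_cases(test_dir), detect)` would mean every test file was matched to its own template.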

We can also test for false positives by finding any test file that matches more than one template. There are some expected duplicates, all of which are documented in the License List XML expected-warnings file.
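The false-positive check can be sketched as follows. It assumes we already have, for each test file, the set of template IDs it matched, and that the documented duplicates have been loaded as `(id, id)` pairs. Parsing the expected-warnings file is left out because its format is not described here. Pairs are treated as unordered, and the function name is hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, Iterable, List, Mapping, Tuple


def find_false_positives(
    matches: Mapping[str, AbstractSet[str]],
    expected_duplicates: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Report (test_id, id_a, id_b) for every pair of templates that both
    matched the same test file but are not a documented duplicate pair."""
    allowed = {frozenset(pair) for pair in expected_duplicates}
    problems = []
    for test_id in sorted(matches):
        # Check every unordered pair of templates that matched this file.
        for a, b in combinations(sorted(matches[test_id]), 2):
            if frozenset((a, b)) not in allowed:
                problems.append((test_id, a, b))
    return problems
```

Using `frozenset` pairs means a duplicate documented as A/B also covers B/A.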

Note that to keep the test and template files consistent, you should download the same tagged version (e.g. v3.12 used in the above links).
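A minimal sketch of the download step follows, pinning both repositories to the same tag. It assumes the repositories are `spdx/license-list-XML` and `spdx/license-list-data` on GitHub and uses GitHub's tag source-archive URLs. `REPOS`, `archive_url` and `download_release` are hypothetical names.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Assumed GitHub repositories; both are always fetched at the same tag.
REPOS = ("spdx/license-list-XML", "spdx/license-list-data")


def archive_url(repo: str, tag: str) -> str:
    """GitHub's source-archive URL for a tag of a repository."""
    return f"https://github.com/{repo}/archive/refs/tags/{tag}.zip"


def download_release(tag: str, dest: Path) -> None:
    """Download and unpack every repo in REPOS at the same tag into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    for repo in REPOS:
        with urllib.request.urlopen(archive_url(repo, tag)) as response:
            data = response.read()
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.extractall(dest)
```

For example, `download_release("v3.12", Path("spdx-data"))` fetches both repos at v3.12. Taking a single `tag` argument makes it impossible to mix versions by accident.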

rtgdk commented 3 years ago

@goneall Great initiative! Please let me know if I can be of any help with Python. I can write a Python utility script to download a specific version of the license templates and XML test files repos and run the tool on them. Having this utility as part of the test utils might also help generate future tests, e.g. for v3.13 or v2.x.

goneall commented 3 years ago

I can write a python script utility to download specific license templates/xml file repo and run the tool on them.

@rtgdk that would be a great help if you could write such a utility

sanjibansg commented 3 years ago

I would like to contribute to this issue. Is it still open? @rtgdk @goneall

goneall commented 3 years ago

@kahanikaar Please feel free to contribute to this issue - thanks

stefan6419846 commented 1 year ago

Given that https://github.com/m1kit/yalm-resources/blob/874bc2162b3edab7fabb6ed0a76e97dc7c828530/meta.json declares that it uses version 3.14 of the license list, I did some quick testing and observed a success rate of 285/515 (55.34 %) for exact target matches.

Further observations:

Testing code:

import json
from collections import defaultdict
from importlib import resources
from pathlib import Path

from yalm import detect_license, resources as data

# Map each license ID to the set of IDs it is an expected duplicate of.
duplicates = json.loads(
    resources.files(data).joinpath('expected-duplicates.json').read_text(encoding='utf-8')
)
duplicate_mapping = defaultdict(set)
for entry in duplicates:
    duplicate_mapping[entry['from']].add(entry['to'])
duplicate_mapping = dict(duplicate_mapping)

correct, total, result_is_none = 0, 0, 0

test_dir = Path('license-list-XML-3.14', 'test', 'simpleTestForGenerator')
for path in sorted(test_dir.glob('*.txt'), key=lambda p: p.name.lower()):
    # The file name is the expected license ID; strip the prefix used for deprecated IDs.
    expected = path.stem.removeprefix('depreciate_')
    result = detect_license(text=path.read_text(encoding='utf-8'), timeout=60, num_workers=20)
    actual = result.template.id if result else None
    # A hit on the expected ID or on one of its expected duplicates counts as correct.
    if expected == actual or actual in duplicate_mapping.get(expected, set()):
        correct += 1
        print('✔', actual)
    else:
        print('✗', expected, actual)
    if actual is None:
        result_is_none += 1
    total += 1

print(f'{correct}/{total} ({correct / total:.2%}) detected correctly.')
# print(result_is_none)

Complete results:

✔ 0BSD
✗ 389-exception None
✔ AAL
✔ Abstyles
✔ Adobe-2006
✔ Adobe-Glyph
✔ ADSL
✔ AFL-1.1
✔ AFL-1.2
✔ AFL-2.0
✔ AFL-2.1
✗ AFL-3.0 None
✔ Afmparse
✗ AGPL-1.0-only None
✗ AGPL-1.0-or-later None
✗ AGPL-1.0 None
✗ AGPL-3.0-only None
✗ AGPL-3.0-or-later None
✗ AGPL-3.0 None
✔ Aladdin
✗ AMDPLPA None
✔ AML
✔ AMPAS
✗ ANTLR-PD-fallback None
✔ ANTLR-PD
✗ Apache-1.0 None
✗ Apache-1.1 None
✔ Apache-2.0
✔ APAFML
✗ APL-1.0 None
✗ APSL-1.0 None
✗ APSL-1.1 None
✗ APSL-1.2 None
✗ APSL-2.0 None
✔ Artistic-1.0-cl8
✔ Artistic-1.0-Perl
✔ Artistic-1.0
✔ Artistic-2.0
✗ Autoconf-exception-2.0 None
✗ Autoconf-exception-3.0 None
✔ Bahyph
✔ Barr
✔ Beerware
✗ Bison-exception-2.2 GPL-2.0-with-bison-exception
✗ BitTorrent-1.0 None
✗ BitTorrent-1.1 None
✔ blessing
✔ BlueOak-1.0.0
✗ Bootloader-exception None
✔ Borceux
✔ BSD-1-Clause
✔ BSD-2-Clause-FreeBSD
✗ BSD-2-Clause-NetBSD BSD-2-Clause
✔ BSD-2-Clause-Patent
✔ BSD-2-Clause-Views
✔ BSD-2-Clause
✗ BSD-3-Clause-Attribution None
✔ BSD-3-Clause-Clear
✔ BSD-3-Clause-LBNL
✔ BSD-3-Clause-Modification
✔ BSD-3-Clause-No-Military-License
✔ BSD-3-Clause-No-Nuclear-License-2014
✔ BSD-3-Clause-No-Nuclear-License
✔ BSD-3-Clause-No-Nuclear-Warranty
✔ BSD-3-Clause-Open-MPI
✔ BSD-3-Clause
✔ BSD-4-Clause-Shortened
✗ BSD-4-Clause-UC BSD-4-Clause
✔ BSD-4-Clause
✔ BSD-Protection
✔ BSD-Source-Code
✔ BSL-1.0
✗ BUSL-1.1 None
✗ bzip2-1.0.5 None
✗ bzip2-1.0.6 None
✗ C-UDA-1.0 None
✗ CAL-1.0-Combined-Work-Exception None
✗ CAL-1.0 None
✔ Caldera
✔ CATOSL-1.1
✔ CC-BY-1.0
✔ CC-BY-2.0
✗ CC-BY-2.5-AU None
✔ CC-BY-2.5
✔ CC-BY-3.0-AT
✔ CC-BY-3.0-DE
✔ CC-BY-3.0-NL
✔ CC-BY-3.0-US
✗ CC-BY-3.0 None
✔ CC-BY-4.0
✔ CC-BY-NC-1.0
✔ CC-BY-NC-2.0
✔ CC-BY-NC-2.5
✔ CC-BY-NC-3.0-DE
✔ CC-BY-NC-3.0
✔ CC-BY-NC-4.0
✔ CC-BY-NC-ND-1.0
✔ CC-BY-NC-ND-2.0
✔ CC-BY-NC-ND-2.5
✔ CC-BY-NC-ND-3.0-DE
✔ CC-BY-NC-ND-3.0-IGO
✔ CC-BY-NC-ND-3.0
✔ CC-BY-NC-ND-4.0
✔ CC-BY-NC-SA-1.0
✗ CC-BY-NC-SA-2.0-FR None
✗ CC-BY-NC-SA-2.0-UK None
✔ CC-BY-NC-SA-2.0
✔ CC-BY-NC-SA-2.5
✔ CC-BY-NC-SA-3.0-DE
✗ CC-BY-NC-SA-3.0-IGO None
✔ CC-BY-NC-SA-3.0
✔ CC-BY-NC-SA-4.0
✔ CC-BY-ND-1.0
✗ CC-BY-ND-2.0 None
✔ CC-BY-ND-2.5
✔ CC-BY-ND-3.0-DE
✔ CC-BY-ND-3.0
✔ CC-BY-ND-4.0
✔ CC-BY-SA-1.0
✗ CC-BY-SA-2.0-UK None
✔ CC-BY-SA-2.0
✔ CC-BY-SA-2.1-JP
✔ CC-BY-SA-2.5
✔ CC-BY-SA-3.0-AT
✔ CC-BY-SA-3.0-DE
✗ CC-BY-SA-3.0 None
✔ CC-BY-SA-4.0
✔ CC-PDDC
✔ CC0-1.0
✗ CDDL-1.0 None
✗ CDDL-1.1 None
✗ CDL-1.0 None
✗ CDLA-Permissive-1.0 None
✔ CDLA-Permissive-2.0
✗ CDLA-Sharing-1.0 None
✗ CECILL-1.0 None
✔ CECILL-1.1
✗ CECILL-2.0 None
✗ CECILL-2.1 None
✗ CECILL-B None
✗ CECILL-C None
✗ CERN-OHL-1.1 None
✗ CERN-OHL-1.2 None
✗ CERN-OHL-P-2.0 None
✗ CERN-OHL-S-2.0 None
✗ CERN-OHL-W-2.0 None
✔ ClArtistic
✗ Classpath-exception-2.0 None
✗ CLISP-exception-2.0 None
✔ CNRI-Jython
✔ CNRI-Python-GPL-Compatible
✔ CNRI-Python
✗ Condor-1.1 None
✗ copyleft-next-0.3.0 None
✗ copyleft-next-0.3.1 None
✗ CPAL-1.0 None
✔ CPL-1.0
✔ CPOL-1.02
✔ Crossword
✔ CrystalStacker
✗ CUA-OPL-1.0 None
✗ Cube None
✔ curl
✗ D-FSL-1.0 None
✔ eCos-2.0
✗ GPL-1.0+ GPL-1.0
✗ GPL-2.0+ None
✗ GPL-2.0-with-autoconf-exception None
✔ GPL-2.0-with-bison-exception
✗ GPL-2.0-with-classpath-exception None
✗ GPL-2.0-with-font-exception None
✗ GPL-2.0-with-GCC-exception None
✗ GPL-3.0+ None
✗ GPL-3.0-with-autoconf-exception None
✔ GPL-3.0-with-GCC-exception
✗ LGPL-2.0+ None
✗ LGPL-2.1+ None
✗ LGPL-3.0+ None
✔ StandardML-NJ
✗ WXwindows None
✔ diffmark
✗ DigiRule-FOSS-exception None
✔ DOC
✔ Dotseqn
✔ DRL-1.0
✔ DSDP
✔ dvipdfm
✔ ECL-1.0
✔ ECL-2.0
✗ eCos-exception-2.0 None
✔ EFL-1.0
✔ EFL-2.0
✔ eGenix
✗ Entessa None
✔ EPICS
✗ EPL-1.0 None
✗ EPL-2.0 None
✔ ErlPL-1.1
✗ etalab-2.0 None
✗ EUDatagrid None
✗ EUPL-1.0 None
✗ EUPL-1.1 None
✗ EUPL-1.2 None
✗ Eurosym None
✗ Fair None
✗ Fawkes-Runtime-exception None
✗ FLTK-exception None
✗ Font-exception-2.0 None
✔ Frameworx-1.0
✗ FreeBSD-DOC None
✔ FreeImage
✗ freertos-exception-2.0 None
✔ FSFAP
✔ FSFUL
✔ FSFULLR
✔ FTL
✗ GCC-exception-2.0 None
✗ GCC-exception-3.1 None
✗ GD None
✗ GFDL-1.1-invariants-only GFDL-1.1
✗ GFDL-1.1-invariants-or-later GFDL-1.1
✗ GFDL-1.1-no-invariants-only GFDL-1.1
✗ GFDL-1.1-no-invariants-or-later GFDL-1.1
✗ GFDL-1.1-only GFDL-1.1
✗ GFDL-1.1-or-later GFDL-1.1
✔ GFDL-1.1
✗ GFDL-1.2-invariants-only GFDL-1.2
✗ GFDL-1.2-invariants-or-later GFDL-1.2
✗ GFDL-1.2-no-invariants-only GFDL-1.2
✗ GFDL-1.2-no-invariants-or-later GFDL-1.2
✗ GFDL-1.2-only GFDL-1.2
✗ GFDL-1.2-or-later GFDL-1.2
✔ GFDL-1.2
✗ GFDL-1.3-invariants-only GFDL-1.3
✗ GFDL-1.3-invariants-or-later GFDL-1.3
✗ GFDL-1.3-no-invariants-only GFDL-1.3
✗ GFDL-1.3-no-invariants-or-later GFDL-1.3
✗ GFDL-1.3-only GFDL-1.3
✗ GFDL-1.3-or-later GFDL-1.3
✔ GFDL-1.3
✔ Giftware
✔ GL2PS
✔ Glide
✔ Glulxe
✔ GLWTPL
✗ gnu-javamail-exception None
✔ gnuplot
✗ GPL-1.0-only GPL-1.0
✗ GPL-1.0-or-later GPL-1.0
✔ GPL-1.0
✗ GPL-2.0-only None
✗ GPL-2.0-or-later None
✗ GPL-2.0 None
✗ GPL-3.0-linking-exception None
✗ GPL-3.0-linking-source-exception None
✗ GPL-3.0-only None
✗ GPL-3.0-or-later None
✗ GPL-3.0 None
✗ GPL-CC-1.0 None
✗ gSOAP-1.3b None
✔ HaskellReport
✔ Hippocratic-2.1
✔ HPND-sell-variant
✗ HPND None
✔ HTMLTIDY
✗ i2p-gpl-java-exception None
✔ IBM-pibs
✔ ICU
✔ IJG
✗ ImageMagick None
✔ iMatix
✗ Imlib2 None
✔ Info-ZIP
✔ Intel-ACPI
✔ Intel
✗ Interbase-1.0 None
✔ IPA
✔ IPL-1.0
✔ ISC
✔ JasPer-2.0
✔ JPNIC
✔ JSON
✗ LAL-1.2 None
✗ LAL-1.3 None
✔ Latex2e
✔ Leptonica
✗ LGPL-2.0-only None
✗ LGPL-2.0-or-later None
✗ LGPL-2.0 None
✗ LGPL-2.1-only None
✗ LGPL-2.1-or-later None
✗ LGPL-2.1 None
✗ LGPL-3.0-linking-exception None
✗ LGPL-3.0-only None
✗ LGPL-3.0-or-later None
✗ LGPL-3.0 None
✗ LGPLLR None
✗ libpng-2.0 None
✗ Libpng None
✔ libselinux-1.0
✔ libtiff
✗ Libtool-exception None
✗ LiLiQ-P-1.1 None
✗ LiLiQ-R-1.1 None
✗ LiLiQ-Rplus-1.1 None
✔ Linux-OpenIB
✗ Linux-syscall-note None
✗ LLVM-exception None
✔ LPL-1.0
✔ LPL-1.02
✗ LPPL-1.0 None
✔ LPPL-1.1
✔ LPPL-1.2
✔ LPPL-1.3a
✔ LPPL-1.3c
✗ LZMA-exception None
✔ MakeIndex
✗ mif-exception None
✔ MirOS
✔ MIT-0
✗ MIT-advertising None
✔ MIT-CMU
✗ MIT-enna None
✗ MIT-feh None
✔ MIT-Modern-Variant
✔ MIT-open-group
✔ MIT
✔ MITNFA
✗ Motosoto None
✔ mpich2
✔ MPL-1.0
✗ MPL-1.1 None
✗ MPL-2.0-no-copyleft-exception None
✗ MPL-2.0 None
✔ MS-PL
✔ MS-RL
✔ MTLL
✔ MulanPSL-1.0
✔ MulanPSL-2.0
✗ Multics None
✔ Mup
✔ NAIST-2003
✔ NASA-1.3
✔ Naumen
✔ NBPL-1.0
✗ NCGL-UK-2.0 None
✔ NCSA
✔ Net-SNMP
✔ NetCDF
✔ Newsletr
✔ NGPL
✔ NIST-PD-fallback
✔ NIST-PD
✗ NLOD-1.0 None
✗ NLOD-2.0 None
✔ NLPL
✗ Nokia-Qt-exception-1.1 None
✗ Nokia None
✗ NOSL None
✔ Noweb
✔ NPL-1.0
✗ NPL-1.1 None
✔ NPOSL-3.0
✔ NRL
✔ NTP-0
✔ NTP
✗ Nunit None
✔ O-UDA-1.0
✗ OCaml-LGPL-linking-exception None
✗ OCCT-exception-1.0 None
✗ OCCT-PL None
✗ OCLC-2.0 None
✗ ODbL-1.0 None
✗ ODC-By-1.0 None
✔ OFL-1.0
✔ OFL-1.0
✔ OFL-1.0
✗ OFL-1.1-no-RFN OFL-1.1
✔ OFL-1.1
✔ OFL-1.1
✔ OGC-1.0
✔ OGDL-Taiwan-1.0
✗ OGL-Canada-2.0 None
✗ OGL-UK-1.0 None
✗ OGL-UK-2.0 None
✗ OGL-UK-3.0 None
✔ OGTSL
✗ OLDAP-1.1 NBPL-1.0
✔ OLDAP-1.2
✔ OLDAP-1.3
✔ OLDAP-1.4
✔ OLDAP-2.0.1
✔ OLDAP-2.0
✔ OLDAP-2.1
✔ OLDAP-2.2.1
✔ OLDAP-2.2.2
✔ OLDAP-2.2
✔ OLDAP-2.3
✔ OLDAP-2.4
✔ OLDAP-2.5
✔ OLDAP-2.6
✔ OLDAP-2.7
✔ OLDAP-2.8
✔ OML
✗ OpenJDK-assembly-exception-1.0 None
✗ OpenSSL None
✗ openvpn-openssl-exception None
✗ OPL-1.0 None
✔ OPUBL-1.0
✗ OSET-PL-2.1 None
✔ OSL-1.0
✔ OSL-1.1
✔ OSL-2.0
✔ OSL-2.1
✔ OSL-3.0
✔ Parity-6.0.0
✗ Parity-7.0.0 None
✗ PDDL-1.0 None
✗ PHP-3.0 None
✗ PHP-3.01 None
✔ Plexus
✔ PolyForm-Noncommercial-1.0.0
✔ PolyForm-Small-Business-1.0.0
✔ PostgreSQL
✗ PS-or-PDF-font-exception-20170817 None
✗ PSF-2.0 None
✔ psfrag
✗ psutils None
✔ Python-2.0
✔ Qhull
✔ QPL-1.0
✗ Qt-GPL-exception-1.0 None
✗ Qt-LGPL-exception-1.1 None
✗ Qwt-exception-1.0 None
✔ Rdisc
✔ RHeCos-1.1
✗ RPL-1.1 None
✗ RPL-1.5 None
✔ RPSL-1.0
✔ RSA-MD
✗ RSCPL None
✔ Ruby
✔ SAX-PD
✗ Saxpath None
✔ SCEA
✔ Sendmail-8.23
✔ Sendmail
✔ SGI-B-1.0
✔ SGI-B-1.1
✔ SGI-B-2.0
✔ SHL-0.5
✔ SHL-0.51
✗ SHL-2.0 None
✗ SHL-2.1 None
✔ SimPL-2.0
✔ SISSL-1.2
✔ SISSL
✔ Sleepycat
✗ SMLNJ StandardML-NJ
✔ SMPPL
✗ SNIA None
✗ Spencer-86 None
✔ Spencer-94
✔ Spencer-99
✗ SPL-1.0 None
✗ SSH-OpenSSH None
✔ SSH-short
✗ SSPL-1.0 None
✗ SugarCRM-1.1.3 None
✗ Swift-exception None
✔ SWL
✗ TAPR-OHL-1.0 None
✔ TCL
✔ TCP-wrappers
✔ TMate
✗ TORQUE-1.1 None
✔ TOSL
✔ TU-Berlin-1.0
✔ TU-Berlin-2.0
✗ u-boot-exception-2.0 None
✔ UCL-1.0
✔ Unicode-DFS-2015
✔ Unicode-DFS-2016
✔ Unicode-TOU
✗ Universal-FOSS-exception-1.0 None
✗ Unlicense None
✔ UPL-1.0
✔ Vim
✔ VOSTROM
✔ VSL-1.0
✔ W3C-19980720
✔ W3C-20150513
✔ W3C
✗ Watcom-1.0 None
✔ Wsuipa
✔ WTFPL
✗ WxWindows-exception-3.1 None
✔ X11
✔ Xerox
✗ XFree86-1.1 None
✔ xinetd
✔ Xnet
✗ xpp None
✔ XSkat
✔ YPL-1.0
✔ YPL-1.1
✔ Zed
✗ Zend-2.0 None
✔ Zimbra-1.3
✔ Zimbra-1.4
✗ zlib-acknowledgement None
✗ Zlib None
✗ ZPL-1.1 None
✔ ZPL-2.0
✔ ZPL-2.1
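Most failures above are misses (`None`), but some are misidentifications such as `BSD-4-Clause-UC` detected as `BSD-4-Clause`. A small sketch to tally the two kinds separately, assuming lines in the `✔ <id>` / `✗ <expected> <actual>` format printed by the script. `tally_results` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def tally_results(lines: Iterable[str]) -> Counter:
    """Count correct matches, misses (actual is None) and wrong matches."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "✔":
            counts["correct"] += 1
        elif parts[0] == "✗":
            # Failure lines are "✗ <expected> <actual>"; actual is "None" on a miss.
            counts["no_match" if parts[-1] == "None" else "wrong_match"] += 1
    return counts
```

Misses point at templates the matcher cannot fit at all. Wrong matches are closer to the false positives discussed in the issue.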

For comparison: running https://github.com/nexB/scancode-toolkit on the same examples gives a success rate of 465/515 (90.29 %), and it also correctly detects ImageMagick.