Open goneall opened 3 years ago
@goneall Great initiative! Please let me know if I can be of any help with Python. I can write a Python utility script to download specific license templates/XML files from the repo and run the tool on them. Having this utility as part of the test utils may also help in generating future tests, e.g. for v3.13 or v2.x.
> I can write a Python utility script to download specific license templates/XML files from the repo and run the tool on them.

@rtgdk It would be a great help if you could write such a utility.
I would like to contribute to this issue. Is it still open? @rtgdk @goneall
@kahanikaar Please feel free to contribute to this issue - thanks
Given that https://github.com/m1kit/yalm-resources/blob/874bc2162b3edab7fabb6ed0a76e97dc7c828530/meta.json declares that it uses version 3.14 of the license list, I did some quick testing and observed a success rate of 285/515 (55.34 %) for the exact target matches.
Further observations:
- […] `yalm.licenses.SpdxLicense._test_regex` for the (correct) regex file `ImageMagick` - even with 20 workers and a timeout of 180 minutes (which I consider rather large values for matching purposes).
- […] `None` during matching (probably not due to timeout).
- […] `depreciate_`, while the results miss this prefix. Ignoring this prefix during comparison will slightly increase the success rate to 289/515 (56.12 %) for the exact target matches.

Testing code:
```python
import json
from collections import defaultdict
from importlib import resources
from pathlib import Path

from yalm import detect_license, resources as data

# Map each license ID to the set of IDs that are accepted as expected duplicates.
duplicates = json.loads(resources.read_text(data, 'expected-duplicates.json'))
duplicate_mapping = defaultdict(set)
for entry in duplicates:
    duplicate_mapping[entry['from']].add(entry['to'])
duplicate_mapping = dict(duplicate_mapping)

correct, total, result_is_none = 0, 0, 0
for path in sorted(
    Path('license-list-XML-3.14', 'test', 'simpleTestForGenerator').glob('*.txt'),
    key=lambda x: x.name.lower()
):
    # The file name (without extension) is the expected license ID.
    expected = path.stem
    if expected.startswith('depreciate_'):
        expected = expected[11:]  # drop the 11-character 'depreciate_' prefix
    result = detect_license(text=path.read_text(), timeout=60, num_workers=20)
    actual = result if not result else result.template.id
    if expected == actual or actual in duplicate_mapping.get(expected, set()):
        correct += 1
        print('✔', actual)
    else:
        print('✗', expected, actual)
        if actual is None:
            result_is_none += 1
    total += 1
print(f'{correct}/{total} ({correct / total:.2%}) detected correctly.')
# print(result_is_none)
```
Complete results:
✔ 0BSD
✗ 389-exception None
✔ AAL
✔ Abstyles
✔ Adobe-2006
✔ Adobe-Glyph
✔ ADSL
✔ AFL-1.1
✔ AFL-1.2
✔ AFL-2.0
✔ AFL-2.1
✗ AFL-3.0 None
✔ Afmparse
✗ AGPL-1.0-only None
✗ AGPL-1.0-or-later None
✗ AGPL-1.0 None
✗ AGPL-3.0-only None
✗ AGPL-3.0-or-later None
✗ AGPL-3.0 None
✔ Aladdin
✗ AMDPLPA None
✔ AML
✔ AMPAS
✗ ANTLR-PD-fallback None
✔ ANTLR-PD
✗ Apache-1.0 None
✗ Apache-1.1 None
✔ Apache-2.0
✔ APAFML
✗ APL-1.0 None
✗ APSL-1.0 None
✗ APSL-1.1 None
✗ APSL-1.2 None
✗ APSL-2.0 None
✔ Artistic-1.0-cl8
✔ Artistic-1.0-Perl
✔ Artistic-1.0
✔ Artistic-2.0
✗ Autoconf-exception-2.0 None
✗ Autoconf-exception-3.0 None
✔ Bahyph
✔ Barr
✔ Beerware
✗ Bison-exception-2.2 GPL-2.0-with-bison-exception
✗ BitTorrent-1.0 None
✗ BitTorrent-1.1 None
✔ blessing
✔ BlueOak-1.0.0
✗ Bootloader-exception None
✔ Borceux
✔ BSD-1-Clause
✔ BSD-2-Clause-FreeBSD
✗ BSD-2-Clause-NetBSD BSD-2-Clause
✔ BSD-2-Clause-Patent
✔ BSD-2-Clause-Views
✔ BSD-2-Clause
✗ BSD-3-Clause-Attribution None
✔ BSD-3-Clause-Clear
✔ BSD-3-Clause-LBNL
✔ BSD-3-Clause-Modification
✔ BSD-3-Clause-No-Military-License
✔ BSD-3-Clause-No-Nuclear-License-2014
✔ BSD-3-Clause-No-Nuclear-License
✔ BSD-3-Clause-No-Nuclear-Warranty
✔ BSD-3-Clause-Open-MPI
✔ BSD-3-Clause
✔ BSD-4-Clause-Shortened
✗ BSD-4-Clause-UC BSD-4-Clause
✔ BSD-4-Clause
✔ BSD-Protection
✔ BSD-Source-Code
✔ BSL-1.0
✗ BUSL-1.1 None
✗ bzip2-1.0.5 None
✗ bzip2-1.0.6 None
✗ C-UDA-1.0 None
✗ CAL-1.0-Combined-Work-Exception None
✗ CAL-1.0 None
✔ Caldera
✔ CATOSL-1.1
✔ CC-BY-1.0
✔ CC-BY-2.0
✗ CC-BY-2.5-AU None
✔ CC-BY-2.5
✔ CC-BY-3.0-AT
✔ CC-BY-3.0-DE
✔ CC-BY-3.0-NL
✔ CC-BY-3.0-US
✗ CC-BY-3.0 None
✔ CC-BY-4.0
✔ CC-BY-NC-1.0
✔ CC-BY-NC-2.0
✔ CC-BY-NC-2.5
✔ CC-BY-NC-3.0-DE
✔ CC-BY-NC-3.0
✔ CC-BY-NC-4.0
✔ CC-BY-NC-ND-1.0
✔ CC-BY-NC-ND-2.0
✔ CC-BY-NC-ND-2.5
✔ CC-BY-NC-ND-3.0-DE
✔ CC-BY-NC-ND-3.0-IGO
✔ CC-BY-NC-ND-3.0
✔ CC-BY-NC-ND-4.0
✔ CC-BY-NC-SA-1.0
✗ CC-BY-NC-SA-2.0-FR None
✗ CC-BY-NC-SA-2.0-UK None
✔ CC-BY-NC-SA-2.0
✔ CC-BY-NC-SA-2.5
✔ CC-BY-NC-SA-3.0-DE
✗ CC-BY-NC-SA-3.0-IGO None
✔ CC-BY-NC-SA-3.0
✔ CC-BY-NC-SA-4.0
✔ CC-BY-ND-1.0
✗ CC-BY-ND-2.0 None
✔ CC-BY-ND-2.5
✔ CC-BY-ND-3.0-DE
✔ CC-BY-ND-3.0
✔ CC-BY-ND-4.0
✔ CC-BY-SA-1.0
✗ CC-BY-SA-2.0-UK None
✔ CC-BY-SA-2.0
✔ CC-BY-SA-2.1-JP
✔ CC-BY-SA-2.5
✔ CC-BY-SA-3.0-AT
✔ CC-BY-SA-3.0-DE
✗ CC-BY-SA-3.0 None
✔ CC-BY-SA-4.0
✔ CC-PDDC
✔ CC0-1.0
✗ CDDL-1.0 None
✗ CDDL-1.1 None
✗ CDL-1.0 None
✗ CDLA-Permissive-1.0 None
✔ CDLA-Permissive-2.0
✗ CDLA-Sharing-1.0 None
✗ CECILL-1.0 None
✔ CECILL-1.1
✗ CECILL-2.0 None
✗ CECILL-2.1 None
✗ CECILL-B None
✗ CECILL-C None
✗ CERN-OHL-1.1 None
✗ CERN-OHL-1.2 None
✗ CERN-OHL-P-2.0 None
✗ CERN-OHL-S-2.0 None
✗ CERN-OHL-W-2.0 None
✔ ClArtistic
✗ Classpath-exception-2.0 None
✗ CLISP-exception-2.0 None
✔ CNRI-Jython
✔ CNRI-Python-GPL-Compatible
✔ CNRI-Python
✗ Condor-1.1 None
✗ copyleft-next-0.3.0 None
✗ copyleft-next-0.3.1 None
✗ CPAL-1.0 None
✔ CPL-1.0
✔ CPOL-1.02
✔ Crossword
✔ CrystalStacker
✗ CUA-OPL-1.0 None
✗ Cube None
✔ curl
✗ D-FSL-1.0 None
✔ eCos-2.0
✗ GPL-1.0+ GPL-1.0
✗ GPL-2.0+ None
✗ GPL-2.0-with-autoconf-exception None
✔ GPL-2.0-with-bison-exception
✗ GPL-2.0-with-classpath-exception None
✗ GPL-2.0-with-font-exception None
✗ GPL-2.0-with-GCC-exception None
✗ GPL-3.0+ None
✗ GPL-3.0-with-autoconf-exception None
✔ GPL-3.0-with-GCC-exception
✗ LGPL-2.0+ None
✗ LGPL-2.1+ None
✗ LGPL-3.0+ None
✔ StandardML-NJ
✗ WXwindows None
✔ diffmark
✗ DigiRule-FOSS-exception None
✔ DOC
✔ Dotseqn
✔ DRL-1.0
✔ DSDP
✔ dvipdfm
✔ ECL-1.0
✔ ECL-2.0
✗ eCos-exception-2.0 None
✔ EFL-1.0
✔ EFL-2.0
✔ eGenix
✗ Entessa None
✔ EPICS
✗ EPL-1.0 None
✗ EPL-2.0 None
✔ ErlPL-1.1
✗ etalab-2.0 None
✗ EUDatagrid None
✗ EUPL-1.0 None
✗ EUPL-1.1 None
✗ EUPL-1.2 None
✗ Eurosym None
✗ Fair None
✗ Fawkes-Runtime-exception None
✗ FLTK-exception None
✗ Font-exception-2.0 None
✔ Frameworx-1.0
✗ FreeBSD-DOC None
✔ FreeImage
✗ freertos-exception-2.0 None
✔ FSFAP
✔ FSFUL
✔ FSFULLR
✔ FTL
✗ GCC-exception-2.0 None
✗ GCC-exception-3.1 None
✗ GD None
✗ GFDL-1.1-invariants-only GFDL-1.1
✗ GFDL-1.1-invariants-or-later GFDL-1.1
✗ GFDL-1.1-no-invariants-only GFDL-1.1
✗ GFDL-1.1-no-invariants-or-later GFDL-1.1
✗ GFDL-1.1-only GFDL-1.1
✗ GFDL-1.1-or-later GFDL-1.1
✔ GFDL-1.1
✗ GFDL-1.2-invariants-only GFDL-1.2
✗ GFDL-1.2-invariants-or-later GFDL-1.2
✗ GFDL-1.2-no-invariants-only GFDL-1.2
✗ GFDL-1.2-no-invariants-or-later GFDL-1.2
✗ GFDL-1.2-only GFDL-1.2
✗ GFDL-1.2-or-later GFDL-1.2
✔ GFDL-1.2
✗ GFDL-1.3-invariants-only GFDL-1.3
✗ GFDL-1.3-invariants-or-later GFDL-1.3
✗ GFDL-1.3-no-invariants-only GFDL-1.3
✗ GFDL-1.3-no-invariants-or-later GFDL-1.3
✗ GFDL-1.3-only GFDL-1.3
✗ GFDL-1.3-or-later GFDL-1.3
✔ GFDL-1.3
✔ Giftware
✔ GL2PS
✔ Glide
✔ Glulxe
✔ GLWTPL
✗ gnu-javamail-exception None
✔ gnuplot
✗ GPL-1.0-only GPL-1.0
✗ GPL-1.0-or-later GPL-1.0
✔ GPL-1.0
✗ GPL-2.0-only None
✗ GPL-2.0-or-later None
✗ GPL-2.0 None
✗ GPL-3.0-linking-exception None
✗ GPL-3.0-linking-source-exception None
✗ GPL-3.0-only None
✗ GPL-3.0-or-later None
✗ GPL-3.0 None
✗ GPL-CC-1.0 None
✗ gSOAP-1.3b None
✔ HaskellReport
✔ Hippocratic-2.1
✔ HPND-sell-variant
✗ HPND None
✔ HTMLTIDY
✗ i2p-gpl-java-exception None
✔ IBM-pibs
✔ ICU
✔ IJG
✗ ImageMagick None
✔ iMatix
✗ Imlib2 None
✔ Info-ZIP
✔ Intel-ACPI
✔ Intel
✗ Interbase-1.0 None
✔ IPA
✔ IPL-1.0
✔ ISC
✔ JasPer-2.0
✔ JPNIC
✔ JSON
✗ LAL-1.2 None
✗ LAL-1.3 None
✔ Latex2e
✔ Leptonica
✗ LGPL-2.0-only None
✗ LGPL-2.0-or-later None
✗ LGPL-2.0 None
✗ LGPL-2.1-only None
✗ LGPL-2.1-or-later None
✗ LGPL-2.1 None
✗ LGPL-3.0-linking-exception None
✗ LGPL-3.0-only None
✗ LGPL-3.0-or-later None
✗ LGPL-3.0 None
✗ LGPLLR None
✗ libpng-2.0 None
✗ Libpng None
✔ libselinux-1.0
✔ libtiff
✗ Libtool-exception None
✗ LiLiQ-P-1.1 None
✗ LiLiQ-R-1.1 None
✗ LiLiQ-Rplus-1.1 None
✔ Linux-OpenIB
✗ Linux-syscall-note None
✗ LLVM-exception None
✔ LPL-1.0
✔ LPL-1.02
✗ LPPL-1.0 None
✔ LPPL-1.1
✔ LPPL-1.2
✔ LPPL-1.3a
✔ LPPL-1.3c
✗ LZMA-exception None
✔ MakeIndex
✗ mif-exception None
✔ MirOS
✔ MIT-0
✗ MIT-advertising None
✔ MIT-CMU
✗ MIT-enna None
✗ MIT-feh None
✔ MIT-Modern-Variant
✔ MIT-open-group
✔ MIT
✔ MITNFA
✗ Motosoto None
✔ mpich2
✔ MPL-1.0
✗ MPL-1.1 None
✗ MPL-2.0-no-copyleft-exception None
✗ MPL-2.0 None
✔ MS-PL
✔ MS-RL
✔ MTLL
✔ MulanPSL-1.0
✔ MulanPSL-2.0
✗ Multics None
✔ Mup
✔ NAIST-2003
✔ NASA-1.3
✔ Naumen
✔ NBPL-1.0
✗ NCGL-UK-2.0 None
✔ NCSA
✔ Net-SNMP
✔ NetCDF
✔ Newsletr
✔ NGPL
✔ NIST-PD-fallback
✔ NIST-PD
✗ NLOD-1.0 None
✗ NLOD-2.0 None
✔ NLPL
✗ Nokia-Qt-exception-1.1 None
✗ Nokia None
✗ NOSL None
✔ Noweb
✔ NPL-1.0
✗ NPL-1.1 None
✔ NPOSL-3.0
✔ NRL
✔ NTP-0
✔ NTP
✗ Nunit None
✔ O-UDA-1.0
✗ OCaml-LGPL-linking-exception None
✗ OCCT-exception-1.0 None
✗ OCCT-PL None
✗ OCLC-2.0 None
✗ ODbL-1.0 None
✗ ODC-By-1.0 None
✔ OFL-1.0
✔ OFL-1.0
✔ OFL-1.0
✗ OFL-1.1-no-RFN OFL-1.1
✔ OFL-1.1
✔ OFL-1.1
✔ OGC-1.0
✔ OGDL-Taiwan-1.0
✗ OGL-Canada-2.0 None
✗ OGL-UK-1.0 None
✗ OGL-UK-2.0 None
✗ OGL-UK-3.0 None
✔ OGTSL
✗ OLDAP-1.1 NBPL-1.0
✔ OLDAP-1.2
✔ OLDAP-1.3
✔ OLDAP-1.4
✔ OLDAP-2.0.1
✔ OLDAP-2.0
✔ OLDAP-2.1
✔ OLDAP-2.2.1
✔ OLDAP-2.2.2
✔ OLDAP-2.2
✔ OLDAP-2.3
✔ OLDAP-2.4
✔ OLDAP-2.5
✔ OLDAP-2.6
✔ OLDAP-2.7
✔ OLDAP-2.8
✔ OML
✗ OpenJDK-assembly-exception-1.0 None
✗ OpenSSL None
✗ openvpn-openssl-exception None
✗ OPL-1.0 None
✔ OPUBL-1.0
✗ OSET-PL-2.1 None
✔ OSL-1.0
✔ OSL-1.1
✔ OSL-2.0
✔ OSL-2.1
✔ OSL-3.0
✔ Parity-6.0.0
✗ Parity-7.0.0 None
✗ PDDL-1.0 None
✗ PHP-3.0 None
✗ PHP-3.01 None
✔ Plexus
✔ PolyForm-Noncommercial-1.0.0
✔ PolyForm-Small-Business-1.0.0
✔ PostgreSQL
✗ PS-or-PDF-font-exception-20170817 None
✗ PSF-2.0 None
✔ psfrag
✗ psutils None
✔ Python-2.0
✔ Qhull
✔ QPL-1.0
✗ Qt-GPL-exception-1.0 None
✗ Qt-LGPL-exception-1.1 None
✗ Qwt-exception-1.0 None
✔ Rdisc
✔ RHeCos-1.1
✗ RPL-1.1 None
✗ RPL-1.5 None
✔ RPSL-1.0
✔ RSA-MD
✗ RSCPL None
✔ Ruby
✔ SAX-PD
✗ Saxpath None
✔ SCEA
✔ Sendmail-8.23
✔ Sendmail
✔ SGI-B-1.0
✔ SGI-B-1.1
✔ SGI-B-2.0
✔ SHL-0.5
✔ SHL-0.51
✗ SHL-2.0 None
✗ SHL-2.1 None
✔ SimPL-2.0
✔ SISSL-1.2
✔ SISSL
✔ Sleepycat
✗ SMLNJ StandardML-NJ
✔ SMPPL
✗ SNIA None
✗ Spencer-86 None
✔ Spencer-94
✔ Spencer-99
✗ SPL-1.0 None
✗ SSH-OpenSSH None
✔ SSH-short
✗ SSPL-1.0 None
✗ SugarCRM-1.1.3 None
✗ Swift-exception None
✔ SWL
✗ TAPR-OHL-1.0 None
✔ TCL
✔ TCP-wrappers
✔ TMate
✗ TORQUE-1.1 None
✔ TOSL
✔ TU-Berlin-1.0
✔ TU-Berlin-2.0
✗ u-boot-exception-2.0 None
✔ UCL-1.0
✔ Unicode-DFS-2015
✔ Unicode-DFS-2016
✔ Unicode-TOU
✗ Universal-FOSS-exception-1.0 None
✗ Unlicense None
✔ UPL-1.0
✔ Vim
✔ VOSTROM
✔ VSL-1.0
✔ W3C-19980720
✔ W3C-20150513
✔ W3C
✗ Watcom-1.0 None
✔ Wsuipa
✔ WTFPL
✗ WxWindows-exception-3.1 None
✔ X11
✔ Xerox
✗ XFree86-1.1 None
✔ xinetd
✔ Xnet
✗ xpp None
✔ XSkat
✔ YPL-1.0
✔ YPL-1.1
✔ Zed
✗ Zend-2.0 None
✔ Zimbra-1.3
✔ Zimbra-1.4
✗ zlib-acknowledgement None
✗ Zlib None
✗ ZPL-1.1 None
✔ ZPL-2.0
✔ ZPL-2.1
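To tell missing detections apart from wrong detections in a listing like this, the lines can be tallied by type. This is only a rough sketch: it assumes the output format of the testing code above (`✔ <id>` for a hit, `✗ <expected> <actual>` for a miss), and the `tally` helper and its category names are made up for illustration. The sample lines are copied from the results.

```python
from collections import Counter


def tally(lines):
    """Classify result lines into correct, no_result (None) and wrong_id."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == '✔':
            counts['correct'] += 1
        elif parts[0] == '✗' and parts[-1] == 'None':
            counts['no_result'] += 1  # the matcher returned nothing
        elif parts[0] == '✗':
            counts['wrong_id'] += 1  # the matcher returned a different ID
    return counts


# A few lines taken from the listing above.
sample = [
    '✔ 0BSD',
    '✗ 389-exception None',
    '✗ BSD-2-Clause-NetBSD BSD-2-Clause',
    '✗ OLDAP-1.1 NBPL-1.0',
]
counts = tally(sample)
print(counts['correct'], counts['no_result'], counts['wrong_id'])
```

Feeding the complete listing into `tally` would show how many of the failures are `None` results rather than mismatched IDs.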
For comparison: Running https://github.com/nexB/scancode-toolkit on these examples has a success rate of 465/515 (90.29 %) and correctly detects ImageMagick as well.
I would like to use this code to replace the Java license matching used in the SPDX online tools.
Before making that change, I would like to test all of the SPDX listed licenses.
This can be done by downloading all of the license templates from the License List Data templates repo and the text files from the License List XML test files repo.
If we run the license comparison against every file in the test files and each one matches its template from the templates directory, that would demonstrate we have no false negatives.
We can also test for false positives by finding any matches against more than one template. There are some expected duplicates, which are all documented in the License List XML expected-warnings file.
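A minimal sketch of these two checks, under loud assumptions: `matches(text, template)` stands in for whatever comparison function ends up being tested (it is not an existing API), the license IDs are assumed to be the dictionary keys, and `expected_duplicates` is assumed to have already been parsed from the expected-warnings file into a mapping of ID to allowed extra IDs. The toy data uses a plain substring check just to make the sketch runnable.

```python
def evaluate(test_texts, templates, matches, expected_duplicates=None):
    """Return (false_negatives, false_positives) for a matcher.

    test_texts: {license_id: license text}
    templates:  {license_id: template}
    matches:    callable(text, template) -> bool (hypothetical matcher)
    """
    expected_duplicates = expected_duplicates or {}
    false_negatives = []
    false_positives = {}
    for license_id, text in test_texts.items():
        matched = {tid for tid, tpl in templates.items() if matches(text, tpl)}
        # False negative: the text does not match its own template.
        if license_id not in matched:
            false_negatives.append(license_id)
        # False positive: extra matches that are not documented duplicates.
        allowed = expected_duplicates.get(license_id, set())
        extra = matched - {license_id} - allowed
        if extra:
            false_positives[license_id] = sorted(extra)
    return false_negatives, false_positives


# Toy data with a naive substring matcher, for illustration only.
templates = {'A': 'alpha', 'B': 'beta', 'C': 'alpha beta', 'D': 'delta!'}
test_texts = {'A': 'alpha', 'C': 'alpha beta', 'D': 'delta'}
result = evaluate(
    test_texts,
    templates,
    matches=lambda text, tpl: tpl in text,
    expected_duplicates={'C': {'A'}},
)
print(result)
```

In the toy run, `D` fails to match its own template (a false negative), and `C` additionally matches `B`, which is not a documented duplicate (a false positive).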
Note that to keep the test and template files consistent, you should download the same tagged version (e.g. v3.12 used in the above links).
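As a rough sketch of the download step with a pinned tag: the repository names `spdx/license-list-data` and `spdx/license-list-XML` and GitHub's tag archive URL pattern are assumptions here (the original links are not reproduced in this thread), and `archive_url` and `download_all` are hypothetical helpers. Only the URL construction runs in the example; the download itself is left uncalled.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Assumed repository names; adjust if the linked repos differ.
REPOS = ('spdx/license-list-data', 'spdx/license-list-XML')


def archive_url(repo, tag):
    """Build the GitHub zip archive URL for a tagged version of a repo."""
    return f'https://github.com/{repo}/archive/refs/tags/{tag}.zip'


def download_all(tag, target):
    """Download and unpack both repos at the same tag into `target`."""
    target = Path(target)
    for repo in REPOS:
        with urllib.request.urlopen(archive_url(repo, tag)) as response:
            zipfile.ZipFile(io.BytesIO(response.read())).extractall(target)


# Using one tag for both repos keeps templates and test files consistent.
urls = [archive_url(repo, 'v3.12') for repo in REPOS]
for url in urls:
    print(url)
```

Calling `download_all('v3.12', 'spdx-v3.12')` would then fetch both archives into one directory, so the test files and the templates always come from the same release.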