nexB / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/nexB/scancode-toolkit/releases/

Cross-correlate results from old ScanCode with results from current ScanCode to find regressions using the Software Heritage license dump #3002

Open armijnhemel opened 2 years ago

armijnhemel commented 2 years ago

Short Description

Software Heritage has a dump of many license artifacts at https://annex.softwareheritage.org/public/dataset/license-blobs/2021-03-23/.

All of these licenses were scanned with an earlier version of ScanCode (presumably a version from early 2021). It might be useful to search this dataset for unknown licenses, possible exceptions, or regressions in ScanCode.
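
A minimal sketch of what such a cross-correlation could look like, assuming both sets of results have already been loaded into plain dictionaries that map a blob identifier to the set of detected license keys. The function name `find_regressions`, the identifiers, and the license keys below are hypothetical; the actual layout of the Software Heritage dump and of the ScanCode output would need to be parsed into this shape first.

```python
from typing import Dict, Set, Tuple

# Hypothetical shape: blob identifier -> set of detected license keys.
Detections = Dict[str, Set[str]]


def find_regressions(
    old: Detections, new: Detections
) -> Tuple[Dict[str, Set[str]], Dict[str, Tuple[Set[str], Set[str]]]]:
    """Compare old and new detections for the same blobs.

    Returns two dictionaries:
    - lost: blobs that had detections before and have none now
      (candidate regressions)
    - changed: blobs whose detections differ in any other way
      (candidate fixes, new rules, or regressions to review by hand)
    Blobs that appear only in `new` are ignored.
    """
    lost: Dict[str, Set[str]] = {}
    changed: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for blob_id, old_licenses in old.items():
        # A blob missing from the new results counts as "nothing detected".
        new_licenses = new.get(blob_id, set())
        if old_licenses and not new_licenses:
            lost[blob_id] = old_licenses
        elif old_licenses != new_licenses:
            changed[blob_id] = (old_licenses, new_licenses)
    return lost, changed


if __name__ == "__main__":
    # Illustrative data only, not taken from the real dataset.
    old_results = {"blob-a": {"mit"}, "blob-b": {"gpl-2.0"}}
    new_results = {"blob-a": {"mit"}, "blob-b": set()}
    lost, changed = find_regressions(old_results, new_results)
    print("lost:", lost)
    print("changed:", changed)
```

Blobs in `lost` are the most likely regressions; blobs in `changed` need manual review, since a difference can also mean that the newer version is more accurate.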

armijnhemel commented 2 years ago

https://arxiv.org/pdf/2204.00256.pdf

armijnhemel commented 1 month ago

There is a newer archive available: https://annex.softwareheritage.org/public/dataset/license-blobs/2022-12-07/