pmonks / lice-comb

A Clojure library for software license detection.
Apache License 2.0

Consider switching to full SPDX license template matching #3

Closed by pmonks 10 months ago

pmonks commented 2 years ago

Job Story

When the library is detecting licenses from license texts, I want that logic to use SPDX's matching guidelines, so I can be confident that it is detecting licenses in a way that is consistent with other tools in this space.

Potential Solutions:

SPDX publishes canonical license templates precisely for this purpose. Applying them is not necessarily trivial though, since:

  1. we'd probably want to cache all of the template files on local disk so that we're not re-downloading them from the internet on every invocation - there are over 500 of them and they total several MB in size
  2. it's computationally expensive, since every template has to be matched against every probable license text, and a single license file may contain several concatenated license texts (yes, this does happen in the Java/Clojure ecosystem...). Note, however, that the current (non-SPDX) logic assumes each text contains only a single license, so handling concatenated texts would be a separate "side effect" improvement
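
To make the two points above concrete, here is a rough sketch (in Python for brevity, not Clojure) of a local template cache plus an "every template against every text" scan. It rests on several assumptions: `load_template`, `find_licenses`, `normalize` and the injected `fetch` callable are hypothetical names, not part of lice-comb or any SPDX tooling, and the matching is a plain normalized substring test. The real SPDX matching guidelines also cover optional and replaceable template text, which this sketch ignores.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lower-case and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def load_template(license_id: str, cache_dir: Path,
                  fetch: Callable[[str], str]) -> str:
    """Return a template's text, calling `fetch` only on a cache miss.

    `fetch` is a hypothetical downloader supplied by the caller.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{license_id}.txt"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    text = fetch(license_id)
    cached.write_text(text, encoding="utf-8")
    return text


def find_licenses(text: str, templates: Dict[str, str]) -> List[str]:
    """Return the ids of all templates found anywhere in `text`.

    Every template is tested, so a file containing several concatenated
    license texts yields several ids. The cost grows with the number of
    templates multiplied by the size of the text.
    """
    haystack = f" {normalize(text)} "
    matches = []
    for license_id, template in templates.items():
        needle = normalize(template)
        # Pad with spaces so matches start and end on word boundaries.
        if needle and f" {needle} " in haystack:
            matches.append(license_id)
    return sorted(matches)
```

Returning a list rather than a single id is what allows the concatenated-license case to be reported at all; the cache means the 500+ templates are only fetched once per machine.
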

pmonks commented 1 year ago

This has recently been added to the SPDX Java library, and it is probably better to leverage that functionality than to roll a greenfield implementation in Clojure.

pmonks commented 10 months ago

Fixed in v2.0