jpeddicord / askalono

A tool & library to detect open source licenses from texts
Apache License 2.0

weird scoring / wrong identification for Apache-2.0 license text without appendix since spdx-license-list data v3.23 #94

Open decathorpe opened 5 months ago

decathorpe commented 5 months ago

See also https://github.com/spdx/license-list-XML/issues/2418

The spdx-license-list v3.23 update added the "Pixar" license, which is a variant of Apache-2.0.

With this version of the SPDX data, an Apache-2.0 license text without the appendix (like the one in the rust-lang/rust repo) is now a closer match to "Pixar" than to "Apache-2.0", despite being a perfect copy of Apache-2.0 except for the missing appendix.
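
To illustrate how this kind of mismatch can happen, here is a minimal sketch. It assumes a Dice-style overlap score over word bigrams, which may not be exactly what askalono does internally. The strings are short toy stand-ins, not real license texts, and the names `bigrams`, `dice`, `FULL_TEMPLATE` and `VARIANT_TEMPLATE` are hypothetical.

```python
def bigrams(text):
    """Return the set of adjacent word pairs in a lowercased text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def dice(a, b):
    """Dice coefficient over the word-bigram sets of two texts."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Toy stand-ins (assumptions, not real license texts).
BODY = "permission is granted to use copy and distribute this work under these terms"
APPENDIX = "appendix how to apply the license to your work attach the following notice"

# Template that includes the appendix (plays the role of "Apache-2.0").
FULL_TEMPLATE = BODY + " " + APPENDIX
# Slightly reworded template without an appendix (plays the role of "Pixar").
VARIANT_TEMPLATE = BODY.replace("distribute", "share")

# The file being identified: a perfect copy of the body, appendix missing.
candidate = BODY

score_full = dice(candidate, FULL_TEMPLATE)
score_variant = dice(candidate, VARIANT_TEMPLATE)

print(f"full template:    {score_full:.2f}")
print(f"variant template: {score_variant:.2f}")
```

In this toy setup the reworded variant scores higher than the full template, because the appendix bigrams count against the candidate when they are treated as required text rather than optional text.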

Is it possible that this is because the appendix that is marked as optional is not missing entirely? See https://github.com/spdx/license-list-XML/issues/2418#issuecomment-1995028762