decathorpe opened this issue 8 months ago
Apologies for the incredibly slow reply here! I'm seeing that SPDX might have split out the optional sections of this license, which could help. I'm pulling in those updates now and am running into other scoring issues (BSD-3-Clause, this time) that I need to debug; hopefully nothing too crazy.
For what it's worth, regression tests can be added to `tests/data/real-licenses`; if a particular license causes trouble in the future, a test there can help narrow down the problem. But because of the way text matching works in this library, there's only so much it can do.
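As a rough illustration of why near-identical license variants are hard to tell apart, the sketch below scores two texts with a Sørensen-Dice coefficient over word bigrams. This is an assumption for illustration only: `bigrams` and `dice_score` are hypothetical helper names, and this library's actual normalization and scoring may differ.

```python
from collections import Counter
import re


def bigrams(text: str) -> Counter:
    """Lowercase the text, keep alphanumeric words, and count adjacent word pairs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(words, words[1:]))


def dice_score(a: str, b: str) -> float:
    """Sorensen-Dice coefficient on word-bigram multisets: 2|A & B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0  # neither text has a bigram, so there is nothing to compare
    shared = sum((ba & bb).values())  # Counter & keeps the minimum count per bigram
    return 2 * shared / total


print(dice_score("the quick brown fox", "the quick brown fox"))  # 1.0
print(dice_score("the quick brown fox", "the quick red fox"))    # 0.3333333333333333
```

Under this kind of scoring every bigram counts equally, so a long block of text that one side has and the other lacks (such as an appendix) can outweigh a short reworded clause.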
Thank you for taking a look! Yeah, I reported this issue to the SPDX people, and they split the optional parts of the appendix further to try to help with this.
But I tried with the latest SPDX license data version, and the issue is still there: this license text (without the appendix, but with the "END OF TERMS AND CONDITIONS" line), which many Rust projects use because they just copy the files from the rust-lang/rust repo, still gets misclassified as "Pixar":
https://github.com/rust-lang/rust/blob/master/LICENSE-APACHE
see also https://github.com/spdx/license-list-XML/issues/2418
The spdx-license-list v3.23 update added the "Pixar" license, which is a variant of Apache-2.0.
With this version of the SPDX data, Apache-2.0 license files without the appendix (like the one from the rust-lang/rust repo) are now a closer match to "Pixar" than to "Apache-2.0", despite being a perfect copy of Apache-2.0 except that the appendix is missing.
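Assuming a bag-of-bigrams Dice score like the one sketched earlier in this thread (an assumption, not necessarily what this library does), a toy example shows how a missing appendix can tip the result toward a variant license. The texts below are synthetic, hypothetical stand-ins, not the real Apache-2.0 or Pixar text; they are sized so that the appendix is longer than the variant's modified clause.

```python
from collections import Counter
import re


def bigrams(text: str) -> Counter:
    """Lowercase the text, keep alphanumeric words, and count adjacent word pairs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(words, words[1:]))


def dice_score(a: str, b: str) -> float:
    """Sorensen-Dice coefficient on word-bigram multisets."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    return 2 * sum((ba & bb).values()) / total if total else 0.0


# Hypothetical stand-ins, not real license text.
body_words = [f"w{i}" for i in range(100)]          # body shared by both licenses
appendix = " ".join(f"a{i}" for i in range(40))     # appendix present only in the template
variant_words = body_words.copy()
variant_words[50:55] = [f"x{i}" for i in range(5)]  # a small reworded clause in the variant

candidate = " ".join(body_words)                    # exact copy, appendix stripped
full_template = candidate + " " + appendix          # template that still contains the appendix
variant = " ".join(variant_words)                   # variant that has no appendix at all

score_full = dice_score(candidate, full_template)
score_variant = dice_score(candidate, variant)
print(f"vs full template: {score_full:.3f}")  # vs full template: 0.832
print(f"vs variant: {score_variant:.3f}")     # vs variant: 0.939
```

The candidate loses 40 template bigrams by dropping the appendix but only 6 variant bigrams to the reworded clause, so the variant wins. The real texts have different proportions; this only illustrates the mechanism.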
Could this be because the appendix that is marked as optional is not missing entirely? See https://github.com/spdx/license-list-XML/issues/2418#issuecomment-1995028762
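SPDX license templates mark optional text with `<optional>` elements. One way a matcher could honor them, sketched here purely as an assumption (the `best_score` helper and the `(text, is_optional)` segment format are hypothetical), is to score the candidate against the template with each optional segment both kept and dropped, then keep the best result. With synthetic text, the sketch also shows what would happen if only part of the appendix were marked optional: dropping that part would no longer yield a perfect score.

```python
from collections import Counter
from itertools import product
import re


def bigrams(text: str) -> Counter:
    """Lowercase the text, keep alphanumeric words, and count adjacent word pairs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(words, words[1:]))


def dice_score(a: str, b: str) -> float:
    """Sorensen-Dice coefficient on word-bigram multisets."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    return 2 * sum((ba & bb).values()) / total if total else 0.0


def best_score(candidate: str, segments: list[tuple[str, bool]]) -> float:
    """Try every keep/drop combination of the optional segments; return the best score.

    Required segments are always kept. The search is exponential in the number of
    optional segments, which is acceptable for a sketch with only a few of them.
    """
    optional_idx = [i for i, (_, is_opt) in enumerate(segments) if is_opt]
    best = 0.0
    for choice in product([True, False], repeat=len(optional_idx)):
        keep = dict(zip(optional_idx, choice))
        text = " ".join(
            seg for i, (seg, is_opt) in enumerate(segments) if not is_opt or keep[i]
        )
        best = max(best, dice_score(candidate, text))
    return best


# Hypothetical stand-ins, not real license text.
body = " ".join(f"w{i}" for i in range(100))
appendix_head = " ".join(f"h{i}" for i in range(10))
appendix_rest = " ".join(f"a{i}" for i in range(30))
candidate = body  # a file with the whole appendix removed

whole_appendix_optional = [(body, False), (appendix_head + " " + appendix_rest, True)]
partly_optional = [(body, False), (appendix_head, False), (appendix_rest, True)]

print(f"{best_score(candidate, whole_appendix_optional):.3f}")  # 1.000
print(f"{best_score(candidate, partly_optional):.3f}")          # 0.952
```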