spdx / Spdx-Java-Library

Java library which implements the Java object model for SPDX and provides useful helper functions
Apache License 2.0

Official Apache-1.1 license text is not being matched correctly by LicenseCompareHelper.matchingStandardLicenseIdsWithinText() #230

Open pmonks opened 6 months ago

pmonks commented 6 months ago

When org.spdx.utility.compare.LicenseCompareHelper.matchingStandardLicenseIdsWithinText() is run on the official Apache-1.1 license text, it fails to find any matches. I believe I've narrowed the problem down to the Clause5 alternative text tag in the template: if I remove the example header from the license text and run org.spdx.utility.compare.LicenseCompareHelper.isTextStandardLicense().getDifferenceMessage() on it, I get:

Variable text rule combined-bullet-Clause5 did not match the compare text starting at line #31 column #1 "5" while processing rule var: combined-bullet-Clause5

When I manually convert that <alt> tag into a Java regex and manually cleanse bullet 5 of the Apache-1.1 license text of comment characters and newlines, I do get a match, so I'm pretty confident the problem is in the library rather than the template. Beyond that I'm not really sure what the root cause might be: whether it has to do with comment character handling, regexification of that particular <alt> tag, or something else entirely.
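For readers who want to repeat the manual check described above, it might look roughly like the sketch below. It uses only java.util.regex and does not call Spdx-Java-Library. The class name Clause5RegexCheck, the cleanse() helper, and above all the CLAUSE5 pattern are hypothetical stand-ins: the pattern is not the actual regex from the Apache-1.1 template's <alt> tag, and the comment-stripping rule is an assumption about what "cleansed" means here.

```java
import java.util.regex.Pattern;

public class Clause5RegexCheck {

    // HYPOTHETICAL pattern standing in for the template's <alt> regex for clause 5.
    // It is an illustration only, not the pattern shipped in the SPDX license list.
    static final Pattern CLAUSE5 = Pattern.compile(
            "5\\.? Products derived from this software may not be called .+ "
            + "nor may .+ appear in their names?,? "
            + "without prior written permission of .+",
            Pattern.CASE_INSENSITIVE);

    // Assumed cleansing step: drop a leading comment marker (*, // or #) from
    // each line, then collapse all whitespace (including newlines) to one space.
    static String cleanse(String raw) {
        return raw
                .replaceAll("(?m)^[ \\t]*(?:\\*|//|#)[ \\t]?", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // True if the cleansed text matches the stand-in pattern in full.
    static boolean matches(String raw) {
        return CLAUSE5.matcher(cleanse(raw)).matches();
    }

    public static void main(String[] args) {
        // Clause 5 as it appears inside a block comment in a source header.
        String clause =
                " * 5. Products derived from this software may not be called \"Apache\",\n"
                + " *    nor may \"Apache\" appear in their name, without prior written\n"
                + " *    permission of the Apache Software Foundation.\n";
        System.out.println("matches after cleansing: " + matches(clause));
    }
}
```

If a check like this succeeds while the library still reports a mismatch at the same clause, that points toward the library's text normalization or its conversion of the <alt> tag, rather than toward the template's regex itself.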

This was reproduced with Spdx-Java-Library v1.11 and SPDX license list v3.23.

pmonks commented 6 months ago

If it's helpful, I'm also seeing similar failures with the official Apache-1.0 license text, though I haven't troubleshot that to the same level of detail as I did with Apache-1.1.