spdx / Spdx-Java-Library

Java library which implements the Java object model for SPDX and provides useful helper functions
Apache License 2.0

Official CC-BY-4.0 license text is not being matched correctly by LicenseCompareHelper.isTextStandardLicense() #233

Open pmonks opened 3 months ago

pmonks commented 3 months ago

When org.spdx.utility.compare.LicenseCompareHelper.isTextStandardLicense() is run on the official CC-BY-4.0 license text, calling isDifferenceFound() on the result (incorrectly) returns true (i.e. the text was not matched to the standard license). Calling getDifferenceMessage() on the same result gives:

Normal text of license does not match starting at line #5 column #9 "commons" when comparing to template text "

By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms a...".  Last optional text was not found due to the optional difference:
    Normal text of license does not match starting at line #5 column #29 "(" when comparing to template text " Creative Commons Attribution 4.0 International Public License

".  Last optional text was not found due to the optional difference:
    Normal text of license does not match starting at line #42 column #5 "example" when comparing to template text "

Considerations for the public: By using one of our public licenses, a licensor grants the public p...".  Last optional text was not found due to the optional difference:
    Normal text of license does not match starting at line #42 column #5 "example" when comparing to template text "

Considerations for the public: By using one of our public licenses, a licensor grants the public p...".  Last optional text was not found due to the optional difference:
    Normal text of license does not match starting at line #1 column #12 "4" when comparing to template text "Creative Commons"

However, I don't see any mismatches when comparing the text against the CC-BY-4.0 SPDX template.

This was reproduced with Spdx-Java-Library v1.11 and SPDX license list v3.23.
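
For triage, the mismatch positions in a difference message like the one above can be pulled out mechanically. The following is a minimal sketch using only the Java standard library. It assumes the message format shown in this issue (line #N column #M "token"). The class name DifferencePositions and its regular expression are illustrative and are not part of Spdx-Java-Library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spdx-Java-Library): extracts the
// line/column/token triples from a difference message string.
class DifferencePositions {

    /** One mismatch location reported in a difference message. */
    static final class Position {
        public final int line;
        public final int column;
        public final String token;

        Position(int line, int column, String token) {
            this.line = line;
            this.column = column;
            this.token = token;
        }

        @Override
        public String toString() {
            return "line " + line + ", column " + column + ", token \"" + token + "\"";
        }
    }

    // Assumed format, taken from the message quoted in this issue:
    //   ... starting at line #5 column #9 "commons" when comparing ...
    private static final Pattern POSITION =
            Pattern.compile("line #(\\d+) column #(\\d+) \"([^\"]*)\"");

    /** Returns every position found in the message, in order of appearance. */
    static List<Position> parse(String message) {
        List<Position> positions = new ArrayList<>();
        Matcher m = POSITION.matcher(message);
        while (m.find()) {
            positions.add(new Position(
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    m.group(3)));
        }
        return positions;
    }

    public static void main(String[] args) {
        // Shortened sample built from the first two mismatches reported above.
        String message =
                "Normal text of license does not match starting at line #5 column #9 \"commons\" "
              + "when comparing to template text. Last optional text was not found due to the "
              + "optional difference: Normal text of license does not match starting at "
              + "line #5 column #29 \"(\" when comparing to template text";
        for (Position p : parse(message)) {
            System.out.println(p);
        }
    }
}
```

Passing the full string returned by getDifferenceMessage() to parse() would list each nested mismatch in turn, which makes it easier to see that the reported positions jump between lines 5, 42 and 1 of the compared text.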

pmonks commented 3 months ago

Note: this also seems to be occurring for some of the CC-BY-*-4.0 variations, including:

(these are just the subset of CC-BY-*-4.0 licenses that I'm testing against - there may be issues with other variations too)

goneall commented 3 months ago

Thanks @pmonks for the analysis. I'm traveling for the next week or so, and I'll take a look once I'm back.

pmonks commented 3 months ago

As always, no rush! I've been traveling a lot myself in recent weeks, so I have also been less active than usual.