spdx / tools

SPDX Tools
Apache License 2.0

"Unexpected SPDX error parsing license string" with certain license identifiers #272

Closed alpianon closed 3 years ago

alpianon commented 3 years ago

While intensively using spdx-tools' TagToRDF function, I encountered an "Unexpected SPDX error parsing license string" error (which stops RDF file creation) when trying to convert SPDX tag/value files containing one or more of the following license identifiers:

This list may not be complete; these are just the problematic licenses I have found so far.

The log output, even at debug loglevel, gives no useful hint about the cause of the error. With debug logging enabled, I noticed that the parser stopped right after parsing a certain license identifier; when I manually removed that identifier from the SPDX file and re-ran TagToRDF, the conversion worked.
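To narrow down which identifier triggers the error, it can help to list every license identifier a tag/value file uses before removing them one at a time. Below is a minimal Python sketch, not part of spdx-tools: the function name `license_ids` is made up for illustration, and it assumes single-line license fields using the SPDX 2.x tag names. License exception names that follow `WITH` are listed too.

```python
import re

# Tags that carry license information in SPDX 2.x tag/value files.
LICENSE_TAGS = {
    "LicenseConcluded",
    "LicenseInfoInFile",
    "PackageLicenseConcluded",
    "PackageLicenseDeclared",
    "PackageLicenseInfoFromFiles",
    "SnippetLicenseConcluded",
    "LicenseInfoInSnippet",
}

# Expression operators and placeholders that are not license identifiers.
NON_IDS = {"AND", "OR", "WITH", "NONE", "NOASSERTION"}


def license_ids(tag_value_text):
    """Return the sorted, de-duplicated identifiers found in license fields."""
    found = set()
    for line in tag_value_text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep or tag.strip() not in LICENSE_TAGS:
            continue
        # Split the license expression on whitespace and parentheses.
        for token in re.split(r"[\s()]+", value):
            if token and token.upper() not in NON_IDS:
                found.add(token)
    return sorted(found)


# Hypothetical sample input, not taken from the attached files.
sample = """FileName: ./a.c
LicenseConcluded: (MIT OR MulanPSL-2.0)
LicenseInfoInFile: MIT
LicenseInfoInFile: NOASSERTION
"""
print(license_ids(sample))
```

Each identifier printed can then be tested separately in a reduced file to see which ones make TagToRDF fail.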

I attach some spdx-tv files affected by this issue: spdx-tv_samples.zip

For my tests, I used the latest release file spdx-tools-2.2.4-jar-with-dependencies.jar

sample commands:

java -jar /usr/local/lib/spdx-tools-2.2.4-jar-with-dependencies.jar TagToRDF kernel-liteos-m_2021.03.22-scancode.spdx kernel-liteos-m_2021.03.22-scancode.spdx.rdf

# with debug loglevel:
java -Dlog4j.configurationFile=./log4j2-debug.xml -jar /usr/local/lib/spdx-tools-2.2.4-jar-with-dependencies.jar TagToRDF kernel-liteos-m_2021.03.22-scancode.spdx kernel-liteos-m_2021.03.22-scancode.spdx.rdf

goneall commented 3 years ago

In doing some quick debugging, it seems that there is an issue trying to fetch the license data from the License List website for the MulanPSL-2.0 license.

I'll do a bit more digging later today or tomorrow to see if I can find the root cause.

goneall commented 3 years ago

BTW - I noticed the tag/value file is missing the required SPDX IDs for the files. This causes the newer version of this utility to trip up; otherwise, I would encourage you to try the tools-java version, which is a bit more supportable.
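A quick way to spot files without an SPDX ID is to scan the tag/value file section by section. This is a rough Python sketch, not something from spdx-tools or tools-java: `files_missing_spdxid` is a hypothetical helper, and it assumes each file section starts with `FileName:` and that no multi-line `<text>` value contains lines that look like tags.

```python
def files_missing_spdxid(tag_value_text):
    """Return FileName values with no SPDXID line before the next section."""
    missing = []
    current = None   # name of the file section being scanned, if any
    has_id = False   # whether an SPDXID line was seen in that section
    for line in tag_value_text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        tag, value = tag.strip(), value.strip()
        if tag in ("FileName", "PackageName", "SnippetSPDXID"):
            # A new section starts here, so close the previous file section.
            if current is not None and not has_id:
                missing.append(current)
            current = value if tag == "FileName" else None
            has_id = False
        elif tag == "SPDXID" and current is not None:
            has_id = True
    # Close the last file section at end of input.
    if current is not None and not has_id:
        missing.append(current)
    return missing


# Hypothetical sample input for illustration.
sample = """PackageName: demo
SPDXID: SPDXRef-Package
FileName: ./a.c
SPDXID: SPDXRef-a
FileName: ./b.c
FileChecksum: SHA1: 0000000000000000000000000000000000000000
FileName: ./c.c
"""
print(files_missing_spdxid(sample))
```

For the sample above, only `./a.c` has an SPDXID, so the other two file names are reported.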

goneall commented 3 years ago

Getting closer to the root cause - the JSON-LD files on the license list website do not contain a LicenseID field. The JSON representation, however, does contain the LicenseID field. The new tools-java utility uses JSON and doesn't have the same issue.

I'll look into why the LicenseListPublisher is not serializing the License ID in the JSON-LD format.
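For anyone who wants to check a downloaded license document themselves, here is a small Python sketch that searches a parsed JSON or JSON-LD document for a license ID field. It is only an illustration: `find_license_id` is a hypothetical helper, the `licenseId` key matches the JSON files on the license list site, and the prefixed `spdx:licenseId` form and the `@graph` layout in the sample are assumptions about how the JSON-LD would look.

```python
import json


def find_license_id(node):
    """Return the first value whose key is 'licenseId' or ends with ':licenseId'."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "licenseId" or key.endswith(":licenseId"):
                return value
        # Not found at this level, so search nested values.
        for value in node.values():
            hit = find_license_id(value)
            if hit is not None:
                return hit
    elif isinstance(node, list):
        for item in node:
            hit = find_license_id(item)
            if hit is not None:
                return hit
    return None


# Hypothetical, heavily trimmed documents for illustration only.
json_doc = json.loads('{"licenseId": "MulanPSL-2.0", "isOsiApproved": true}')
jsonld_doc = json.loads('{"@graph": [{"spdx:name": "Mulan Permissive Software License, Version 2"}]}')

print(find_license_id(json_doc))
print(find_license_id(jsonld_doc))
```

A result of `None` for a JSON-LD file would be consistent with the missing field described above.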

goneall commented 3 years ago

Resolved with PR #273

@alpianon let me know if you would like me to spin another release including this fix

alpianon commented 3 years ago

thx @goneall! It would be nice to have a fix for #268 included in a new release, too.

goneall commented 3 years ago

Resolved in release 2.2.5

alpianon commented 3 years ago

@goneall thanks so much: tested, it works!

Would it be possible for you to add to v2.2.5 release artifacts also the built jar spdx-tools-2.2.5-jar-with-dependencies.jar like in the previous releases, just for convenience? Thanks!

goneall commented 3 years ago

@alpianon you're welcome

Would it be possible for you to add to v2.2.5 release artifacts also the built jar spdx-tools-2.2.5-jar-with-dependencies.jar like in the previous releases, just for convenience?

Done

alpianon commented 3 years ago

thx!