spdx / spdx-java-tagvalue-store

SPDX Document Storage using the Tag/Value format
Apache License 2.0

Validation differences in `extractedText` #37

Closed: armintaenzertng closed this issue 1 year ago

armintaenzertng commented 1 year ago

While researching this issue, I came across an inconsistency in the java-tools (and the online validator). Converting this tag-value file (which the java-tools mark as valid):

SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: SAG-PM generated SBOM
DocumentNamespace: dns:softwareassuranceguardian.com
Creator: Organization: dns:reliableenergyanalytics.com
Creator: Tool: SAG-PM Version: 1.2
Created: 2022-11-26T18:45:28Z
PackageName: apache-tomcat-9.0.69.zip
PackageVersion: 9.0.69
SPDXID: SPDXRef-Package-fc4a1bf0-78a0-43ca-b4a9-78adfb42138c
PackageSupplier: Organization: Apache Foundation
PackageDownloadLocation: https://dlcdn.apache.org/tomcat/tomcat-9/v9.0.69/bin/apache-tomcat-9.0.69.zip/
FilesAnalyzed: false
LicenseID: LicenseRef-Unlicense
LicenseName: Unlicense

to JSON adds a new `extractedText` field that was not in the input:

"hasExtractedLicensingInfos" : [ {
    "licenseId" : "LicenseRef-Unlicense",
    "extractedText" : "WARNING: TEXT IS REQUIRED",
    "name" : "Unlicense"
  } ]
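To spot this in converted output, a small check along the following lines may help. It is only a sketch in Python, not part of the java-tools: the function name `placeholder_license_ids` is hypothetical, and the placeholder string is copied from the output above on the assumption that the converter always emits exactly that text.

```python
import json

# Placeholder string copied from the converted output above (assumed to be stable).
PLACEHOLDER = "WARNING: TEXT IS REQUIRED"


def placeholder_license_ids(spdx_json: str) -> list[str]:
    """Return licenseIds whose extractedText is empty or still the placeholder."""
    doc = json.loads(spdx_json)
    flagged = []
    for info in doc.get("hasExtractedLicensingInfos", []):
        text = (info.get("extractedText") or "").strip()
        if text in ("", PLACEHOLDER):
            flagged.append(info.get("licenseId", "<unknown>"))
    return flagged


# The fragment from this issue, wrapped in braces to make it a complete JSON object.
converted = """
{ "hasExtractedLicensingInfos" : [ {
    "licenseId" : "LicenseRef-Unlicense",
    "extractedText" : "WARNING: TEXT IS REQUIRED",
    "name" : "Unlicense"
  } ] }
"""
print(placeholder_license_ids(converted))  # ['LicenseRef-Unlicense']
```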

As also mentioned in the issue linked above, I believe that the extracted text is mandatory and the above tag-value example should not be marked as valid.
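The check I would expect can be sketched roughly as follows. This is a simplified Python illustration, not the validator's actual logic: the function name is hypothetical, it only looks at whether an `ExtractedText:` tag follows each `LicenseID:` tag, and it does not handle multi-line `<text>` blocks that happen to contain tag-like lines.

```python
def licenses_missing_extracted_text(tag_value: str) -> list[str]:
    """Return every LicenseID that is not followed by an ExtractedText tag."""
    missing = []
    current_id = None   # LicenseID of the section being scanned, if any
    has_text = False    # whether that section has an ExtractedText tag
    for raw in tag_value.splitlines():
        line = raw.strip()
        if line.startswith("LicenseID:"):
            # A new license section starts: close out the previous one.
            if current_id is not None and not has_text:
                missing.append(current_id)
            current_id = line.split(":", 1)[1].strip()
            has_text = False
        elif line.startswith("ExtractedText:"):
            has_text = True
    # Close out the last section in the file.
    if current_id is not None and not has_text:
        missing.append(current_id)
    return missing


# Tail of the tag-value file from this issue.
sample = """\
PackageName: apache-tomcat-9.0.69.zip
FilesAnalyzed: false
LicenseID: LicenseRef-Unlicense
LicenseName: Unlicense
"""
print(licenses_missing_extracted_text(sample))  # ['LicenseRef-Unlicense']
```

A file for which this returns a non-empty list would be reported as invalid instead of being passed through with filler text.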

goneall commented 1 year ago

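Thanks @armintaenzertng for the catch. I think this is an issue with the tag/value parser, which places the "temporary text" in the `extractedText` field until the `hasExtractedLicensingInfos` is parsed. If the real text never arrives, the temporary text stays in place and the document still passes validation.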
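One possible shape for a fix, sketched in Python rather than in the store's Java code: keep the text unset while parsing and let verification report it, treating the placeholder as missing as well. Everything here (`ExtractedLicense`, `parse_extracted_licenses`, the message wording) is hypothetical and only illustrates the idea; it is not how the tag/value store is implemented.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder string seen in the converted JSON in this issue.
PLACEHOLDER = "WARNING: TEXT IS REQUIRED"


@dataclass
class ExtractedLicense:
    """Hypothetical stand-in for an extracted licensing info record."""
    license_id: str
    name: Optional[str] = None
    extracted_text: Optional[str] = None  # None means "no text parsed yet"

    def verify(self) -> list[str]:
        """Return a list of problems; an empty list means the record is valid."""
        if not self.extracted_text or self.extracted_text == PLACEHOLDER:
            return [f"Missing required extracted text for {self.license_id}"]
        return []


def parse_extracted_licenses(tag_value: str) -> list[ExtractedLicense]:
    """Collect license records without ever inserting temporary text."""
    licenses = []
    for raw in tag_value.splitlines():
        tag, sep, value = raw.partition(":")
        if not sep:
            continue  # not a "Tag: value" line
        tag, value = tag.strip(), value.strip()
        if tag == "LicenseID":
            licenses.append(ExtractedLicense(value))
        elif licenses and tag == "LicenseName":
            licenses[-1].name = value
        elif licenses and tag == "ExtractedText":
            licenses[-1].extracted_text = value
    return licenses


parsed = parse_extracted_licenses(
    "LicenseID: LicenseRef-Unlicense\nLicenseName: Unlicense\n"
)
print(parsed[0].verify())
```

With the text left as `None`, the example from this issue produces a verification error instead of a document that looks complete.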

I'll transfer this issue over to the tag/value store repo and look into a fix.