Closed stephenkilbaneadi closed 1 year ago
@stephenkilbaneadi This is due to the JSON format has an associated schema file which can provide additional validation which is likely where the warnings are coming from.
BTW - there is validation for SPDX elements for YAML but this is done after the YAML file is parsed - kind of a logical model validation vs. a file syntax and structure validation that occurs with the schema validation.
I'm not following. If I say I'm uploading a YAML file, but upload a JSON file instead, are you saying it manages to detect that it's a JSON file anyway and parses it okay, but doesn't retrieve the associated schema file, hence no warnings? Whereas if I upload the sam JSON file, but say it's JSON rather than YAML, then the associated schema file is used, hence warnings?
Honestly, I'd just expect an error like "This doesn't look like a YAML file"...
@stephenkilbaneadi Ahh - Yes - this would be an issue with the tools - I guess I didn't read your description very carefully this morning.
I'll take a look and see if I can find a fix.
I found the source of the issue, but this may be a challenge to fix.
The JSON schema is validated based on the serialization format specified on the verification command: https://github.com/spdx/tools-java/blob/b949e88a2fb900b120f1f915d839dd022c134072/src/main/java/org/spdx/tools/Verify.java#L142
It looks like you can pass a JSON stream to the Jackson YAML factory and it will process it as JSON.
I could not find a way to tell if the YAML factory found a JSON file.
I'm actually quite surprised the parsing succeeded since it is not a valid YAML stream.
To fix this, we could inspect the file to see if it is JSON even though YAML was input as the format.
@stephenkilbaneadi Do you think this would be a common issue?
I'd hope it wouldn't be too common but, since it stems from a user error, it's easily accidentally repeatable. I'm still unclear whether the validation is actually happening, in this case and, if so, whether it's correct that no warnings are emitted, despite such warnings being emitted when the type is correctly identified as JSON. It does seem to be an edge case that provides a rich vein of confusion. :-(
Is it possible to simply check the MIME type of the submitted file against the declared file type, or does the serialisation make that problematic?
Or, to put it another way: I opened a bug with Synopsys because their SPDX output was producing warnings when validated; they've argued that this is a problem with the validator because "look no problems if you select YAML as input type..." I believe I should be going back to Synopsys and saying "Yes, there's a problem with the validator, but the problematic case is when the warnings aren't emitted, and there are generation issues that Synopsys still need to address."
I'm still unclear whether the validation is actually happening, in this case and, if so, whether it's correct that no warnings are emitted, despite such warnings being emitted when the type is correctly identified as JSON. It does seem to be an edge case that provides a rich vein of confusion. :-(
There are 2 types of validations that occur:
The full model validation (1) is accomplished by examining every property in the SPDX document and checking it against the rules in the spec (e.g. cardinality, format, exclusions). This is done for all file types. So, for YAML - yes, it is validated.
The schema validation is only done when the format supports a schema. JSON supports a schema file whereas YAML does not. So, YAML does not do any schema validation. Although schema validation isn't as complete as the full model validation, it does find issue in the specific serialized file that may not show up in the model validation due to the parsers masking the issues.
All of the tools are open source, so if you need more information I can point you to the specific code that does the validation.
The short answer is - yes - validation does work with YAML files.
Is it possible to simply check the MIME type of the submitted file against the declared file type, or does the serialisation make that problematic?
Typically the MIME type is determined by the file extension. We could check for file type .json
but we would want to somehow cover for the possibility the file really is YAML as the user specified.
One approach I thought about was to read the first couple of lines and see if it "looks like" YAML or JSON. I though that approach was a bit kludgy and may make the code a bit harder to maintain. But if we think this is a significant issue, it may be worth the trouble.
I opened a bug with Synopsys because their SPDX output was producing warnings when validated; they've argued that this is a problem with the validator because "look no problems if you select YAML as input type..." I believe I should be going back to Synopsys and saying "Yes, there's a problem with the validator, but the problematic case is when the warnings aren't emitted, and there are generation issues that Synopsys still need to address."
@stephenkilbaneadi I agree with your point of view on this.
You can point the Synopsys team to the JSON schema in the spec. You can run an online schema validator (e.g. the Newtonsoft validator) to see if there are errors in the generated file. This is a validation completely independent of the online tools.
Note - it is possible that there is an issue with the schema, but I would suggest the Synopsys team evaluate the errors and if they believe the schema is in error they should open up an issue against the spec.
Hope that helps.
@goneall I have validated a number of SPDX SBOMs in Json format, produced by Black Duck in Newtonsoft validator, using the exact JSON schema in the spec you've referenced, and they all were validated successfully. So it does not look like the Json schema produced by BlackDuck is corrupt.
Edit - I can provide a demo SPDX SBOM from a test instance if it helps
@strashila Thanks for doing the research on this. If you could attach an SBOM that shows the validation error, I can look into it a bit further.
@goneall I'm attaching a few test SBOMs, which all validate against the schema. That said, the SBOMs also now validate against the tool https://tools.spdx.org/app/validate/ , which was not the case before. I was able to recreate the issue with BlackDuck SBOM not validating successfully with the tool when File Type is JSON a week ago, but can't do it now for some reason, and now the validation seems to be working.
strashila_python_cicd_tests-buildless-1.16_SPDX_22_2022-11-21_085803.zip alex-mvn-tests-3.0_SPDX_22_2022-11-21_090417.zip
That said, the SBOMs also now validate against the tool https://tools.spdx.org/app/validate/ , which was not the case before.
@strashila The online tools were recently updated, so the previous validation errors may have been an issue with the online tools which has been fixed.
@goneall thank you for checking. @stephenkilbaneadi this issue may have been resolved, I successfully verified a number of BlackDuck SPDX json files as JSON File Type with the online tool
@strashila @stephenkilbaneadi - shall we close this issue?
@goneall Ok with me
Closing per the above comments
I'm validating SPDX files produced by Black Duck, which are JSON and (allegedly) SPDX 2.2. When I upload said files to the online validator as JSON file, I get various warnings (the warnings depend on the file contents). If, however, YAML is selected as the upload format, the file supposedly validates okay - even though it's a JSON file, not a YAML file.