spdx / tools-java

SPDX Command Line Tools using the Spdx-Java-Library
Apache License 2.0
61 stars 37 forks source link

YAML files not verified? #93

Closed stephenkilbaneadi closed 1 year ago

stephenkilbaneadi commented 1 year ago

I'm validating SPDX files produced by Black Duck, which are JSON and (allegedly) SPDX 2.2. When I upload said files to the online validator as JSON file, I get various warnings (the warnings depend on the file contents). If, however, YAML is selected as the upload format, the file supposedly validates okay - even though it's a JSON file, not a YAML file.

goneall commented 1 year ago

@stephenkilbaneadi This is due to the JSON format has an associated schema file which can provide additional validation which is likely where the warnings are coming from.

goneall commented 1 year ago

BTW - there is validation for SPDX elements for YAML but this is done after the YAML file is parsed - kind of a logical model validation vs. a file syntax and structure validation that occurs with the schema validation.

stephenkilbaneadi commented 1 year ago

I'm not following. If I say I'm uploading a YAML file, but upload a JSON file instead, are you saying it manages to detect that it's a JSON file anyway and parses it okay, but doesn't retrieve the associated schema file, hence no warnings? Whereas if I upload the sam JSON file, but say it's JSON rather than YAML, then the associated schema file is used, hence warnings?

Honestly, I'd just expect an error like "This doesn't look like a YAML file"...

goneall commented 1 year ago

@stephenkilbaneadi Ahh - Yes - this would be an issue with the tools - I guess I didn't read your description very carefully this morning.

I'll take a look and see if I can find a fix.

goneall commented 1 year ago

I found the source of the issue, but this may be a challenge to fix.

The JSON schema is validated based on the serialization format specified on the verification command: https://github.com/spdx/tools-java/blob/b949e88a2fb900b120f1f915d839dd022c134072/src/main/java/org/spdx/tools/Verify.java#L142

It looks like you can pass a JSON stream to the Jackson YAML factory and it will process it as JSON.

I could not find a way to tell if the YAML factory found a JSON file.

I'm actually quite surprised the parsing succeeded since it is not a valid YAML stream.

To fix this, we could inspect the file to see if it is JSON even though YAML was input as the format.

@stephenkilbaneadi Do you think this would be a common issue?

stephenkilbaneadi commented 1 year ago

I'd hope it wouldn't be too common but, since it stems from a user error, it's easily accidentally repeatable. I'm still unclear whether the validation is actually happening, in this case and, if so, whether it's correct that no warnings are emitted, despite such warnings being emitted when the type is correctly identified as JSON. It does seem to be an edge case that provides a rich vein of confusion. :-(

Is it possible to simply check the MIME type of the submitted file against the declared file type, or does the serialisation make that problematic?

stephenkilbaneadi commented 1 year ago

Or, to put it another way: I opened a bug with Synopsys because their SPDX output was producing warnings when validated; they've argued that this is a problem with the validator because "look no problems if you select YAML as input type..." I believe I should be going back to Synopsys and saying "Yes, there's a problem with the validator, but the problematic case is when the warnings aren't emitted, and there are generation issues that Synopsys still need to address."

goneall commented 1 year ago

I'm still unclear whether the validation is actually happening, in this case and, if so, whether it's correct that no warnings are emitted, despite such warnings being emitted when the type is correctly identified as JSON. It does seem to be an edge case that provides a rich vein of confusion. :-(

There are 2 types of validations that occur:

  1. Full model validation
  2. Schema validate

The full model validation (1) is accomplished by examining every property in the SPDX document and checking it against the rules in the spec (e.g. cardinality, format, exclusions). This is done for all file types. So, for YAML - yes, it is validated.

The schema validation is only done when the format supports a schema. JSON supports a schema file whereas YAML does not. So, YAML does not do any schema validation. Although schema validation isn't as complete as the full model validation, it does find issue in the specific serialized file that may not show up in the model validation due to the parsers masking the issues.

All of the tools are open source, so if you need more information I can point you to the specific code that does the validation.

The short answer is - yes - validation does work with YAML files.

goneall commented 1 year ago

Is it possible to simply check the MIME type of the submitted file against the declared file type, or does the serialisation make that problematic?

Typically the MIME type is determined by the file extension. We could check for file type .json but we would want to somehow cover for the possibility the file really is YAML as the user specified.

One approach I thought about was to read the first couple of lines and see if it "looks like" YAML or JSON. I though that approach was a bit kludgy and may make the code a bit harder to maintain. But if we think this is a significant issue, it may be worth the trouble.

goneall commented 1 year ago

I opened a bug with Synopsys because their SPDX output was producing warnings when validated; they've argued that this is a problem with the validator because "look no problems if you select YAML as input type..." I believe I should be going back to Synopsys and saying "Yes, there's a problem with the validator, but the problematic case is when the warnings aren't emitted, and there are generation issues that Synopsys still need to address."

@stephenkilbaneadi I agree with your point of view on this.

You can point the Synopsys team to the JSON schema in the spec. You can run an online schema validator (e.g. the Newtonsoft validator) to see if there are errors in the generated file. This is a validation completely independent of the online tools.

Note - it is possible that there is an issue with the schema, but I would suggest the Synopsys team evaluate the errors and if they believe the schema is in error they should open up an issue against the spec.

Hope that helps.

strashila commented 1 year ago

@goneall I have validated a number of SPDX SBOMs in Json format, produced by Black Duck in Newtonsoft validator, using the exact JSON schema in the spec you've referenced, and they all were validated successfully. So it does not look like the Json schema produced by BlackDuck is corrupt.

Edit - I can provide a demo SPDX SBOM from a test instance if it helps

goneall commented 1 year ago

@strashila Thanks for doing the research on this. If you could attach an SBOM that shows the validation error, I can look into it a bit further.

strashila commented 1 year ago

@goneall I'm attaching a few test SBOMs, which all validate against the schema. That said, the SBOMs also now validate against the tool https://tools.spdx.org/app/validate/ , which was not the case before. I was able to recreate the issue with BlackDuck SBOM not validating successfully with the tool when File Type is JSON a week ago, but can't do it now for some reason, and now the validation seems to be working.

strashila_python_cicd_tests-buildless-1.16_SPDX_22_2022-11-21_085803.zip alex-mvn-tests-3.0_SPDX_22_2022-11-21_090417.zip

goneall commented 1 year ago

That said, the SBOMs also now validate against the tool https://tools.spdx.org/app/validate/ , which was not the case before.

@strashila The online tools were recently updated, so the previous validation errors may have been an issue with the online tools which has been fixed.

strashila commented 1 year ago

@goneall thank you for checking. @stephenkilbaneadi this issue may have been resolved, I successfully verified a number of BlackDuck SPDX json files as JSON File Type with the online tool

goneall commented 1 year ago

@strashila @stephenkilbaneadi - shall we close this issue?

strashila commented 1 year ago

@goneall Ok with me

goneall commented 1 year ago

Closing per the above comments