spdx / ntia-conformance-checker

Check SPDX SBOM for NTIA minimum elements
Apache License 2.0

Online validator chokes on JSON #194

Closed dlegaultbbry closed 3 months ago

dlegaultbbry commented 4 months ago

I tried to use this validator with a JSON file (with JSON selected in the drop-down), and the validator errored out, saying the file format is invalid.

https://tools.spdx.org/app/ntia_checker/

The file was also checked in https://tools.spdx.org/app/validator/ and it is all good.

The following warning(s) were raised:

Is this SBOM NTIA minimum element conformant? False

The provided document couldn't be parsed, check for ntia minimum elements couldn't be performed.

The following SPDXParsingError was raised:

Unsupported SPDX file type: /spdx/src/app/media/AnonymousUser/1721311798/spdx-doc-29facc8d-9168-449d-9722-dbbd518a7cb9

jspeed-meyers commented 4 months ago

@dlegaultbbry, can you please provide the SBOM in question? Thank you for the bug report.

dlegaultbbry commented 4 months ago

spdx_json.zip

jspeed-meyers commented 4 months ago

Thank you! I'll investigate and report back. Anybody else--feel free to investigate too.

jspeed-meyers commented 4 months ago

@dlegaultbbry: I believe I replicated the error and I have a proposed fix.

First, tool version:

ntia-checker --version
2.0.0

Second, I run the tool on the file you provided (assuming the file does NOT end in .json) and get the same error:

ntia-checker --file tests/data/other_tests/spdx-doc-fixme

Is this SBOM NTIA minimum element conformant? False

The provided document couldn't be parsed, check for ntia minimum elements couldn't be performed.

The following SPDXParsingError was raised:

Unsupported SPDX file type: tests/data/other_tests/spdx-doc-fixme

Third, I run the tool on the same file but with .json added to the end of the filename and do not get an error.

ntia-checker --file tests/data/other_tests/spdx-doc-fixme.json

Is this SBOM NTIA minimum element conformant? False

Individual elements                            | Status
-------------------------------------------------------
All component names provided?                  | True
All component versions provided?               | True
All component identifiers provided?            | True
All component suppliers provided?              | False
SBOM author name provided?                     | True
SBOM creation timestamp provided?              | True
Dependency relationships provided?             | True

No error.

To repeat: Adding .json to the end of the filename appears to fix the error.

The underlying library that ntia-conformance-checker uses for SBOM operations (https://github.com/spdx/tools-python) chooses the parser from the filename extension rather than from the file's content. See the function below, which comes from that library.

def file_name_to_format(file_name: str) -> FileFormat:
    if file_name.endswith(".rdf") or file_name.endswith(".rdf.xml"):
        return FileFormat.RDF_XML
    elif file_name.endswith(".tag") or file_name.endswith(".spdx"):
        return FileFormat.TAG_VALUE
    elif file_name.endswith(".json"):
        return FileFormat.JSON
    elif file_name.endswith(".xml"):
        return FileFormat.XML
    elif file_name.endswith(".yaml") or file_name.endswith(".yml"):
        return FileFormat.YAML
    else:
        raise SPDXParsingError(["Unsupported SPDX file type: " + str(file_name)])

In short, adding .json should do the trick!
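
If renaming the original file in place isn't convenient, the same workaround can be scripted. This is only a minimal sketch, assuming the file really is SPDX JSON; `copy_with_extension` is a hypothetical helper, not part of ntia-conformance-checker or tools-python:

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_extension(src: str, extension: str = ".json") -> Path:
    """Copy `src` into a fresh temp directory under a name ending in `extension`.

    Hypothetical helper: it only changes the filename so that extension-based
    format detection can succeed. The content is copied unchanged.
    """
    source = Path(src)
    if source.name.endswith(extension):
        return source  # already has the expected ending, nothing to do
    target_dir = Path(tempfile.mkdtemp(prefix="spdx-"))
    target = target_dir / (source.name + extension)
    shutil.copyfile(source, target)
    return target
```

The returned path can then be passed to `ntia-checker --file` as in the commands above.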

dlegaultbbry commented 4 months ago

Agreed, but then the UI dropdown to select the document type is redundant if the tool relies on the extension?

jspeed-meyers commented 4 months ago

@dlegaultbbry: Yeah, that's a good point. I wonder if @goneall has more context and explanation. @goneall is a maintainer of that UI and integrated ntia-conformance-checker into the UI.

goneall commented 4 months ago

The UI for the online tool conformance checker is basically the same as the UI for validate.

We could remove the type chooser if that helps. Another possibility would be to change the conformance checker's API so that the file type can be passed in. If no file type is provided, the same heuristic could be used.
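
A rough sketch of that second possibility, for discussion only. The `Format` enum and `resolve_format` are hypothetical stand-ins so the example runs on its own; they are not the existing API of ntia-conformance-checker or tools-python. The extension mapping mirrors the heuristic quoted earlier in this thread:

```python
from enum import Enum, auto
from typing import Optional


class Format(Enum):
    """Stand-in for the library's FileFormat enum (assumption, for illustration)."""
    RDF_XML = auto()
    TAG_VALUE = auto()
    JSON = auto()
    XML = auto()
    YAML = auto()


# Order matters: ".rdf.xml" must be checked before the plain ".xml" ending.
_EXTENSIONS = [
    ((".rdf", ".rdf.xml"), Format.RDF_XML),
    ((".tag", ".spdx"), Format.TAG_VALUE),
    ((".json",), Format.JSON),
    ((".xml",), Format.XML),
    ((".yaml", ".yml"), Format.YAML),
]


def resolve_format(file_name: str, explicit: Optional[Format] = None) -> Format:
    """Prefer the caller-supplied format; fall back to the extension heuristic."""
    if explicit is not None:
        return explicit
    for endings, fmt in _EXTENSIONS:
        if file_name.endswith(endings):  # str.endswith accepts a tuple
            return fmt
    raise ValueError("Unsupported SPDX file type: " + file_name)
```

With something like this, the UI could pass the drop-down selection as `explicit`, and command-line callers that pass nothing would keep today's behavior.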

jspeed-meyers commented 3 months ago

@dlegaultbbry: It seems like there are a couple of options, none of which, if I understand correctly, directly relates to ntia-conformance-checker. One option is to alter the underlying Python SPDX library so that the type of the ingested file can be set explicitly rather than inferred by a heuristic. Another option is to alter the UI code in the website UI codebase. Thoughts? I'll close this issue in a week, allowing some more discussion first.

And if I'm missing something, please, anyone, say something.
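
To make the UI-side option concrete: the path in the original error shows the upload stored without an extension, so one approach would be to append an extension matching the drop-down choice when the upload is saved. A minimal sketch under that assumption; the drop-down labels and the `stored_upload_name` helper are hypothetical and not taken from the website codebase:

```python
# Hypothetical drop-down labels mapped to an ending that the extension
# heuristic quoted earlier in this thread recognizes.
DROPDOWN_TO_EXTENSION = {
    "JSON": ".json",
    "TAG": ".spdx",
    "RDFXML": ".rdf.xml",
    "XML": ".xml",
    "YAML": ".yaml",
}


def stored_upload_name(base_name: str, selected_format: str) -> str:
    """Return the name to store an upload under, ending in the chosen format's extension."""
    extension = DROPDOWN_TO_EXTENSION[selected_format.upper()]
    if base_name.endswith(extension):
        return base_name  # the uploader already used a matching name
    return base_name + extension
```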

jspeed-meyers commented 3 months ago

I am going to close this issue, BUT BUT if anyone wants to re-open it or to open a new, similar issue, please be my guest. I can understand @dlegaultbbry's feature request--I just don't see it as related to this project, i.e. ntia-conformance-checker.