python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License

When certain types of error messages are converted to strings, certain schema properties are incorrectly converted #1260

Closed MrSeccubus closed 1 month ago

MrSeccubus commented 1 month ago

When I use this code:

    import json
    from jsonschema import Draft7Validator

    # Load the CNA container schema shipped with cvelib.
    with open("/Users/fbreedijk/repos/cna-bot/venv/lib/python3.12/site-packages/cvelib/schemas/CVE_JSON_cnaPublishedContainer_5.1.0.json") as sfp:
        schema = json.load(sfp)

    validator = Draft7Validator(schema)
    # json_data is the already-parsed CVE record linked below.
    errors = sorted(validator.iter_errors(json_data["containers"]["cna"]), key=lambda e: e.message)
    if errors:
        errors_str = "\n".join(e.message for e in errors)
        print("###\n{}\n###\n".format(errors_str))

To validate property $.containers.cna of this file https://github.com/DIVD-NL/cna-bot/blob/main/error-cves/cve_5.1/refs/01.missing/CVE-1999-0012.json against this schema: https://github.com/RedHatProductSecurity/cvelib/blob/master/cvelib/schemas/CVE_JSON_cnaPublishedContainer_5.1.0.json

This is displayed:

<snip>
Schema validation of CVE record failed.
'references' is a required property

Failed validating 'required' in schema:
    {'$comment': 'The character . is restricted in names allowed by '
                 'patternProperties to work-around naming limitations in '
                 'some common implementations.',
     '$schema': 'http://json-schema.org/draft-07/schema#',
     'additionalProperties': False,
     'definitions': {'affected': {'description': 'List of affected '
                                                 'products.',
                                  'items': {'$ref': '#/definitions/product'},
                                  'minItems': 1,
                                  'type': 'array'},
                     'cnaTags': {'description': 'Tags provided by a CNA '
                                                'describing the CVE '
                                                'Record.',
                                 'items': {'oneOf': [{'$ref': '#/definitions/tagExtension'},
                                                     {'$id': 'https://cve.mitre.org/cve/v5_00/tags/cna/',
                                                      '$schema': 'http://json-schema.org/draft-07/schema#',
                                                      'description': 'exclusively-hosted-service: '
                                                                     'All '
                                                                     'known '
                                                                     'software '
                                                                     'and/or '
                                                                     'hardware '
                                                                     'affected '
                                                                     'by '
                                                                     'this '
                                                                     'CVE '
                                                                     'Record '
                                                                     'is '
                                                                     'known '
<snip>
                                 'items': {'additionalProperties': False,
                                           'properties': {'lang': {'$ref': '#/definitions/language',
                                                                   'description': 'The '
                                                                                  'language '
                                                                                  'used '
                                                                                  'when '
                                                                                  'describing '
                                                                                  'the '
                                                                                  'credits. '
                                                                                  'The '
                                                                                  'language '
<snip>

Somehow $comment gets converted to an array of three strings, whereas in the schema it is just a single string (https://github.com/RedHatProductSecurity/cvelib/blob/master/cvelib/schemas/CVE_JSON_cnaPublishedContainer_5.1.0.json#L2)

And the property "description" is converted to an array of words instead of a string (https://github.com/RedHatProductSecurity/cvelib/blob/master/cvelib/schemas/CVE_JSON_cnaPublishedContainer_5.1.0.json#L24)

This makes my output very, very ugly.

Julian commented 1 month ago

There's nothing converted there; in Python (like in C), "foo" "bar" is a single implicitly concatenated string.
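A minimal sketch of that language rule, for illustration only. The variable names are hypothetical, and the longer string is copied from the $comment value in the output quoted above:

```python
# Adjacent string literals are joined by the parser into one str object.
joined = "foo" "bar"
print(joined == "foobar")

# The same rule applies across lines inside parentheses, which is the
# layout seen in the quoted error output.
comment = ('The character . is restricted in names allowed by '
           'patternProperties to work-around naming limitations in '
           'some common implementations.')
print(type(comment).__name__)
print(isinstance(comment, (list, tuple)))
```

So the three quoted fragments in the output are one string value, not an array of three.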

The behavior comes from pprint, which is what we use for formatting output. Making this nicer is #243, which you're of course welcome to chime in on.
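The wrapping can be reproduced with the standard library alone. This is a small sketch, assuming a one-key dictionary that stands in for the real schema; it does not show how jsonschema calls pprint internally:

```python
import ast
import pprint

# Stand-in for the schema: a single long string value.
schema = {
    "$comment": "The character . is restricted in names allowed by "
                "patternProperties to work-around naming limitations in "
                "some common implementations."
}

# With the default width of 80, pprint splits the long string into
# several adjacent literals, one per output line.
formatted = pprint.pformat(schema)
print(formatted)

# Parsing the formatted text back yields the original single string.
round_tripped = ast.literal_eval(formatted)
print(round_tripped == schema)
```

The formatted text spans several lines, yet evaluating it gives back the same dictionary with one string value.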

MrSeccubus commented 1 month ago

It is not just about making it "nicer": because of this behaviour of pprint, the current output is unusable.

MrSeccubus commented 1 month ago

The other thing you could consider to make the output of pprint useful is to set width=sys.maxsize, as per this Stack Overflow question: https://stackoverflow.com/questions/31485402/can-i-make-pprint-in-python3-not-split-strings-like-in-python2

Again this is not just a cosmetic issue, this is also a usability issue.
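The width suggestion above can be tried on pprint directly. This is only a sketch on a stand-in dictionary; whether jsonschema offers a way to pass a width through to its error formatting is not stated in this thread:

```python
import pprint
import sys

# Stand-in for the schema, nested one level to mimic "definitions".
schema = {
    "definitions": {
        "affected": {
            "description": "List of affected products, padded with extra "
                           "words so that the default width would wrap it."
        }
    }
}

default_text = pprint.pformat(schema)                     # width=80
wide_text = pprint.pformat(schema, width=sys.maxsize)     # no wrapping

print(default_text)
print(wide_text)
```

One trade-off: with a very large width nothing is wrapped at all, so the entire nested structure is emitted on a single line, not only the long strings.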