python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License
4.59k stars 578 forks source link

Possible rounding issue causing multipleOf to fail for a valid value. #1039

Closed mayafox closed 1 year ago

mayafox commented 1 year ago

Encountered the following error when fixing the schema validation errors for a new device I was adding into netbox/devicetype-library

Failed validating 'multipleOf' in schema['properties']['weight']['items']['properties']['value']: {'minimum': 0, 'multipleOf': 0.01, 'type': 'number'}

On instance['weight'][0]['value']: 10.11

Did a bit of poking around, changing the value to 1011 <<< integer, and not a float, caused the check to pass. This seems to potentially be an issue in jsonschema,

https://github.com/python-jsonschema/jsonschema/blob/main/jsonschema/_validators.py line 181

If I am reading the code right, jsonschema is running into a rounding error when running this validation.

db = 0.01 instance = 10.11 isinstance(db, float) True quotient = instance / db quotient 1010.9999999999999 int(quotient) 1010 int(round(quotient, 2)) 1011

Any Thoughts, additional information I should collect?

mayafox commented 1 year ago

possible fix

def multipleOf(validator, dB, instance, schema):
    if not validator.is_type(instance, "number"):
        return

    if isinstance(dB, float):
         precision = len(str(dB).split('.')[1])
        quotient = instance / dB
        try:
            failed = int(round(quotient, precision)) != quotient
        except OverflowError:
            # When `instance` is large and `dB` is less than one,
            # quotient can overflow to infinity; and then casting to int
            # raises an error.
            #
            # In this case we fall back to Fraction logic, which is
            # exact and cannot overflow.  The performance is also
            # acceptable: we try the fast all-float option first, and
            # we know that fraction(dB) can have at most a few hundred
            # digits in each part.  The worst-case slowdown is therefore
            # for already-slow enormous integers or Decimals.
            failed = (Fraction(instance) / Fraction(dB)).denominator != 1
    else:
        failed = instance % dB

    if failed:
        yield ValidationError(f"{instance!r} is not a multiple of {dB}")
mayafox commented 1 year ago

Issue was resolved by using a different validation method, closing

cwegener commented 1 year ago

@mayafox I don't think that the issue is resolved by avoiding the bug. Would you be able to re-open the issue?

cwegener commented 1 year ago

It seems that the distinction between validation of floating point numbers and validation of arbitrary decimal precision is causing frequent confusion for users of jsonschema (see large number of issues returned when searching for mulitpleOf in this GH repo.

mayafox commented 1 year ago

@cwegener I hit this issue while contributing to https://github.com/netbox-community/devicetype-library/pull/1067 and the schema validation failing. Devs for that repo changed the validation method to multipleOfPrecision, and addressed the issue.

cwegener commented 1 year ago

@cwegener I hit this issue while contributing to netbox-community/devicetype-library#1067 and the schema validation failing. Devs for that repo changed the validation method to multipleOfPrecision, and addressed the issue.

Thanks for the additional info. I don't actually think that the netbox-community project even resolved the issue. They also just seem to avoid the issue by (unknowingly) disabling the data validation of the field in question.

The apparent "fix" is not a fix at all. https://github.com/netbox-community/devicetype-library/issues/1068

I've left a comment in your PR thread in the netbox-community org https://github.com/netbox-community/devicetype-library/pull/1067#issuecomment-1407991577

mayafox commented 1 year ago

@cwegener Ok, what's the plan of action then? I thought about the code snippet I proposed, and it's nowhere near the quality of what I would accept in my day job, but is potentially a starting point.

Julian commented 1 year ago

Hi there -- please have a look at some of the older issues you can find by searching multipleOf -- essentially automatically rounding isn't the right thing to do for the same reason Python doesn't try to guess what you mean by doing 0.3 / 0.1 either -- that's just the way floats work. If you want to though you can use decimal.Decimal for fixed precision operations.

danner26 commented 1 year ago

Thank you for the heads up @cwegener do you have any recommendations on how to resolve this properly then? In my searches I have not found a well defined solution to this use case, but I did find a lot of other people suffering from the same confusion

@Julian can you describe how to use this? I was checking here, and I do not see decimal defined. I could be missing it though. I had noticed someone else mention decimal.Decimal in a stackoverflow artcle but could not find an implementation example. When I change the validation type to decimal.Decimal I get:

venv/lib/python3.10/site-packages/jsonschema/validators.py:318: in is_type
    return self.TYPE_CHECKER.is_type(instance, type)
venv/lib/python3.10/site-packages/jsonschema/_types.py:119: in is_type
    raise UndefinedTypeCheck(type) from None
E   jsonschema.exceptions.UndefinedTypeCheck: Type 'decimal.Decimal' is unknown to this type checker

During handling of the above exception, another exception occurred:
tests/definitions_test.py:83: in test_definitions
    Draft4Validator(schema, resolver=resolver).validate(definition)
venv/lib/python3.10/site-packages/jsonschema/validators.py:313: in validate
    for error in self.iter_errors(*args, **kwargs):
venv/lib/python3.10/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
venv/lib/python3.10/site-packages/jsonschema/_validators.py:332: in properties
    yield from validator.descend(
venv/lib/python3.10/site-packages/jsonschema/validators.py:305: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
venv/lib/python3.10/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
venv/lib/python3.10/site-packages/jsonschema/_legacy_validators.py:107: in items_draft3_draft4
    yield from validator.descend(item, items, path=index)
venv/lib/python3.10/site-packages/jsonschema/validators.py:305: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
venv/lib/python3.10/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
venv/lib/python3.10/site-packages/jsonschema/_validators.py:298: in ref
    yield from validator.descend(instance, resolved)
venv/lib/python3.10/site-packages/jsonschema/validators.py:305: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
venv/lib/python3.10/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
venv/lib/python3.10/site-packages/jsonschema/_validators.py:332: in properties
    yield from validator.descend(
venv/lib/python3.10/site-packages/jsonschema/validators.py:305: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
venv/lib/python3.10/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
venv/lib/python3.10/site-packages/jsonschema/_validators.py:321: in type
    if not any(validator.is_type(instance, type) for type in types):
venv/lib/python3.10/site-packages/jsonschema/_validators.py:321: in <genexpr>
    if not any(validator.is_type(instance, type) for type in types):
venv/lib/python3.10/site-packages/jsonschema/validators.py:320: in is_type
    raise exceptions.UnknownType(type, instance, self.schema)
E   jsonschema.exceptions.UnknownType: Unknown type 'decimal.Decimal' for validator with schema:
E       {'type': 'decimal.Decimal'}
E   
E   While checking instance:
E       10.11
Julian commented 1 year ago

Do the docs here happen to help? The example there is for number, but the example is essentially the same for changing what number means instead -- if not happy to improve them (or even better to type out an example and hope for a PR :)!)

Julian commented 1 year ago

Or I guess it sounds like you want info on how to do this with the json module, in which case you want the parse_float option to json.loads e.g. json.loads(..., parse_float=decimal.Decimal).

danner26 commented 1 year ago

@Julian yes, we are using the json module in our tests along with jsonschema. I havent worked with jsonschema before this, so thank you for the assistance

I seem to still be having the same issues here. I changed the json.loads to json.loads(schema_file.read(), parse_float=decimal.Decimal) I have also updated the definition to:

"value": {
                    "type": "number",
                    "minimum": 0,
                    "multipleOf": 0.01
                },

Using a value of 10.11 I am still getting:

10.11 is not a multiple of 0.01

Failed validating 'multipleOf' in schema['properties']['weight']['items']['properties']['value']:
    {'minimum': 0, 'multipleOf': 0.01, 'type': 'number'}

On instance['weight'][0]['value']:
    10.11

During handling of the above exception, another exception occurred:
device-types/3Com/4200G-12_Port.yml failed validation: 10.11 is not a multiple of 0.01

Failed validating 'multipleOf' in schema['properties']['weight']['items']['properties']['value']:
    {'minimum': 0, 'multipleOf': 0.01, 'type': 'number'}

On instance['weight'][0]['value']:
    10.11

Do I need to also create the custom validator as well? I am sorry if this is a simple issue that I am overlooking, I appreciate the input.

Julian commented 1 year ago

That should be it -- the exception there might make it seem you may not have saved perhaps? Or not have loaded both floats as decimals?

But e.g. here's what should be a fuller example:

import decimal
import json
schema = json.loads("""{"type": "number", "minimum": 0, "multipleOf": 0.01}""", parse_float=decimal.Decimal)
good = json.loads("10.11", parse_float=decimal.Decimal)
bad = json.loads("10.115", parse_float=decimal.Decimal)
jsonschema.validate(good, schema)  # fine

vs

jsonschema.validate(bad, schema)

->

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/julian/Development/jsonschema/jsonschema/validators.py", line 1121, in validate
    raise error
jsonschema.exceptions.ValidationError: Decimal('10.115') is not a multiple of 0.01

Failed validating 'multipleOf' in schema:
    {'minimum': 0, 'multipleOf': Decimal('0.01'), 'type': 'number'}

On instance:
    Decimal('10.115')

Decimal('10.115') is not a multiple of 0.01

Failed validating 'multipleOf' in schema:
    {'minimum': 0, 'multipleOf': Decimal('0.01'), 'type': 'number'}

On instance:
    Decimal('10.115')
danner26 commented 1 year ago

Still getting the same thing, I will have to dig into this further. Thanks for all of your help so far

danner26 commented 1 year ago

So @Julian when checking my RefResolver schema (resolver = RefResolver(f'file://{os.getcwd()}/schema/devicetype.json', schema)), the part in question is 'weight': {'type': 'array', 'items': {'$ref': 'components.json#/definitions/weight'} which evaluated to:

"weight": {
            "type": "object",
            "properties": {
                "value": {
                    "type": "number",
                    "minimum": 0,
                    "multipleOf": 0.01
                },
                "unit": {
                    "type": "string",
                    "enum": [
                        "kg",
                        "g",
                        "lb",
                        "oz"
                    ]
                }
            },
            "required": ["value", "unit"]
        }

Then it fails at Draft4Validator(schema, resolver=resolver).validate(definition) where the definition for the part in question evaluates to 'weight': [{'value': 10.11, 'unit': 'kg'}]

Any ideas what I might have wrong here? Is it because we are using Draft4Validator?

danner26 commented 1 year ago

I am also seeing these errors now:

tests/definitions_test.py:84: in test_definitions
    Draft4Validator(schema, resolver=resolver).validate(definition)
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/jsonschema/validators.py:313: in validate
    for error in self.iter_errors(*args, **kwargs):
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/jsonschema/_validators.py:332: in properties
    yield from validator.descend(
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/jsonschema/validators.py:305: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/jsonschema/validators.py:288: in iter_errors
    for error in errors:
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/jsonschema/_validators.py:202: in multipleOf
    failed = instance % dB
E   TypeError: unsupported operand type(s) for %: 'float' and 'decimal.Decimal'

Which seems to lead back to https://github.com/python-jsonschema/jsonschema/blob/65afdcec79532021864acddde8e86bc727a6d137/jsonschema/_validators.py#L202

Does this warrant a new issue?

Julian commented 1 year ago

That error means you've now properly loaded your schema with decimals, but not your instance (the thing you're validating). As I mentioned, you need to make sure both of them are decimals -- I'm not sure where you're getting your instance from though so I can't point out the error, but you need to load your instance JSON with the same parse_float option as I showed in the example.

jsenecal commented 1 year ago

That error means you've now properly loaded your schema with decimals, but not your instance (the thing you're validating). As I mentioned, you need to make sure both of them are decimals -- I'm not sure where you're getting your instance from though so I can't point out the error, but you need to load your instance JSON with the same parse_float option as I showed in the example.

@Julian So, turns out that loading the schema with parse_float doesn't help when loading references to other files. A custom handler needs to be defined. Please take a look at https://github.com/netbox-community/devicetype-library/pull/1073/files for my take on fixing this issue for us.

Maybe you could shine some light if I overengineered something there or maybe improve the docs based on our troubles :D

Julian commented 1 year ago

Ah I see, sorry, I had it reversed then, your instance was fine but not your schema -- I didn't realize you were saying you were loading the schema from behind a $ref -- what you have there looks more or less fine, you can do it slightly more easily by using store rather than handlers -- which basically is just a dictionary with schemas -- so something like RefResolver(..., store={"file://whatever/path/schema.json": json.loads(Path("whatever/path/schema.json").read_text(), parse_float=decimal.Decimal)) which just allows you to do the loading yourself (and pass that extra arg), but otherwise is essentially equivalent to what you have for your case.

And definitely the docs could use more examples here but sometime soon (maybe in the next month or so) there'll be an even easier way to do this once support for the new referencing library lands which essentially will supercede RefResolver, so lots will get easier then!

Feedback definitely still appreciated though!

danner26 commented 1 year ago

Thanks for all of your help @Julian