shawnbrown / datatest

Tools for test-driven data-wrangling and data validation.

AcceptedExtra not working as expected with dicts #47

Closed: TheOtherDude closed this issue 5 years ago

TheOtherDude commented 5 years ago

I expected with AcceptedExtra(): to ignore keys that appear in the data but not in the requirement dict, but instead validation fails with a Deviation against None.

Here is an example:

actual = {'a': 1, 'b': 2}
expected = {'b': 2}
with AcceptedExtra():
    validate(actual, requirement=expected)

The output is:

ValidationError: does not satisfy mapping requirements (1 difference): {
    'a': Deviation(+1, None),
}

Thanks for the cool package, by the way!

shawnbrown commented 5 years ago

This is working as intended (under the current API) but your expectation is entirely reasonable and I think datatest should be changed to use an Extra difference in this case. For legacy reasons, it currently uses a Deviation when numbers are checked against no value.

I can resolve this in the upcoming 0.9.6 release when I clean up the acceptance API.

Below, I'll walk through your example to explain what's going on and why it was intentional—those legacy reasons. Then I'll describe the fix I have in mind.

Here's a version of your code that doesn't include an acceptance:

from datatest import validate

actual = {'a': 1, 'b': 2}
expected = {'b': 2}
validate(actual, expected)

As you noted, this fails with:

ValidationError: does not satisfy mapping requirements (1 difference): {
    'a': Deviation(+1, None),
}

Because the difference type is a Deviation, using accepted.extra() doesn't catch it. The difference can, however, be accepted with the following:

from datatest import validate, accepted

actual = {'a': 1, 'b': 2}
expected = {'b': 2}
with accepted.deviation(1):
    validate(actual, expected)
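
Note, though, that this workaround is broader than it might look: accepted.deviation(1) accepts any deviation within plus-or-minus 1, so (assuming the same behavior shown above) it would also mask a genuine off-by-one error in a key that both dicts share:

from datatest import validate, accepted

actual = {'a': 1, 'b': 3}    # 'b' is genuinely wrong (off by +1)
expected = {'b': 2}
with accepted.deviation(1):  # accepts Deviation(+1, None) AND Deviation(+1, 2)
    validate(actual, expected)  # passes without complaint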

But look at this modified version of your example (using strings instead of integers):

from datatest import validate

actual = {'a': 'x', 'b': 'y'}
expected = {'b': 'y'}
validate(actual, expected)

Now this fails with that Extra difference you were expecting:

ValidationError: does not satisfy mapping requirements (1 difference): {
    'a': Extra('x'),
}

Here is what's going on: The type of difference used is determined by the type of object being checked. And with mappings, there can be situations (as in your example) where values don't have anything to be compared against—a sort of null condition.

Datatest tries to generate sensible differences for these value-vs-no-value situations. When I designed the behavior, I was trying to make it interoperate with the acceptances as smoothly as possible. But because the acceptances weren't as flexible as they are now, I had to make some tough choices. I decided on the following:

- When a non-numeric value has nothing to be compared against, use an Extra or Missing difference.
- When a numeric value has nothing to be compared against, use a Deviation with None in place of the absent value.

I really hated that last one, but it was the least awful thing to do at the time. This scheme helped the workflow when adding acceptances. But there's no reason to keep doing this and it's not the behavior people expect to see.
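
To see the legacy scheme in one place, here's a minimal sketch (assuming the behavior shown above, and that ValidationError exposes the collected differences through its differences attribute):

from datatest import validate, ValidationError

# A numeric value with no matching requirement becomes a Deviation.
try:
    validate({'a': 1, 'b': 2}, {'b': 2})
except ValidationError as err:
    print(err.differences)  # {'a': Deviation(+1, None)}

# A non-numeric value with no matching requirement becomes an Extra.
try:
    validate({'a': 'x', 'b': 'y'}, {'b': 'y'})
except ValidationError as err:
    print(err.differences)  # {'a': Extra('x')}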

The fix I have in mind

I think I can unify the no-value comparison behavior as follows:

- When a value exists in the data but has no matching requirement, create an Extra difference (regardless of the value's type).
- When a requirement exists but has no matching value in the data, create a Missing difference.

And then replace the current accepted.deviation() with a more capable accepted.tolerance() that can handle numeric differences regardless of difference class (not just Deviation objects).

This would make it so your original example would fail with:

ValidationError: does not satisfy mapping requirements (1 difference): {
    'a': Extra(1),
}

Which could then be accepted with...

from datatest import validate, accepted, Extra

actual = {'a': 1, 'b': 2}
expected = {'b': 2}
with accepted(Extra):  # <- new format in upcoming API
    validate(actual, expected)

...or with...

from datatest import validate, accepted

actual = {'a': 1, 'b': 2}
expected = {'b': 2}
with accepted.tolerance(1):  # Accepts +/- 1
    validate(actual, expected)
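
Acceptances can also be composed with the bitwise operators, so a dataset that produced both extra items and small deviations in shared keys could combine the two (a sketch assuming the 0.9.6 API described above):

from datatest import validate, accepted, Extra

actual = {'a': 1, 'b': 3}   # 'a' is extra, 'b' deviates by +1
expected = {'b': 2}

# A difference is accepted if EITHER acceptance matches it.
with accepted(Extra) | accepted.tolerance(1):
    validate(actual, expected)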

I'll follow up when I start to address this.

TheOtherDude commented 5 years ago

That makes sense. Thanks!

shawnbrown commented 5 years ago

This has been resolved in the development version for a few weeks, but I just published a new stable release (0.9.6), so the fix is available from PyPI now, too.

Thanks @TheOtherDude for filing the issue.