Open teese opened 1 year ago
You are correct about the accepted.tolerance() behavior: it's only applied to direct child values, not to values within nested dictionaries. A workaround would be to convert nested dictionaries into a flattened dictionary with composite keys.
Here's a function that converts nested dictionaries into a flat dictionary with composite tuple keys:
def flatten(d, parent_key=()):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        new_key = parent_key + (k,) if parent_key else k
        if isinstance(v, dict):
            # Pass the path down as a tuple so that multi-character
            # string keys are not split into individual characters.
            path = new_key if parent_key else (k,)
            items.extend(flatten(v, path).items())
        else:
            items.append((new_key, v))
    return dict(items)
Using the function above, you could flatten the dictionaries like so:
>>> dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
>>> dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}
>>> flatten(dict1)
{('x', 'a'): 0.99, ('x', 'b'): 2.0, 'y': 3.0}
>>> flatten(dict2)
{('x', 'a'): 1.0, ('x', 'b'): 1.99, 'y': 2.99}
This would let you change your sample code to the following:
import pytest
from datatest import validate, accepted, ValidationError

def flatten(d, parent_key=()):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        new_key = parent_key + (k,) if parent_key else k
        if isinstance(v, dict):
            # Pass the path down as a tuple so that multi-character
            # string keys are not split into individual characters.
            path = new_key if parent_key else (k,)
            items.extend(flatten(v, path).items())
        else:
            items.append((new_key, v))
    return dict(items)

def test_datatest():
    dict1 = {"x": {"a": 0.991, "b": 2.0}, "y": 3.0}
    dict2 = {"x": {"a": 1.0, "b": 1.991}, "y": 2.991}
    with accepted.tolerance(0.01):
        validate(flatten(dict1), flatten(dict2))  # <- Flattened for validation.

    # NOTE: I changed the `.99`s in this sample code because
    #       the floating point math was giving me a difference of
    #       `0.010000000000000009` (outside the accepted tolerance).
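The rounding described in the NOTE above can be reproduced in plain Python, without datatest. This is only a sketch of the arithmetic; the variable names are illustrative and not part of any API:

```python
# Plain-Python check of the floating point issue; datatest is not involved.
expected, actual = 1.0, 0.99

diff = expected - actual
print(repr(diff))  # 0.010000000000000009

# The difference is slightly larger than 0.01, so a tolerance of 0.01
# would not cover it.
exceeds_tolerance = abs(diff) > 0.01
print(exceeds_tolerance)

# With 0.991 the difference is roughly 0.009, which falls inside 0.01.
adjusted_within = abs(1.0 - 0.991) <= 0.01
print(adjusted_within)
```

An alternative to changing the sample values would be to pick a tolerance with a little headroom above 0.01.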
I like the idea of validating nested dictionary values directly, but the implementation gets more complex than it might initially seem. Since ValidationError differences reflect the structure of the tested data, nested dictionaries would mean nested difference handling. At this time, the internal acceptance machinery is not set up to handle this sort of thing, and in combination with accepted.count() it would have resulted in non-deterministic behavior when running on older versions of Python. This is because it was written to support versions of Python that didn't guarantee dictionaries with stable order.
That said, future versions of datatest will drop support for those old versions of Python and direct validation of nested values should be possible. But that's not something I can add in the short term. For now, the dictionaries will need to be flattened for validation.
This is a good question though and I should definitely add a page to the How-to Guide that addresses this use case.
A different flatten() function could combine the keys into a single string value. Doing this is less precise than the tuple-keys version shown previously, but many use cases don't need to preserve the keys exactly and the result can be more readable:
def flatten(d, parent_key="", sep="."):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
This function would give more compact keys:
>>> dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
>>> dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}
>>> flatten(dict1)
{'x.a': 0.99, 'x.b': 2.0, 'y': 3.0}
>>> flatten(dict2)
{'x.a': 1.0, 'x.b': 1.99, 'y': 2.99}
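One way to see why the string-key version is less precise: if a top-level key already contains the separator, it can collide with a joined nested key and one value is silently lost. The sketch below repeats the string-key flatten() from above; the input dictionary is a made-up example chosen to trigger the collision:

```python
def flatten(d, parent_key="", sep="."):
    """Helper function to flatten nested dictionaries (string keys)."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

# Hypothetical data: the nested key x -> a and the top-level key "x.a"
# both flatten to the same string.
nested = {"x": {"a": 1}, "x.a": 2}

flat = flatten(nested)
print(flat)  # {'x.a': 2}  (the nested value 1 was overwritten)
```

If keys like this can occur in your data, the tuple-keys version or a separator that never appears in the keys would avoid the collision.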
Thanks @shawnbrown for the excellent, fast response. The workaround with flatten() that combined the keys into a single string was perfect for my use-case. It's no problem if you want to close this issue, preferably after updating the documentation :).
I'm having trouble adjusting the tolerance when comparing nested dictionaries. For example, the following pytest is green:
The ValidationError caught when evaluating validate(dict1, dict2) suggests that the accepted.tolerance is only applied when evaluating the values in a dictionary, but not the values of nested dictionaries. The exception caught by pytest is <ExceptionInfo ValidationError({'x': Invalid({'a': 0.99, 'b': 2.0}, expected={'a': 1.0, 'b': 1.99})}, 'does not satisfy mapping requirements') tblen=2>. Is there a workaround, to allow the validation of nested dictionary values with a tolerance?