accepted.tolerance() not applied when comparing values in nested dictionary

teese commented 1 year ago

I'm having trouble adjusting the tolerance when comparing nested dictionaries. For example, the following pytest test is green:

import pytest
from datatest import validate, accepted, ValidationError

def test_datatest():
    dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
    dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}

    # validation of full dictionary is unsuccessful, despite a difference of 0.01 in all cases (test red)
    # tolerance is not applied when evaluating values in the nested dictionary with key 'x'
    with pytest.raises(ValidationError) as e:
        with accepted.tolerance(0.01):
            validate(dict1, dict2)

    exception_string = str(e)
    print(exception_string)
    assert "'x'" in exception_string
    assert "'y'" not in exception_string

    # validation of the inner dictionary is successful (test green)
    with accepted.tolerance(0.1):
        validate(dict1["x"], dict2["x"])

The ValidationError caught when evaluating validate(dict1, dict2) suggests that accepted.tolerance() is only applied to the top-level values of a dictionary, not to the values of nested dictionaries. The exception caught by pytest is <ExceptionInfo ValidationError({'x': Invalid({'a': 0.99, 'b': 2.0}, expected={'a': 1.0, 'b': 1.99})}, 'does not satisfy mapping requirements') tblen=2>. Is there a workaround that allows nested dictionary values to be validated with a tolerance?

shawnbrown commented 1 year ago

You are correct about the accepted.tolerance() behavior--it's only applied to direct child values, not to values within nested dictionaries. A workaround would be to convert nested dictionaries into a flattened dictionary with composite keys.

Here's a function that converts nested dictionaries into a flat dictionary with composite tuple keys:

def flatten(d, parent_key=()):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        # Keep the path as a tuple; tuple("xy") would split a string key into characters.
        path = parent_key + (k,)
        if isinstance(v, dict):
            items.extend(flatten(v, path).items())
        else:
            # Top-level keys stay bare; nested keys become composite tuples.
            items.append((path if len(path) > 1 else k, v))
    return dict(items)

Using the function above, you could flatten the dictionaries like so:

>>> dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
>>> dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}
>>> flatten(dict1)
{('x', 'a'): 0.99, ('x', 'b'): 2.0, 'y': 3.0}
>>> flatten(dict2)
{('x', 'a'): 1.0, ('x', 'b'): 1.99, 'y': 2.99}

This would let you change your sample code to the following:

import pytest
from datatest import validate, accepted, ValidationError

def flatten(d, parent_key=()):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        # Keep the path as a tuple; tuple("xy") would split a string key into characters.
        path = parent_key + (k,)
        if isinstance(v, dict):
            items.extend(flatten(v, path).items())
        else:
            # Top-level keys stay bare; nested keys become composite tuples.
            items.append((path if len(path) > 1 else k, v))
    return dict(items)

def test_datatest():
    dict1 = {"x": {"a": 0.991, "b": 2.0}, "y": 3.0}
    dict2 = {"x": {"a": 1.0, "b": 1.991}, "y": 2.991}

    with accepted.tolerance(0.01):
        validate(flatten(dict1), flatten(dict2))  # <- Flattened for validation.

# NOTE: I changed the `.99`s in this sample code because
# the floating point math was giving me a difference of
# `0.010000000000000009` (outside the accepted tolerance).
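
To see where that `0.010000000000000009` comes from, here is a small sketch in plain Python with no datatest involved. The helper name `exceeds_tolerance` is made up for illustration, and it assumes the check boils down to comparing the absolute difference against the limit, which may not match datatest's internals exactly:

```python
def exceeds_tolerance(actual, expected, tolerance):
    """Hypothetical helper: True if |actual - expected| is larger than tolerance."""
    return abs(actual - expected) > tolerance


# 0.99 cannot be represented exactly in binary floating point, so its
# difference from 1.0 comes out slightly larger than the float 0.01.
print(1.0 - 0.99)

# Original sample value: the difference lands just outside the tolerance.
print(exceeds_tolerance(0.99, 1.0, 0.01))

# Adjusted sample value: the difference is about 0.009, safely inside it.
print(exceeds_tolerance(0.991, 1.0, 0.01))
```

The first check reports that the difference exceeds the tolerance and the second reports that it does not, which is why the `.99` values were nudged to `.991` in the sample above.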

I like the idea of validating nested dictionary values directly, but the implementation gets more complex than it might initially seem. Since ValidationError differences reflect the structure of the tested data, nested dictionaries would mean nested difference handling. At this time, the internal acceptance machinery is not set up to handle this sort of thing, and in combination with accepted.count() it would have resulted in non-deterministic behavior when running on older versions of Python. This is because it was written to support versions of Python that didn't guarantee a stable dictionary order.

That said, future versions of datatest will drop support for those old versions of Python and direct validation of nested values should be possible. But that's not something I can add in the short term. For now, the dictionaries will need to be flattened for validation.

This is a good question, though, and I should definitely add a page to the How-to Guide that addresses this use case.

shawnbrown commented 1 year ago

A different flatten() function could combine the keys into a single string value. Doing this is less precise than the tuple-keys version shown previously, but many use cases don't need to preserve the keys exactly, and the result can be more readable:

def flatten(d, parent_key="", sep="."):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

This function would give more compact keys:

>>> dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
>>> dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}
>>> flatten(dict1)
{'x.a': 0.99, 'x.b': 2.0, 'y': 3.0}
>>> flatten(dict2)
{'x.a': 1.0, 'x.b': 1.99, 'y': 2.99}
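
One way the string-keys version is less precise: two different key paths can join to the same string. The sketch below reuses the string-keys `flatten()` from this comment unchanged. The `ambiguous` dictionary is made-up data for illustration, not something from the original report:

```python
def flatten(d, parent_key="", sep="."):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


# Made-up example: the nested path x -> a and the literal key "x.a"
# both flatten to the string "x.a", so one value overwrites the other.
ambiguous = {"x": {"a": 1.0}, "x.a": 2.0}
print(flatten(ambiguous))

# Choosing a separator that never occurs in the keys avoids the clash.
print(flatten(ambiguous, sep="/"))
```

If your keys might contain the separator, either pick a different `sep` or use the tuple-keys version, which keeps every path distinct.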

teese commented 1 year ago

Thanks @shawnbrown for the excellent, fast response. The workaround with flatten() that combined the keys into a single string was perfect for my use case. It's no problem if you want to close this issue, preferably after updating the documentation :).

shawnbrown / datatest

accepted.tolerance() not applied when comparing values in nested dictionary #62