✨ Add support for sorting data in `insert_assert` based on previous data (e.g. from a previous run) to minimize the diff

Motivation

One of my main use cases for insert_assert, and where it shines the most, is updating the assert for a big OpenAPI output from FastAPI (in the FastAPI tests and the SQLModel tests).

However, the previous tests were written against Pydantic 1.x, and the output generated by Pydantic v2 has some slight changes.

In particular, Pydantic v2 outputs some JSON Schema keys in a different order than v1. That is fine in itself: dict equality ignores key order, so the tests would still pass. But the resulting diff between the previous data and the newly inserted data is quite big just because of these ordering differences (e.g. title now comes before the rest), and that makes it harder to see the actual changes (e.g. values with str | None now have a schema of "any between string and null").
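As a small illustration of why reordered keys pass the tests but still inflate the diff, here is a sketch with two made-up schema fragments (the dicts are illustrative only, not real Pydantic output):

```python
# Two hypothetical schema fragments with the same content but different key order.
old = {"type": "string", "title": "Name"}
new = {"title": "Name", "type": "string"}

# Equality ignores key order, so an existing `assert data == {...}` keeps passing.
assert old == new

# Insertion order is preserved when the dict is rendered as source text,
# so the regenerated assert differs from the old one line by line.
assert list(old) != list(new)
assert repr(old) != repr(new)
```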

For the FastAPI tests, during the migration to Pydantic v2, I manually updated all those differences one by one to check the actual content change.

Now I'm updating SQLModel, and having a local version of this helps a lot: the git diff shows only what actually changed, and I can verify and update anything necessary much more quickly.

Problem Example

Imagine you have a test that looks like:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    }

But now get_data() was updated and returns:

{
    "bar": [
        {
            "description": "Data validation library",
            "tags": ["validation", "json"],
            "name": "Pydantic",
        },
        {"description": "Web API framework in Python", "name": "FastAPI"},
        {"description": "DBs and Python", "name": "SQLModel"},
        {"name": "ARQ"},
    ],
    "baz": 6,
    "foo": 1,
}

If you just run insert_assert as before:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data)

You would normally get this:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "bar": [
            {
                "description": "Data validation library",
                "tags": ["validation", "json"],
                "name": "Pydantic",
            },
            {"description": "Web API framework in Python", "name": "FastAPI"},
            {"description": "DBs and Python", "name": "SQLModel"},
            {"name": "ARQ"},
        ],
        "baz": 6,
        "foo": 1,
    }

This produces a large diff, even though the actual differences are small:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
-        "foo": 1,
        "bar": [
-            {"name": "Pydantic", "tags": ["validation", "json"]},
+            {"description": "Data validation library", "tags": ["validation", "json"], "name": "Pydantic"},
-            {"name": "FastAPI", "description": "Web API framework in Python"},
+            {"description": "Web API framework in Python", "name": "FastAPI"},
-            {"name": "SQLModel"},
+            {"description": "DBs and Python", "name": "SQLModel"},
+            {"name": "ARQ"},
        ],
-        "baz": 3,
+        "baz": 6,
+        "foo": 1,
    }

Solution

Now let's start with the same original example:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    }

When you update the test to run insert_assert again, you can pass the old data as the second argument:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data, {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    })

Now when you run it, the inserted assert contains the same new data, but the keys in the new dicts are sorted based on their order in the old data, which minimizes the git diff:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data, {
        "foo": 1,
        "bar": [
-            {"name": "Pydantic", "tags": ["validation", "json"]},
+            {"name": "Pydantic", "tags": ["validation", "json"], "description": "Data validation library"},
            {"name": "FastAPI", "description": "Web API framework in Python"},
-            {"name": "SQLModel"},
+            {"name": "SQLModel", "description": "DBs and Python"},
+            {"name": "ARQ"},
        ],
-        "baz": 3,
+        "baz": 6,
    })

Notice, for example, how "foo" was kept at the top of the dict, so there's no diff for "foo" now (which didn't change).

The dict for FastAPI doesn't produce any diff either, because only its key order had changed.
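The reordering idea can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not necessarily the implementation in this PR: `sort_like_previous` is a hypothetical helper name, list items are paired purely by position, and only dicts and lists are handled.

```python
from typing import Any


def sort_like_previous(new: Any, old: Any) -> Any:
    """Return a copy of `new` with dict keys ordered as in `old` where possible.

    Hypothetical helper for illustration only; values are never changed,
    only the insertion order of dict keys.
    """
    if isinstance(new, dict) and isinstance(old, dict):
        # Keys shared with the old dict come first, in the old order...
        ordered = {k: sort_like_previous(new[k], old[k]) for k in old if k in new}
        # ...followed by keys that only exist in the new data, in arrival order.
        ordered.update({k: v for k, v in new.items() if k not in old})
        return ordered
    if isinstance(new, list) and isinstance(old, list):
        # Pair items by position; extra new items are kept unchanged.
        paired = [sort_like_previous(n, o) for n, o in zip(new, old)]
        return paired + new[len(old):]
    return new


old_data = {
    "foo": 1,
    "bar": [{"name": "Pydantic", "tags": ["validation", "json"]}, {"name": "SQLModel"}],
    "baz": 3,
}
new_data = {
    "bar": [
        {"description": "Data validation library", "tags": ["validation", "json"], "name": "Pydantic"},
        {"description": "DBs and Python", "name": "SQLModel"},
        {"name": "ARQ"},
    ],
    "baz": 6,
    "foo": 1,
}

result = sort_like_previous(new_data, old_data)
assert result == new_data  # same content, only the key order differs
assert list(result) == ["foo", "bar", "baz"]
assert list(result["bar"][0]) == ["name", "tags", "description"]
```

Pairing list items by position keeps the sketch simple, but it means an item inserted at the start of a list would shift every later pairing; a real implementation may want a smarter matching strategy.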

Codecov Report

Merging #148 (d32dd60) into main (ec406ff) will decrease coverage by 0.02%. The diff coverage is 96.00%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #148      +/-   ##
==========================================
- Coverage   96.29%   96.27%   -0.02%
==========================================
  Files           8        8
  Lines         729      752      +23
  Branches      111      120       +9
==========================================
+ Hits          702      724      +22
  Misses         21       21
- Partials        6        7       +1
```

| [Files](https://app.codecov.io/gh/samuelcolvin/python-devtools/pull/148?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Samuel+Colvin) | Coverage Δ | |
|---|---|---|
| [devtools/pytest\_plugin.py](https://app.codecov.io/gh/samuelcolvin/python-devtools/pull/148?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Samuel+Colvin#diff-ZGV2dG9vbHMvcHl0ZXN0X3BsdWdpbi5weQ==) | `89.00% <96.00%> (+0.86%)` | :arrow_up: |