michaelkryukov / mongomock_motor

Library for mocking AsyncIOMotorClient built on top of mongomock.
MIT License

The aggregate functionality is not working as expected. #30

Open SamuelSmets opened 1 year ago

SamuelSmets commented 1 year ago

Hi,

So far, AsyncMongoMockClient has worked great, but I just ran into a case where it does not behave as expected. It concerns the aggregate method that you can call on a collection, and more specifically the $group functionality.

Method to be tested:

async def calculate_method(id_dataset: int):
    collection = db["test"]
    cursor = collection.aggregate(
        [
            {"$match": {"id_dataset": id_dataset}},
            {
                "$group": {
                    "_id": ["$col_a", "$col_b"],
                    "result": {
                        "$sum": {
                            "$multiply": [
                                "$col_c",
                                "$col_d",
                                0.01,
                            ]
                        }
                    },
                }
            },
        ]
    )

    return await cursor.to_list(None)

Test case:

@pytest.fixture
async def mock_db(mocker):
    client_mock = AsyncMongoMockClient()
    mock_db = client_mock.db
    documents = [
        {
            "id_dataset": 1,
            "col_a": "a",
            "col_b": "s",
            "col_c": 12.5,
            "col_d": 100,
        },
        {
            "id_dataset": 1,
            "col_a": "a",
            "col_b": "s",
            "col_c": 43.2,
            "col_d": 100,
        },
        {
            "id_dataset": 1,
            "col_a": "b",
            "col_b": "t",
            "col_c": 50.0,
            "col_d": 120,
        },
        {
            "id_dataset": 1,
            "col_a": "b",
            "col_b": "t",
            "col_c": 100.0,
            "col_d": 120,
        },
        {
            "id_dataset": 2,
            "col_a": "a",
            "col_b": "z",
            "col_c": 100.0,
            "col_d": 100,
        },
    ]
    await mock_db["test"].insert_many(documents)

    # The real patch target is replaced with a placeholder here, since it would not make sense out of context; it points at the db instance used by the method under test.
    mocker.patch("file_path_to_the_db_instance", mock_db)

    return mock_db

@pytest.mark.asyncio
async def test_calculate_method(mock_db):
    output = await calculate_method(id_dataset=1)
    expected = [
        {
            "_id": ["a", "s"],
            "result": 55.7,
        },
        {
            "_id": ["b", "t"],
            "result": 180.0,
        },
    ]

    assert output == expected

The expected output of the method under testing is:

[
    {
        "_id": ["a", "s"],
        "result": 55.7,
    },
    {
        "_id": ["b", "t"],
        "result": 180.0,
    },
]

The test fails, and the actual output is:

[
    {
        'result': 235.7,
        '_id': ['$col_a', '$col_b']
    }
]

However, when I run the same code against a real MongoDB instance, the output matches the expected variable. So the match on "id_dataset" works, as do the summation and multiplication, but the grouping on "col_a" and "col_b" does not: the list is treated as a literal, and every document lands in a single group.
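As a cross-check on the numbers above, the grouping I expect can be reproduced in plain Python, without MongoDB or mongomock. This is only a sketch of the intended semantics, not of how either library implements $group; the helper name group_and_sum is hypothetical, and the documents are the same ones inserted by the fixture.

```python
from collections import defaultdict

# Same documents as in the fixture above.
documents = [
    {"id_dataset": 1, "col_a": "a", "col_b": "s", "col_c": 12.5, "col_d": 100},
    {"id_dataset": 1, "col_a": "a", "col_b": "s", "col_c": 43.2, "col_d": 100},
    {"id_dataset": 1, "col_a": "b", "col_b": "t", "col_c": 50.0, "col_d": 120},
    {"id_dataset": 1, "col_a": "b", "col_b": "t", "col_c": 100.0, "col_d": 120},
    {"id_dataset": 2, "col_a": "a", "col_b": "z", "col_c": 100.0, "col_d": 100},
]


def group_and_sum(docs, id_dataset):
    """Hypothetical helper: filter by id_dataset, group by (col_a, col_b),
    and sum col_c * col_d * 0.01 within each group."""
    totals = defaultdict(float)
    for doc in docs:
        if doc["id_dataset"] != id_dataset:
            continue  # plays the role of the $match stage
        key = (doc["col_a"], doc["col_b"])  # the intended $group key
        totals[key] += doc["col_c"] * doc["col_d"] * 0.01
    return dict(totals)


totals = group_and_sum(documents, 1)
for key, value in totals.items():
    print(key, round(value, 6))
```

The two groups come out at roughly 55.7 and 180.0, and their sum is roughly 235.7, which is the single value the mock returns when it collapses everything into one group. The values are floats, so a tolerance-based comparison is safer than exact equality.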

michaelkryukov commented 1 year ago

Hi, as far as I can tell, the value of "_id" in a "$group" stage can be any valid expression, and any literal will lead to a single output document. I didn't find any mention of "expression arrays" in the docs, only "expression objects". I suspect you have a working but undocumented example and should just use a supported expression object (not a list). Can you point to docs where the interaction with arrays is described?

Also, I think this issue should be raised with mongomock, as aggregations and stages are implemented there.
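For reference, the expression-object form suggested above could look roughly like the sketch below. The function name build_pipeline is hypothetical, and the key names inside "_id" are an arbitrary choice; whether mongomock then groups as expected has not been verified here.

```python
def build_pipeline(id_dataset: int) -> list:
    """Hypothetical helper: the pipeline from the issue, with the "_id"
    list replaced by an expression object."""
    return [
        {"$match": {"id_dataset": id_dataset}},
        {
            "$group": {
                # Expression object instead of ["$col_a", "$col_b"].
                "_id": {"col_a": "$col_a", "col_b": "$col_b"},
                "result": {
                    "$sum": {"$multiply": ["$col_c", "$col_d", 0.01]}
                },
            }
        },
    ]


pipeline = build_pipeline(1)
# Intended usage, mirroring the issue:
#   cursor = collection.aggregate(pipeline)
#   output = await cursor.to_list(None)
```

With this form, each output document's "_id" would be a dict such as {"col_a": "a", "col_b": "s"} rather than a list, so the expected values in the test would need to change accordingly.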