vapor / fluent-kit

Swift ORM (queries, models, and relations) for NoSQL and SQL databases
MIT License
215 stars 115 forks source link

Grouped aggregate support #206

Open tanner0101 opened 4 years ago

tanner0101 commented 4 years ago

Support getting aggregates like count, sum, max, etc for grouped field values.

SQL would implement this using the GROUP BY keyword

let results: [(country: String, count: Int)]
results = User.query(on: db).count(by: \.$country).wait()

Resulting SQL

SELECT COUNT(id), country
FROM users
GROUP BY country;

Related to https://github.com/vapor/fluent-kit/issues/31

grosch commented 4 years ago

Remember that groupings are frequently more than one column

SELECT region, country, site, campus, building, COUNT(*)
FROM locations
WHERE active = 1
GROUP BY region, country, site, campus, building
ORDER BY region, country, site, campus, building

and that I can use multiple aggregates at once

SELECT region, country, site, campus, building, COUNT(*), SUM(employees)
FROM locations
WHERE active = 1
GROUP BY region, country, site, campus, building
ORDER BY region, country, site, campus, building
tanner0101 commented 4 years ago

Another idea that would support multiple aggregates / groupings:

let users: [User] = User.query(on: db)
    .count(\.$id).max(\.$age)
    .group(by: \.$country).group(by: \.$city)
    .all().wait()

for user in users {
    let count = user.aggregate(.count, \.$id)
    let maxAge = user.aggregate(.max, \.$age)
    let country = user.country
    let city = user.city
}
jdmcd commented 4 years ago

I like that syntax a lot!

grosch commented 4 years ago

Here's my ideal

struct MyThing {
    let count: Int
    let maxAge: Int
    let country: String
    let city: String
}

let data: [MyThing] = User.customQuery(on: db, into: MyThing.self)
                                               .field(\.$country, into: \.$country)
                                               .field(\.$city, into: \.$city)
                                               .count(\.$id, into: \.$count)
                                               .max(\.$age, into: \.$maxAge)
                                               .group(by: \.$country)
                                               .group(by: \.$city)
                                               .all()
grosch commented 4 years ago

Please keep in mind how this will play into supporting CTEs as well. For example, here's a real query that I use today:

WITH cte AS (
    SELECT o.id order_id, SUM(w.amount * pt.amount * (CASE WHEN pt.abbrev = 'ud' THEN 1.0 ELSE o.station_multiplier END)) complete
    FROM work_completed w
    INNER JOIN order_points op ON w.order_point_id = op.id
    INNER JOIN points pt ON op.point_id = pt.id
    INNER JOIN orders o ON op.order_id = o.id
    GROUP BY o.id
)
SELECT o.uuid,
    c.uuid AS client_id,
    c.name AS client_name,
    s.uuid AS station_id,
    s.name AS station_name,
    o.created,
    o.due,
    o.month,
    o.comments,
    o.name AS order_name,
    o.station_multiplier,
    o.total_points,
    NULLIF(cte.complete, 0) AS points_produced,
    NULLIF(cte.complete, 0) >= o.total_points AS completed
FROM orders o
LEFT OUTER JOIN cte ON cte.order_id = o.id
JOIN stations s ON o.station_id = s.id
JOIN clients c ON s.client_id = c.id

So whatever syntax you use has to take into account how you'll compose the cte query into the outer select that uses the CTE.

mattpolzin commented 4 years ago

there's a duality of operations on fields when grouping that I think is at the crux of making it hard to represent type-safely. Setting aside syntax for a moment, just consider that when grouping: Any field being grouped by can be operated on with non-aggregate functions (including just selecting the field value) but not aggregate functions (like count) and conversely any field not being grouped on can be operated on with any aggregate functions (including count) but cannot be selected directly.

A type safe solution would ideally limit the things that can be counted, selected (field), etc. based on that duality.

I don't have any proposed solutions yet, this is just where my brain is wanting to go.

fatto commented 2 years ago

Already implemented aggregate docs so it should be closed

0xTim commented 2 years ago

@fatto Fluent doesn't have 'real' aggregate support as described in the issue as it can't do things like GROUP BY