oguimbal / pg-mem

An in memory postgres DB instance for your unit tests
MIT License
2.02k stars 99 forks source link

Error when using window function #198

Open matt-sanders opened 2 years ago

matt-sanders commented 2 years ago

Describe the bug

I'm trying to write a query using a window function, but receive an error Aggregate functions are not allowed in WHERE.

aggregate functions are not allowed in WHERE

🐜 This seems to be an execution error, which means that your request syntax seems okay,
but the resulting statement cannot be executed → Probably not a pg-mem error.

*️⃣ Reconsituted failed SQL statement: SELECT (sum ((count (*) )) over ( ORDER BY user_created_at ) ) AS count , user_created_at AS year  FROM users   WHERE (user_created_at >= ('2020-01-01 00:00:00.000+00')) GROUP BY user_created_at  ORDER BY year

This seems to work fine outside of pg-mem.

For context, I'm want to receive a cumulative total of users created, grouped by their creation date. I've simplified the query greatly for the sake of this bug, but ultimately want to end up with something like:

year total
2020 10
2021 20
2022 30

Where there were 10 users added each year. In my actual query I'm using date_part to get the year, but this isn't supported out of the box with pg-mem so I've left it off the example. I was able to add that function in using registerFunction but wanted to have something that you could reproduce in the playground, and for the sake of this bug, date_part isn't relevant.

To Reproduce


create table users ( "user_id" text, "user_created_at" timestamptz default current_timestamp );
insert into users (user_id) values ('first'), ('second'), ('third');

SELECT
    SUM(COUNT(*)) OVER(ORDER BY user_created_at) as count, 
    user_created_at
              FROM users
              WHERE user_created_at >= '2020-01-01 00:00:00.000+00'
              GROUP BY user_created_at
              ORDER BY user_created_at

pg-mem version

2.3.3

oguimbal commented 2 years ago

Hi !

Thanks for the report (and sorry for the late reply).

"OVER" keyword is not supported yet in pg-mem. It will come, but I cant give an ETA yet :/

That said, correct me if I'm wrong, but it seems that your statement above doesnt need an "OVER" clause, it seems equivalent to:

SELECT
    COUNT(*) as count,
    user_created_at
              FROM test
              WHERE user_created_at >= '2020-01-01 00:00:00.000+00'
              GROUP BY user_created_at
              ORDER BY user_created_at

Or do you have simplified a more complex request ?

Anyway, I've improved the corresponding error message in the release to come so others wont be confused :)

matt-sanders commented 2 years ago

No worries, thanks for the update all the same. For the time being I've just stubbed out the query in my tests.

I have simplified a more complex query but even with the simplified example the difference is subtle but distinct. Using OVER in the way above will allow me to get the cumulative total. E.g. given the following table ( simplified for brevity ): user_id user_created_at
id-1 2020-01-01
id-2 2020-01-01
id-3 2020-01-02
id-4 2020-02-01
Without OVER, I would get the following: count user_created_at
2020-01-01 2
2020-01-02 1
2020-02-01 1

Which is the count of users created grouped by each day. E.g. 2 signed up on the 1st of Jan, 1 on the 2nd of Jan, etc

However, using count I'll get the cumulative running total: count user_created_at
2020-01-01 2
2020-01-02 3
2020-02-01 4

Which is helpful to see how the total user base has grown over time. E.g. on the 1st of Jan we had 2 total users, on the 2nd of Jan we had 3 total users, etc.

Of course I could easily do the former and sum it up myself with a function afterwards, but it's nice and easy to just query it straight from the DB.

oguimbal commented 2 years ago

thanks for the details

skeet70 commented 2 years ago

I was looking at using pg-mem for tests and while looking into another issue found this one. Unfortunately OVER is the way we handle pagination for most of our API queries, so missing this means we can't use this great project :(. Just commenting for more context that this is a pretty common pagination + total pages technique, I'll keep an eye on this issue for whenever OVER does make its way in!

luc-boussantr-lp commented 1 year ago

@oguimbal Hello, its been a while that this issue was not seen. Do you have more of an ETA regarding this feature or is it still not yet priorized ? :)

thatguyintech commented 5 months ago

+1

Also hoping the OVER can be supported at some point!

We have a query that looks like this to support leaderboard ranking calculation use cases:

SELECT ranking FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY points DESC, "createdAt" ASC) as ranking 
  FROM campaign_spot
  WHERE "campaignId" = $1
) as ranked_spots
where id = $2;