Closed hesselink closed 8 years ago
What does it even mean to have each aggregator work with a different ordering? Does it scan the table multiple times?
I think so, it has to right? I haven't looked at any plans though, just read the documentation.
It's times like these I just want to give up and write my own relational database engine.
What's the problem with orderAggregate :: Order c -> Aggregator c d -> Aggregator c d
? I don't see why it would apply to all the columns in the aggregator.
Say the aggregator is p2 (arrayAgg, arrayAgg) :: Aggregator (Column PGInt4, Column PGInt4) (Column (PGArray PGInt4), (PGArray PGInt4))
, for example. If I now apply asc fst
to this, which of the aggregations would it order? The only sensible thing to do is both, right?
Oh, I completely misunderstood. In my head you were proposing something like
orderArrayAgg :: Order a -> (a -> Column b) -> Aggregator a (Column b)
Would that work?
[EDIT: garbled the first one]
What kind of aggregation would that create?
orderArrayAgg o f
would aggregate the Column
picked out by f
into an array whose order is given by o
.
Hmm, ok, so that would mean that something like orderArrayAgg (asc fst) fst :: Aggregator (Column PGInt4, Column PGInt4) (Column (PGArray PGInt4), (PGArray PGInt4))
would do an array aggregate ordered by column 1 on column 1. But what would it do on column two? It says it's an Aggregator
for both columns, so does the second one also do an array aggregate? What if I want a different aggregate there? Or a different order?
I don't understand your type. The type should be
orderArrayAgg (asc fst) fst :: Aggregator (Column PGInt4, Column PGInt4) (Column (PGArray PGInt4))
I got my original type wrong. Perhaps that confused you. It should be
orderArrayAgg :: Order a -> (a -> Column b) -> Aggregator a (Column (PGArray b))
(And to answer your question: It doesn't do anything with the second column. The second column is ignored.)
Ah, I see, then I misunderstood. So it does a combination of aggregation and projection. That's very different to what Opaleye does now, if I understand correctly. Now, if you have a tuple of columns, when you aggregate you'll always get a tuple of aggregated columns back. With this function, you'll aggregate and you'll get a single column. How would you combine these then to get aggregates of both columns? I see that Aggregate
has an Applicative
instance, will that do the right thing?
That's very different to what Opaleye does now
No, not at all. It's exactly the same. lmap
does projection before the aggregation takes place.
I see that
Aggregate
has anApplicative
instance, will that do the right thing?
Indeed.
Ah yes, I don't think of Profunctor
as something I can actually use :) I'll try to implement the function with your signature and see where I end up. It does mean duplicating all aggregators where ordering makes a difference, although currently there are only two, so that's not too bad.
With an aggregator of this form it may be difficult to use p2
, p3
, pWhatever
, etc.. It may turn out easier to use the pattern presented in https://github.com/tomjaguarpaw/haskell-opaleye/blob/master/Doc/Tutorial/TutorialBasicMonomorphic.lhs in the definition of aggregateWidgets
.
I don't think of
Profunctor
as something I can actually use :)
:)
It does mean duplicating all aggregators where ordering makes a difference
Yeah, that's a bit annoying.
Ah yes, that example is great, and looks exactly like what I imagined when I thought of using the Applicative
instance.
I realized that using lmap
and Applicative
, I can actually apply different orderings to different parts of the aggregation using my initial code! You do the aggregation on a single column, lmap
to the larger type including the column you want to order on, then order the aggregation. Finally you combine many of these together using the Applicative
combinators.
So I think I'll keep my signature since it's more compositional and doesn't require duplicating the aggregation function. If you agree, I can send a PR against opaleye master.
What do you mean "then order the aggregation"? I thought this was exacly what we determined didn't work:
it is "usually" enough to just aggregate an ordered subquery to get the result you want. However, I just ran into a case where it isn't
I'm using my original orderAggregate :: Order a -> Aggregate a b -> Aggregate a b
, but with the code mentioned above, I'm able to apply a different ordering to each aggregate (which I thought I couldn't before). So you get something like:
x :: Aggregator (Column a, Column b) (Column (PGArray a), Column (PGArray a))
x = (,) <$> orderAggregate (asc snd) (lmap fst arrayAggGrouped)
<*> orderAggregate (desc snd) (lmap fst arrayAggGrouped)
This would generate code like:
SELECT array_agg(a ORDER BY b ASC), array_agg(a ORDER BY b DESC)
FROM (SELECT a, b FROM ...)
I see. Can we convince ourselves that this always leads to syntactically valid SQL? For example, what does it mean to apply your orderAggregate
to something that contains two separate aggregations?
What do you mean "two separate aggregations"? If you mean this, it works and applies the order to all aggregations in the Aggregator
:
x :: Aggregator (Column a, Column b) (Column (PGArray a), Column (PGArray b))
x = orderAggregate (asc snd) $ p2 (arrayAggGrouped, arrayAggGrouped)
This would generate code like:
SELECT array_agg(a ORDER BY b ASC), array_agg(b ORDER BY b ASC)
FROM (SELECT a, b FROM ...)
As for producing syntactically (and semantically) value SQL: applying it repeatedly will overwrite the order, just like orderBy
will. And all columns should be in scope as enforced by the first type argument of Aggregator
. So there aren't any issues that I can see.
I see. I must say that I am fairly uncomfortable with overwriting order. It means that I can't write an abstract Aggregator
whose implementation is hidden. Instead, users can poke around inside it by applying different orderAggregate
s from the outside.
The same is true of Query
, right? I see that as an advantage: it's composable.
No, aggregation on a Query
doesn't affect anything on the inside.
... I mean ordering on a Query
...
I mean you can apply ordering to a Query
after the fact and change its behaviour.
What do you mean with 'on this inside'? On the inside of what?
Can we take a step back? Can you explain your use case for two different orders in the same aggregation?
I don't have one. I just implemented a way to allow specifying an order for an aggregation. I did this as a combinator, and it turns out that you can use it to specify multiple different orders for the different aggregations in a single query. Since this matches the power that postgres gives you, I thought that was a very nice result: a simple combinator, and all the power of the underlying SQL. I'm now offering to also apply this code to opaleye master, but if you disagree with this approach I'm happy to keep it on our branch only, since it does what I need and the implementation is clean.
Ah yes, of course. Without it we don't have a way of forcing the aggregation order at all.
Keep it on your branch for now since I'm a bit uneasy about how Postgres handles this, but I'll think about it further.
By the way, I would be perfectly happy to merge
orderAggregate :: Order a -> Aggregator a b -> Query a -> Query b
if you wanted a compromise.
That wouldn't allow multiple orders though.
Correct, which is why I'm more comfortable merging it :)
It seems like that would satisfy me whilst meeting all your current use cases. If you really need multiple orders in the future then we can reopen the discussion.
I still don't understand your objection to multiple orders. What's the difference in safety of opaleye between generating array_agg(a ORDER BY b ASC), array_agg(a ORDER BY b ASC)
and array_agg(a ORDER BY b ASC), array_agg(a ORDER BY b DESC)
?
I'm not objecting to safety at this point (I wouldn't be surprised if it were safe, or if it weren't safe). However, the semantics are a bit odd, and I'd rather avoid thinking about them.
It all feels a bit hand-wavy to me. I'll hold off on sending the PR then, since changing the code for no reason apparent to me feels like wasted effort, since I believe the current version is strictly better (and I trust postgres that if they document this, it will work sanely).
I am trying to be incredibly cautious with the denotational semantics of Opaleye. Being able to "order" an Aggregator
complicates its meaning considerably.
All right, it seems our goals are a bit different then: I'm mainly using it as a convenient, type checked SQL DSL. Since the (postgres) SQL allows ordering aggregates, it seems perfectly obvious to me to support this in opaleye as well. Be sure to let me know when/if you've worked out the semantics, and I can send the PR. Or you can just pick off the stuff from our branch.
I'm mainly using it as a convenient, type checked SQL DSL
Well yes, but it's a question of what you expect your types to mean :)
Anyway, here's a second possible compromise: We put your function Order a -> Aggregate a b -> Aggregate a b
in Opaleye.Internal.Aggregate
and we put my, less general, function Order a -> Aggregator a b -> Query a -> Query b
in Opaleye.Aggregate
. Then at least the functionality you have written is in the repo. Your branch can just reexport your version from Opaleye.Aggregate
. I promise not to touch your version without telling you :)
How does that sound? (Of course I could just do this unilaterally, but if you think it's a good idea I'm likely to do it sooner.)
That sounds very reasonable. Note that I don't need to do anything on my branch, since it's based on 0.3, not current master.
OK, sounds like a good plan then. What should I do? Copy your code across from your branch by hand?
Sure, or you can cherry-pick the commit. Be sure to take the latest one since I force updated it. There's a decent chance it will apply cleanly.
Or I can do it in a couple of hours when I get home.
I don't think I'll do it in the next couple of hours, but whichever of us gets around to it first can do it.
OK, I'm working on this now.
I'm trying to tackle ordering within aggregations. This is relevant for some aggregations like
arrayAgg
andstringAgg
. The postgres documentation states that it is "usually" enough to just aggregate an ordered subquery to get the result you want. However, I just ran into a case where it isn't, and it depends on the data, not the query. Additionally, there's no clear reason visible to me: adding one record messes up the ordering of another. So I've been looking into what postgres recommends as the fail-safe solution: ordering within aggregations.The SQL syntax for this is something like
array_agg(col_name ORDER BY other_col DESC, third_col ASC)
. I added this to thePrimQuery
andSql
types without issue. However, I run into problems near the surface syntax.Initially I considered adding something like
orderAggregate :: Order ? -> Aggregator (Column a) (Column b) -> Aggregator (Column a) (Column b)
. However, the type variable at the question mark is the problem. Just allowing it to beColumn a
obviously isn't enough, since you usually want to order by a different column.I've now got something with the signature
orderAggregate :: Order c -> Aggregator c d -> Aggregator c d
. This means that you can use all columns, however, the order then applies to all aggregation operations in theAggregator
. I've got this working, and indeed that's what it generates.Do you have any ideas how I could have this work per aggregation, but still have access to all the columns to use for ordering?