karamaan / karamaan-opaleye

Other
11 stars 7 forks source link

Implement orderBy. #60

Closed hesselink closed 10 years ago

hesselink commented 10 years ago

I've implemented orderBy, roughly according to TODO.md. I had to use an actual Scope, not an empty one, since the columns need to be looked up there. This also required a Default (PPOfContraVariant Unpackspec) instance. Would you also want the non-default versions?

I've also changed the API a little bit. The Query argument seemed redundant, it just required duplicating the Category composition. Also, makeOrderSpec seemed a bit verbose, I went with asc and desc instead. Feel free to suggest other names, or point out any changes you'd like regarding coding style and such, since I'm not sure exactly what you use.

I haven't thoroughly tested this, but examples like this work:

orderedPersons = orderBy [desc (2 * arr age), asc (arr address)] . personTable

This produces the following SQL:

SELECT name as name_1,
       age as age_1,
       address as address_1
FROM persons as T1
ORDER BY 2 * (age) DESC,
         address ASC
tomjaguarpaw commented 10 years ago

Excellent! Is there something wrong with the SQL example? It doesn't have any ORDER BY clause.

I suspect, unfortunately, that [OrderSpec a] -> QueryArr a a is not correct, and it does have to be [OrderSpec a] -> Query a -> Query a. QueryArr a b roughly corresponds to a -> [b] and the sorting operation is not a -> [a] but rather [a] -> [a], a.k.a. (() -> [a]) -> (() -> [a]). I think that using the type [OrderSpec a] -> QueryArr a a will lead to arrow law violations.

hesselink commented 10 years ago

Yeah, I think I made a mistake copy-pasting. I ran it again, and updated the description above.

Regarding the sorting, I initially thought the same. In fact, I implemented it like that. But then I figured that since you can write orderBy spec id . personTable, I could probably simplify it by removing the id. Do you have any idea in what direction I should search to try and find a problem with this?

hesselink commented 10 years ago

I'm actually pretty sure my current implementation is equivalent to what I had before. I now have:

orderBy oss = QueryArr $ \(x, q, t) -> (x, mkOrder q x, t)

where I previously had

orderBy oss (QueryArr qa) = QueryArr $ \i -> let (x, q, t) = qa i in (x, mkOrder q x, t)

Which seems to be equivalent to my current implementation plus the category instance. What am I missing?

tomjaguarpaw commented 10 years ago

The implementation will work for your more general type signature, but I don't think the arrow laws will hold for that type signature, or at least I don't know of a proof.

That is to say, I don't think [OrderSpec a] -> QueryArr a b -> QueryArr a b is OK either. (The a -> [b] intuition breaks down there, I guess).

hesselink commented 10 years ago

Well, if you're right about my code not being correct, then the QueryArr a b formulation is also incorrect, since they are equivalent. Given my previous implementation above, you can just use orderBy oss id to get something with the signature I have now, and (slightly sloppy) equational reasoning shows that it is the same as the current code as well:

oldOrderBy oss id
== -- Category instance
oldOrderBy oss (QueryArr id)
== -- Inline old definition, shuffle arguments
\oss -> QueryArr $ \(i -> let (x, q, t) = id i in (x, mkOrder q x, t))
== -- Apply id, remove redundant let
\oss -> QueryArr $ \(x, q, t) -> (x, mkOrder q x, t)
== -- New definition
orderBy oss

The code with () inputs is of course not equivalent: the result arrow might as well take an arbitrary input, but we cannot pass a QueryArr a b where a QueryArr () b is expected.

I'm still not clear on what is wrong with the current implementation, though. What arrow law is it violating? Do you have an example of something that goes wrong?

tomjaguarpaw commented 10 years ago

Right, I'm saying that both of these give rise to invalid arrows

and this one is correct

In theory it might be possible to write a valid [OrderSpec b] -> QueryArr a b -> QueryArr a b in a way different from how you have written it that coincides with your one at a = (). If the implementation is possible it will be "abusing" SQL to achieve this behaviour. After all, SQL itself only knows how to aggregate Querys, not QueryArrs. This theoretical implementation would not give rise to an equivalent [OrderSpec a] -> QueryArr a a, however.

The intuition I am using to make these claims is that the semantics of QueryArr a b should match the semantics of the arrow instance for Kleisli [] a b, that is a -> [b], under Kleisli composition. (Note that I am not saying that we get the ArrowApply instance, just the Arrow instance!)

Since sort is [a] -> [a] this means the Opaleye version is QueryArr () a -> QueryArr () b. We can also get (sort .) :: (a -> [b]) -> (a -> [b]) which makes me think that QueryArr a b -> QueryArr a b might be possible. However it may be hard to achieve with the primatives that SQL provides. Note that sort . return :: a -> [a] does not do anything useful, which is why I think that oldOrderBy oss id shouldn't do what you think it does.

QueryArr a a on the other hand means a -> [a] which certainly can never be a sort. That's why I'm confident in claiming that [OrderSpec a] -> QueryArr a a can never be right.

Regarding the laws, on reflection I'm not sure it is the arrow laws that are violated, but perhaps there is some other semantics that doesn't work. Anyway, I don't have anything more formal than my intuition based on a -> [b] as described above. However, I am very wary of putting anything into the library that I am not 100% sure is semantically correct. This is a mistake HaskellDB made and presumably one of the reasons that it never succeeded.

hesselink commented 10 years ago

I have some experience with list arrows (a -> [b]) and you are right, sorting is an operation that takes an arrow, it cannot be an arrow itself. However this is not because it would break some invariant, but just because it doesn't work! For example, say you have sort :: ListArrow a b -> ListArrow a b, and you try my trick: to sort an arrow a, instead of saying sort a, you say sort id . a. This doesn't work, because you're sorting a singleton list for each item produced by a, instead of sorting all the results of a together.

The QueryArr, however, doesn't seem to have a list in it at all. Instead, it just contains a single item. This item, as far as I understand it, is something containing Wires, and the fact that the Wire type is (mostly) opaque is what ensures correctness, because you cannot, say, pattern match on an Int, because you have a Wire Int.

I see your argument for not including things that are not correct, but I'm wary of not doing things because of a hunch. I'd like to have something in the form of tests, complicated cases, things HaskellDB allows that you want to avoid, invariants of the QueryArr internals, anything that can guide what is and isn't ok to write. Implementing code based on somebody else's intuiting is very tricky ;)

tomjaguarpaw commented 10 years ago

Here's the quickest way I can give you something more formal. By parametricity the functions a -> [a] biject with the natural numbers. That is, the only functions a -> [a] are those that replicate their argument n times. We want the same property to hold for QueryArr, that is the only values of type QueryArr a a should be those that replicate their input rows n times. Thus sorting cannot be a value of type QueryArr a a.

Now this certainly isn't a watertight argument because there's an Ord constraint and OrderSpec argument to take care of, but I hope it gives you some idea of what I'm going for here. Basically there's a certain underlying semantics that I want to encode which, amongst other benefits, will give us lots of nice equational reasoning properties.

I certainly understand your desire to see more concrete examples of why I am objecting so I will work on coming up with something more specific.

tomjaguarpaw commented 10 years ago

QueryArr is a bit like Higher Order Abstract Syntax in that it is a datatype encoding an abstract sytax tree where the datatype contains far more inhabitants than there are actually valid terms of the AST.

tomjaguarpaw commented 10 years ago

Here's an example of a property we want that is violated if sortBy :: ... -> QueryArr a a:

f :: QueryArr a b
g, h :: Query a
f <<< (g `union` h) = (f <<< g) `union` (f <<< h)

If f = sortBy ... then this can't hold.

hesselink commented 10 years ago

That is a good example, thanks! Unions are a tricky issue anyway, I now realize, since an order by statement can only appear after the final union. So this isn't valid:

select foo from tbl order by foo
union
select foo from tbl

However, this is easily created by something like (orderBy [asc (arr foo)] fooTable)unionfooTable. I'm not sure if we'd want to prevent this, and if so, how.

Also thanks for the attempts at explaining a bit more about your ideas of the semantics of QueryArr. It's still tricky for me, however. For example, given these semantics, it seems sortBy :: ... -> QueryArr a b -> QueryArr a b is a perfectly valid implementation. However, then I would expect personTable >>> sortBy oss id to produce unsorted output (essentially sorting each item separately, as with the list arrows above). However, as I showed this does work and is the same as using sortBy :: ... -> QueryArr a a. Is that a bug, or is there some part of the semantics still missing?

I'll change the implementation of orderBy to take a QueryArr () a, I think the union example is convincing. I'll also experiment some more with it, see what else I can break ;)

tomjaguarpaw commented 10 years ago

I'm not sure that you mean about the invalid select. This certainly works

(select * from table order by id)
union
(select * from table order by id)

As to your question of preventing malformed SQL, yes we always want to prevent it.

tomjaguarpaw commented 10 years ago

Regarding the semantics, yes sortBy :: ... -> QueryArr a b -> QueryArr a b is a perfectly decent thing to want to achieve and has the semantics of (sort .) :: (a -> [b]) -> (a -> [b]). However, your implementation of it is not correct. Your implementation only has correct semantics at a = (). Perhaps there is a correct implementation that agrees with yours at a = () but is also correct for all other a. However it will be somewhat fiddly to encode this is SQL.

See also DESIGN.md for an explanation of how it will be difficult to extend union :: Query a -> Query a -> Query a to QueryArr a b -> QueryArr a b -> QueryArr a b. It's the same issue.

tomjaguarpaw commented 10 years ago

Regarding merging the pull request: the first thing I would like to ask you to change is to write orderBy like Aggregate.aggregate, using runSimpleQueryArr and simpleQueryArr and pulling out orderBy' which acts directly on the type inside the QueryArr (analogous to aggregate').

Then it should be good to go, except I also want to understand whether the Unpackspec is really necessary. If so there need to be two versions of the combinator. One which accepts the dictionary and one with the Default typeclass constraint. Opaleye never forces users to have a typeclass. Everything should be possible through the dictionary.

hesselink commented 10 years ago

The invalid select problem happens only without the parentheses. I cannot find this documented in the Postgres documentation except, perhaps, here:

query1 and query2 are queries that can use any of the features discussed up to this point.

This is in section 7.4 on union, and ordering is only discussed in 7.5.

Stack overflow is full of examples, however, also for other DBMS'. I also uncovered another problem with unions and ordering. From the postgres docs:

ORDER BY can be applied to the result of a UNION, INTERSECT, or EXCEPT combination, but in this case it is only permitted to sort by output column names or numbers, not by expressions.

Again, wrapping it in a subselect fixes it, and it seems HaskellDB always does this, so I'm not sure if it's worth it to try and enforce this kind of thing. You say you want to prevent malformed SQL, which is of course a good starting point, but at some point the required complexity in Haskell is bigger than the amount of errors prevented.

hesselink commented 10 years ago

I'll look into the changes required. I did at some point need the Unpackspec, since otherwise it errored out with an unresolved column name, but perhaps that was because I was accepting input at that point?

tomjaguarpaw commented 10 years ago

In that case I suspect you're right and the Unpackspec is required. But let me look into that. I should be able to tell if I concentrate for a bit.

tomjaguarpaw commented 10 years ago

I'm not sure what you mean by "enforce" then. Do you mean "enforce that ordering a union can only use a column and not an expression"? There's no need to enforce that as it's a bizarre restriction that we can overcome Opaleye-side, as you point out.

hesselink commented 10 years ago

Yes, that was phrased a bit poorly, I rewrote it halfway through. What I meant was, that it probably isn't even a problem currently, and I don't think there's any benefit in trying to enforce it internally somehow. The same probably goes for not allowing ordering in unparenthesized parts of a union, since opaleye/haskelldb always seems to add parentheses, and that works around it.

tomjaguarpaw commented 10 years ago

Right, the idea is that Opaleye is freely composable and it just generates the correct SQL to achieve the semantics expressed. I have no aims to make the generated SQL "close" to the inputted Opaleye (although it will generally be close in practice).

tomjaguarpaw commented 10 years ago

By the way, unPPOfContravariant def is called cdef. It stands for "contravariant def".

Well done for working your way through this morass! The Default stuff is still not as clear as it should be.

hesselink commented 10 years ago

Thanks! I keep thinking that there must be a nicer way to solve the problem that Default solves, but since it's still not 100% clear to me, I haven't found it yet.

tomjaguarpaw commented 10 years ago

I've thought about it and it seems the Unpackspec is required. (Theoretically some design improvement to ExprArr could remove the need but that's not anywhere near a priority currently.) Could you make the following changes though:

This would then be in line with how the rest of the library deals with typeclasses. (The naming scheme is not yet standardised though.)

hesselink commented 10 years ago

OK, I think I've made all the changes you requested. There's now an exported orderByU which takes an explicit Unpackspec a. The implementation is refactored into three functions, orderBy which calls orderByU which calls orderByU'. That last one contains the actual implementation, which is now in terms of runSimpleQueryArr and simpleQueryArr. I've also exported the OrderSpec type (but not the constructor) so you can use it in type signatures.

A small style question: you seem to keep everything under 80 columns. Would you like me to do that as well? I usually have my editor set a warning line a 100 columns.

Also, I should probably add some haddocks for the exposed identifiers.

hesselink commented 10 years ago

OK, I've added the haddocks. That was the last thing on my list, so if you don't have any more remarks, feel free to merge.

bergmark commented 10 years ago

I've played a bit with this. First of all, it works! :-)

I'd like to be able to abtsract the ordering functions so I can use them in joins as well, the best I've come up with is something like this

orderByName
  :: Default (PPOfContravariant Unpackspec) a a
  => (a -> To Wire User) -> Query a -> Query a
orderByName f = orderBy [asc (arr (name . f))]

f can't be (To Wire x -> To Wire user) since To isn't injective, and I can't pull f out of the query since we have a Query a, any ideas to make this nicer?

tomjaguarpaw commented 10 years ago

Not exactly sure what you're looking for, but could you just carry around

orderSpec f = [asc (arr (name . f))]

instead, and apply orderBy to it at each usage site?

hesselink commented 10 years ago

Then we'd need something like Contravariant OrderSpec, but that seems implementable. @bergmark, do you think that would give us enough abstraction power?

What we were trying to define was something like on :: (b -> a) -> (a -> b -> b) -> (Query a -> Query a) -> Query b -> Query b. The first two arguments form a lens, the third would be some orderBy call. So given orderByName :: Query User -> Query User, you could call on fst (\u (_, p) -> (u, p)) orderByName to get something of type Query (User, p) -> Query (User, p). I couldn't do this, though, and I suspect it's not possible, even though semantically it seems it should be.

tomjaguarpaw commented 10 years ago

Contravariant OrderSpec is very sensible! Arguably the list should be inlined into the type too, and it should be made a Monoid.

tomjaguarpaw commented 10 years ago

What would the intended semantics be for on :: (b -> a) -> (a -> b -> b) -> ([a] -> [a]) -> [b] -> [b]?

tomjaguarpaw commented 10 years ago

By the way, I'll merge this as soon as I can give it a final thorough check.

hesselink commented 10 years ago

I like the Monoid idea, that would remove the square brackets in the (common) case where you order on a single expression.

The semantics for on would be that given that you can order a Query a, you can also order a Query b that contains all the information in a and more. For example, if you can sort a user table, you can also sort the user table joined with more information.

tomjaguarpaw commented 10 years ago

Sure, but can you replicate the desired semantics for on :: (b -> a) -> (a -> b -> b) -> ([a] -> [a]) -> [b] -> [b]? If not it seems implausible it would would for Query.

hesselink commented 10 years ago

Yes, something like this:

on get set f bs = zipWith set (f (get bs)) bs
hesselink commented 10 years ago

I've implemented Contravariant and Monoid instances for OrderSpec.

bergmark commented 10 years ago

Curious, why is Contravariant needed? What Tom suggested works without it.

tomjaguarpaw commented 10 years ago

Some sort of contramap is needed if you want to turn OrderSpec a into OrderSpec b using a b -> a (without accessing the internals of OrderSpec).

tomjaguarpaw commented 10 years ago

on get set f bs = zipWith set (f (get bs)) bs

This has type on :: ([b] -> t) -> (a -> b -> c) -> (t -> [a]) -> [b] -> [c], so I guess it's not what you meant. Did you mean

on :: (b -> b1) -> (a -> b -> c) -> ([b1] -> [a]) -> [b] -> [c]
on get set f bs = zipWith set (f (fmap get bs)) bs

I don't think this does what you want though.

hesselink commented 10 years ago

You're right, it doesn't, it sorts the as but doesn't keep them joined to the bs. I think your idea is better: to provide the OrderSpec, contramap over that, and use that to sort. Or just the original thing that @bergmark posted.

tomjaguarpaw commented 10 years ago

Looks great. Thanks for getting to grips with Opaleye and producing a substantial piece of functionality!

hesselink commented 10 years ago

My pleasure. Thanks for your patience and explanations!