Closed tomjaguarpaw closed 9 years ago
I think a LATERAL join would fix this.
@ocharles I agree.
I can confirm that this query works:
SELECT ALL T0.f0 AS f0, T2.f0 AS f1 FROM (SELECT ALL 0 AS f0) T0
INNER JOIN LATERAL (SELECT ALL T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1
WHERE (T0.f0 = T1.f0)) T2 ON (0=0);
NB Not all flavours of SQL support something equivalent to LATERAL.
OK. Does this mean a monadic interface can be given sensible semantics for the SQL+LATERAL dialect? Or are there other issues (to be filed as separate tickets)?
As I understand it, LATERAL allows a monadic interface with sensible semantics. Whether there are additional semantics bugs with relational-record I do not know. I doubt there would be additional crash bugs.
I just realised this has nothing to do with aggregation! I changed the title accordingly.
@tomjaguarpaw Would you tell me how the arrow interface of Opaleye prevents this issue? So far, I understand that the arrow interface of Opaleye is equivalent to a state monad. I must be misunderstanding something.
The arrow interface of Opaleye is not equivalent to a state monad. The implementation of Opaleye's QueryArr is a "state arrow" (analogous to a state monad) but only a small subset of the state arrow's functionality is exposed. The interface to Opaleye is much weaker than a state arrow, let alone a state monad.
The problem with relational-record in this particular case seems to be that you can apply relation to a query block that captures columns from outside. Its type is
relation :: QuerySimple (Projection Flat r) -> Relation () r
In the code
captureVariable i = relation $ do
...
relation has no way to prevent the i argument coming from a previous clause. The alternative is to make QuerySimple an arrow, so you statically know it has no input. That means that the type of relation would become
relation :: QuerySimple () (Projection Flat r) -> Relation () r
In order to achieve the desired effect you would have to limit yourself to only arrow operations on QuerySimple and probably Relation too. If you're lucky and arrange things suitably relation might generalise to
relation :: QuerySimple a (Projection Flat r) -> Relation a r
In Opaleye I believe we have the invariant that a value q :: Column a -> QueryArr (Column b) (Column c) can never be combined into a query whereby the input Column a can depend on the result of any previous clause (essentially I think it means its value is fixed at compile time). On the other hand, Column b can depend on the result of previous clauses. For example Column a might be something like 1 + 2 * 3 whereas Column b might be table1.foo + table2.foo * 3. In this way we can ensure that we use references to previous columns correctly.
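The distinction between the two kinds of input can be sketched with a toy model. This is not Opaleye's implementation: the `Column` and `QueryArr` definitions below, and the helpers `table`, `restrictEq`, `q` and `runQueryArr`, are hypothetical stand-ins that only track SQL expression text and WHERE conditions. The sketch assumes the `QueryArr` constructor would be hidden from users, so that queries can only be composed with arrow combinators.

```haskell
module Main where

import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..), (>>>))

-- Toy column: just the text of a SQL expression (hypothetical, not Opaleye's type).
newtype Column = Column String deriving (Eq, Show)

-- Toy query arrow: maps an input to an output plus the WHERE conditions collected.
newtype QueryArr a b = QueryArr (a -> (b, [String]))

instance Category QueryArr where
  id = QueryArr (\a -> (a, []))
  QueryArr g . QueryArr f = QueryArr $ \a ->
    let (b, w1) = f a
        (c, w2) = g b
    in  (c, w1 ++ w2)

instance Arrow QueryArr where
  arr f = QueryArr (\a -> (f a, []))
  first (QueryArr f) = QueryArr $ \(a, c) ->
    let (b, w) = f a in ((b, c), w)

-- A table has no arrow input, so it cannot depend on any previous clause.
table :: String -> QueryArr () Column
table name = QueryArr (\() -> (Column (name ++ ".foo"), []))

-- A restriction takes its columns as arrow input, so they may come from previous clauses.
restrictEq :: QueryArr (Column, Column) ()
restrictEq = QueryArr (\(Column x, Column y) -> ((), [x ++ " = " ++ y]))

-- The Haskell-level argument is fixed when the query is built;
-- only the arrow input can carry results of previous clauses.
q :: Column -> QueryArr Column Column
q (Column k) = arr (\(Column c) -> Column (c ++ " + " ++ k))

example :: QueryArr () Column
example = table "table1" >>> q (Column "(1 + 2 * 3)")

joined :: QueryArr () ()
joined = (table "table1" &&& table "table2") >>> restrictEq

runQueryArr :: QueryArr () b -> (b, [String])
runQueryArr (QueryArr f) = f ()

main :: IO ()
main = do
  print (runQueryArr example) -- (Column "table1.foo + (1 + 2 * 3)",[])
  print (runQueryArr joined)  -- ((),["table1.foo = table2.foo"])
```

With only `>>>`, `&&&`, `arr` and friends available, there is no combinator that takes the `Column` produced by `table "table1"` and hands it to `q` as its first, Haskell-level argument. That argument has to be written down when the query is constructed.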
I'm afraid I can't say much more about relational-record because I don't really understand the setup of Relation, QuerySimple etc., but I hope that gives you a bit of a clue. Additionally, I have never found a convincing proof that arrows really are required but all my investigations so far convince me that they do address this issue of non-LATERAL joins very precisely.
Not to derail the conversation too much but I still think HOAS or something can prevent that problem, but I'm not sure how it fits in with having a monad at the same time.
I'm skeptical that HOAS is related, since this is not your common or garden lambda-calculus-style variable-capture problem, but if you can make anything of that I would be delighted!
HOAS isn't very good at limiting what part of the context can be captured by a closure, so I'm not sure how it would help here. In fact that's one of the reasons why adequacy does not hold for many HOAS encodings. This is in contrast with environment-indexed de Bruijn encodings, which are quite good at that (but a royal pain in other regards).
However, given the existence of LATERAL, and the assumption that the SQL query optimizer will do just as good a job at finding an efficient plan as it otherwise would without that keyword where it is not necessary, is making the expressive power of the interface weaker a problem worth solving?
I don't think it's an easy decision to make in the general case. LATERAL is not standard SQL. It is available in Postgres, but only since 9.4, i.e. in the last 6 months. Other SQL RDBMSes may have an equivalent, but I don't think all do.
I won't be adding support for LATERAL into Opaleye until 9.4 is old, because I know I have users who are not currently prepared to upgrade. relational-record may take a different point of view.
When passing projections into a query-building monad do statement from external scope, HRR generates a broken SELECT statement that references the left-side sub-query. We have known about this problem for two years.
(https://github.com/khibino/haskell-relational-record/commit/2f606603455a1c26c0462e45934f29acd49ee6a2#diff-032f2e99a15be37134b7b04df8eb79e2R85)
This example suggests that this problem occurs not only when projections are passed as arguments, but also when nested do statements are used.
It is easy to implement an Arrow wrapper for HRR. (https://gist.github.com/khibino/57405584b168d98fd1e8)
When using this wrapper, the Arrow expression scope (between <- and -<) is separate, making previous arrow results out-of-scope. In this case, the issue is avoided, like Opaleye, but the issue still occurs when projections are passed to the domain-side of the Arrow.
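The scoping rule for the arrow expression between `<-` and `-<` can be seen with GHC's arrow notation alone. The sketch below does not use HRR or the wrapper from the gist; it uses the ordinary function arrow `(->)`, and `addOne`, `scale` and `pipeline` are hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow (returnA)

addOne :: Int -> Int
addOne = (+ 1)

-- The Int argument is supplied when the arrow is built, not through the arrow input.
scale :: Int -> (Int -> Int)
scale k = (* k)

pipeline :: Int -> Int
pipeline = proc x -> do
  y <- addOne -< x
  -- `scale 10` sits between `<-` and `-<`, so it may only mention
  -- names from outside the proc block.
  z <- scale 10 -< y
  -- z <- scale y -< y   -- rejected by GHC: `y` is not in scope between `<-` and `-<`
  returnA -< z

main :: IO ()
main = print (pipeline 2) -- 30
```

Values bound inside the `proc` block, such as `x` and `y`, are only usable to the right of `-<`, which is the property the comment above relies on.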
In Opaleye, passing columns (which correspond to HRR projections) to the domain-side of the Arrow does not cause this problem. The reason, however, is not that Opaleye uses Arrows, but that the WHERE clauses of sub-queries in join-products are combined into an outer WHERE clause.
HRR can express each ON clause and UNION-like (UNION, EXCEPT, INTERSECT) operations, so the WHERE clauses of sub-queries are not combined automatically. (The purpose of this design is to make the resulting SQL statements easier to understand.)
I do not yet have a good idea for how to solve this issue.
To be fair, we would like to include the ON vs WHERE issue in our HRR paper. That is, SQL generated by HRR is simpler but its type-safety is weaker than that of Opaleye.
It may be possible to lift and combine the WHERE clauses of sub-query join-products into an outer WHERE clause. I think implementing such a semantics-preserving query transformation would be interesting future work.
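A minimal sketch of such a lifting transformation follows, on a hypothetical toy query type rather than HRR's real representation. It assumes plain inner products, conditions that are combined with AND, and column references that are already fully qualified. It makes no attempt to handle outer joins, aggregation, or UNION-like operations, where hoisting a condition would not in general preserve the meaning of the query.

```haskell
module Main where

-- Toy query tree (hypothetical; not HRR's internal representation).
data Query
  = Table String            -- a base table or alias
  | Restrict [String] Query -- WHERE conditions applied to a sub-query
  | Product Query Query     -- join product of two sub-queries
  deriving (Eq, Show)

-- Collect every restriction in the tree and return the restriction-free remainder.
liftWhere :: Query -> ([String], Query)
liftWhere (Table t) = ([], Table t)
liftWhere (Restrict cs sub) =
  let (inner, sub') = liftWhere sub
  in  (inner ++ cs, sub')
liftWhere (Product l r) =
  let (cl, l') = liftWhere l
      (cr, r') = liftWhere r
  in  (cl ++ cr, Product l' r')

-- Re-attach all collected conditions as a single outer WHERE.
normalise :: Query -> Query
normalise query = case liftWhere query of
  ([], rest) -> rest
  (cs, rest) -> Restrict cs rest

-- The right-hand sub-query refers to T0, which sits on the left of the product.
example :: Query
example = Product (Table "T0") (Restrict ["T0.f0 = T1.f0"] (Table "T1"))

main :: IO ()
main = print (normalise example)
-- Restrict ["T0.f0 = T1.f0"] (Product (Table "T0") (Table "T1"))
```

After normalisation the condition mentioning `T0` lives in the outer WHERE, where both sides of the product are in scope, so no right-hand sub-query needs to reference the left-hand one.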
Having used HaskellDB for a couple years I can add my 2¢ that I don't care about looking at the generated SQL as long as it's correct, so prioritizing readability over correctness is not an aim I'd go for.
But what about performance? I heard that complex SQL statements generated by HaskellDB are slow with MySQL.
With MySQL, yes. I admit I don't care at all about anything other than PostgreSQL.
:-)
@tomjaguarpaw Thanks for reporting this issue.
I added an arrow-combinator module 'Database.Relational.Query.Arrow' to import the idea of Opaleye!
Building queries using combinators imported from 'Database.Relational.Query.Arrow' instead of 'Database.Relational.Query' controls injection of previous local projections.
The combinator implementations can be found in https://github.com/khibino/haskell-relational-record/blob/72de038ef6c645feb92c9d889d7a6ece896a93d1/relational-query/src/Database/Relational/Query/Arrow.hs . Examples can be found in the test code https://github.com/khibino/haskell-relational-record/blob/72de038ef6c645feb92c9d889d7a6ece896a93d1/relational-query/test/sqlsEqArrow.hs
I think using this new module makes relational-record as safe as Opaleye in this respect.
SQL does not allow the right-hand clause of an INNER JOIN to reference columns defined in the left-hand clause. However, relational-record does not enforce this restriction in the type system. It is trivial to create a well-typed yet crashing query:
The generated SQL is
and Postgres reports
NB this is why Opaleye uses a query arrow instead of a query monad.