Crashing query: Incorrectly capturing variables

tomjaguarpaw commented 9 years ago

SQL does not allow the right-hand clause of an INNER JOIN to reference columns defined in the left- hand clause. However, relational-record does not enforce this restriction in the type system. It is trivial to create a well-typed yet crashing query:

import Data.Int (Int32)
import Database.Relational.Query

zero :: Relation () (Int32)
zero = relation $ return (value 0)

captureVariable :: Projection Flat Int32 -> Relation () Int32
captureVariable i = relation $ do
  z <- query zero
  wheres (i .=. z)
  return z

foo :: Relation () (Int32, Int32)
foo = relation $ do
  i <- query zero
  j <- query (captureVariable i)

  return ((,) |$| i ! id'
              |*| j ! id')

main :: IO ()
main = putStrLn $ show foo ++ ";"

The generated SQL is

SELECT ALL T0.f0 AS f0, T2.f0 AS f1 FROM (SELECT ALL 0 AS f0) T0
INNER JOIN (SELECT ALL T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1
WHERE (T0.f0 = T1.f0)) T2 ON (0=0);

and Postgres reports

ERROR:  invalid reference to FROM-clause entry for table "t0"
LINE 1: ...L T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1 WHERE (T0.f0 = T1...
                                                             ^
HINT:  There is an entry for table "t0", but it cannot
be referenced from this part of the query.

NB this is why Opaleye uses a query arrow arrow instead of a query monad.

ocharles commented 9 years ago

I think a LATERAL join would fix this.

tomjaguarpaw commented 9 years ago

@ocharles I agree.

ocharles commented 9 years ago

I can confirm that this query works:

SELECT ALL T0.f0 AS f0, T2.f0 AS f1 FROM (SELECT ALL 0 AS f0) T0
INNER JOIN LATERAL (SELECT ALL T1.f0 AS f0 FROM (SELECT ALL 0 AS f0) T1
WHERE (T0.f0 = T1.f0)) T2 ON (0=0);

tomjaguarpaw commented 9 years ago

NB Not all flavours of SQL support something equivalent to LATERAL.

mboes commented 9 years ago

OK. Does this mean a monadic interface can be given sensible semantics for the SQL+LATERAL dialect? Or are there other issues (to be filed as separate tickets)?

tomjaguarpaw commented 9 years ago

As I understand it LATERAL allows a monadic interface with sensible semantics. Whether there are additional semantics bugs with relational-record I do not know. I doubt there would be additional crash bugs.

tomjaguarpaw commented 9 years ago

I just realised this has nothing to do with aggregation! I changed the title accordingly.

kazu-yamamoto commented 9 years ago

@tomjaguarpaw Would you tell me that how the arrow interface of Opaleye prevent this issue? So far, I understand that the arrow interface of Opaleye is equivalent to a state monad. I must misunderstand something.

tomjaguarpaw commented 9 years ago

The arrow interface of Opaleye is not equivalent to a state monad. The implementation of Opaleye's QueryArr is a "state arrow" (analogous to a state monad) but only a small subset of the state arrow's functionality is exposed. The interface to Opaleye is much weaker than a state arrow, let alone a state monad.

The problem with relational-record in this particular case seems to be that you can apply relation to a query block that captures columns from outside. Its type is

relation :: QuerySimple (Projection Flat r) -> Relation () r

in the code

captureVariable i = relation $ do
    ...

relation has no way to prevent the i argument coming from a previous clause. The alternative is to make QuerySimple an arrow, so you statically know it has no input. That means that the type of relation would become

relation :: QuerySimple () (Projection Flat r) -> Relation () r

In order to achieve the desired effect you would have to limit yourself to only arrow operations on QuerySimple and probably Relation too. If you're lucky and arrange things suitably relation might generalise to

relation :: QuerySimple a (Projection Flat r) -> Relation a r

In Opaleye I believe we have the invariant that a value q :: Column a -> QueryArr (Column b) (Column c) can never be combined into a query whereby the input Column a can depend on the result of any previous clause (essentially I think it means its value is fixed at compile time). On the other hand, Column b can depend on the result of previous clauses. For example Column a might be something like 1 + 2 * 3 whereas Column b might be table1.foo + table2.foo * 3. In this way we can ensure that we use references to previous columns correctly.

I'm afraid I can't say much more about relational-record because I don't really understand the setup of Relation, QuerySimple etc., but I hope that gives you a bit of a clue. Additionally, I have never found a convincing proof that arrows really are required but all my investigations so far convince me that they do address this issue of non-LATERAL joins very precisely.

ocharles commented 9 years ago

Not to derail the conversation too much but I still think HOAS or something can prevent that problem, but I'm not sure how it fits in with having a monad at the same time. On 17 May 2015 2:33 pm, "tomjaguarpaw" notifications@github.com wrote:

The arrow interface of Opaleye is not equivalent to a state monad. The implementation of Opaleye's QueryArr is a "state arrow" (analogous to a state monad) but only a small subset of the state arrow's functionality is exposed. The interface to Opaleye is much weaker than a state arrow, let alone a state monad.

The problem with relational-record in this particular case seems to be that you can apply relation to a query block that captures columns from outside. Its type is

relation :: QuerySimple (Projection Flat r) -> Relation () r

in the code

captureVariable i = relation $ do ...

relation has no way to prevent the i argument coming from a previous clause. The alternative is to make QuerySimple an arrow, so you statically know it has no input. That means that the type of relation would become

relation :: QuerySimple () (Projection Flat r) -> Relation () r

In order to achieve the desired effect you would have to limit yourself to only arrow operations on QuerySimple and probably Relation too. If you're lucky and arrange things suitably relation might generalise to

relation :: QuerySimple a (Projection Flat r) -> Relation a r

In Opaleye I believe we have the invariant that a value q :: Column a -> QueryArr (Column b) (Column c) can never be combined into a query whereby the input Column a can depend on the result of any previous clause (essentially I think it means its value is fixed at compile time). On the other hand, Column b can depend on the result of previous clauses. For example Column a might be something like 1 + 2 * 3 whereas Column b might be table1.foo + table2.foo * 3. In this way we can ensure that we use references to previous columns correctly.

I'm afraid I can't say much more about relational-record because I don't really understand the setup of Relation, QuerySimple etc., but I hope that gives you a bit of a clue. Additionally, I have never found a convincing proof that arrows really are required but all my investigations so far convince me that they do address this issue of non-LATERAL joins very precisely.

— Reply to this email directly or view it on GitHub https://github.com/khibino/haskell-relational-record/issues/19#issuecomment-102804583 .

tomjaguarpaw commented 9 years ago

I'm skeptical than HOAS is related, since this is not your common or garden lambda-calculus-style variable-capture problem, but if you can make anything of that I would be delighted!

mboes commented 9 years ago

HOAS isn't very good at limiting what part of the context can be captured by a closure, so I'm not sure how it would help here. In fact that's one of the reasons why adequacy does not hold for many HOAS encodings. This is in contrast with environment indexed de bruijn encodings, which are quite good at that (but a royal pain in other regards).

However, given the existence of LATERAL, and the assumption that the SQL query optimizer will do just as good a job at finding an efficient plan as it otherwise would without that keyword where it is not necessary, is making the expressive power of the interface weaker a problem worth solving?

tomjaguarpaw commented 9 years ago

I don't think it's an easy decision to make in the general case. LATERAL is not standard SQL. It is available in Postgres, but only since 9.4, i.e. in the last 6 months. Other SQL RDBMSes may have an equivalent, but I don't think all do.

I won't be adding support for LATERAL into Opaleye until 9.4 is old, because I know I have users who are not currently prepared to upgrade. relational-record may take a different point of view.

khibino commented 9 years ago

When passing projections into a query-building monad do statement from external scope, HRR generates a broken SELECT statement that references the left-side sub-query. We have known about this problem for two years. (https://github.com/khibino/haskell-relational-record/commit/2f606603455a1c26c0462e45934f29acd49ee6a2#diff-032f2e99a15be37134b7b04df8eb79e2R85)

This example suggests that this problem occurs not only when projections are passed as arguments, but also when nested do statements are used.

It is easy to implement an Arrow wrapper for HRR. (https://gist.github.com/khibino/57405584b168d98fd1e8)

When using this wrapper, the Arrow expression scope (between <- and -<) is separate, making previous arrow results out-of-scope. In this case, the issue is avoided, like Opaleye, but the issue still occurs when projections are passed to the domain-side of the Arrow.

In Opaleye, passing columns (which correspond to HRR projections) to the domain-side of the Arrow does not cause this problem. The reason, however, is not that Opaleye uses Arrows, but that the WHERE clauses of sub-queries in join-products are combined into an outer WHERE clause.

HRR can express each ON clause and UNION like(UNION, EXCEPT, INTERSECT) operations, so the WHERE clause of sub-queries are not combined automatically. (The purpose of this design is to make the resulting SQL statements easier to understand.)

I do not yet have a good idea for how to solve this issue.

kazu-yamamoto commented 9 years ago

To be fair, we would like to include the ON vs WHERE issue in our HRR paper. That is, SQL generated by HRR is simpler but its type-safety is weaker than that of Opaleye.

khibino commented 9 years ago

It may be possible to lift up and to combine WHERE clauses of sub-query join-products into an outer WHERE clause. I think that implementing query transformation with keeping its semantics is interesting in future works.

chrisdone commented 9 years ago

Having used HaskellDB for a couple years I can add my 2¢ that I don't care about looking at the generated SQL as long as it's correct, so prioritizing readability over correctness is not an aim I'd go for.

kazu-yamamoto commented 9 years ago

But what about performance? I heard that complex SQL statements generated by HaskellDB are slow with MySQL.

http://pseudofish.com/haskelldb-performance.html

chrisdone commented 9 years ago

With MySQL, yes. I admit I don't care at all about anything other than PostgreSQL.

kazu-yamamoto commented 9 years ago

:-)

khibino commented 9 years ago

@tomjaguarpaw Thanks for reporing this issue.

I added an arrow-combinator module 'Database.Relational.Query.Arrow' to import the idea of Opaleye!

Building queries using combinators imported from 'Database.Relational.Query.Arrow' instead of 'Database.Relational.Query' controls injection of previous local projections.

Combinator implementations are involved in https://github.com/khibino/haskell-relational-record/blob/72de038ef6c645feb92c9d889d7a6ece896a93d1/relational-query/src/Database/Relational/Query/Arrow.hs . Examples are involved in test code https://github.com/khibino/haskell-relational-record/blob/72de038ef6c645feb92c9d889d7a6ece896a93d1/relational-query/test/sqlsEqArrow.hs

I think using this new module makes relational-record safer on the same level with Opaleye.

khibino / haskell-relational-record

Crashing query: Incorrectly capturing variables #19