zio / zio-quill

Compile-time Language Integrated Queries for Scala
https://zio.dev/zio-quill
Apache License 2.0
2.15k stars 348 forks source link

Support for basic joins when doing a batch update #1458

Open mdedetrich opened 5 years ago

mdedetrich commented 5 years ago

Expected behavior

I would expect to somehow support doing a join at the same time of doing a batch update, i.e. something equivalent to

UPDATE vehicles AS v 
SET price = s.price_per_vehicle
FROM shipments AS s
WHERE v.shipment_id = s.id

This would be the equivalent of something like this in quill

quote {
  liftQuery(items).foreach {item =>
    query[Shipments].flatMap {s =>
      query[Vehicles].filter(v => v.shipment_id == s.id)
    }.update(_.price -> item.price)
  }
}

Unfortunately such code isn't really possible because of how quill is structured right now. Right now the .update method only works on the EntityQuery and EntityQuery only works for basic filters on a simple table

Actual behavior

I would expect the above behavior to actually work, i.e. an ability to do basic joins when doing a batch update

@getquill/maintainers

deusaquilus commented 5 years ago

This would be awesome! It would complete the circle in terms of Quill being a end-to-end ETL solution. The challenge is that insert/update/delete + join is done differently in every database vendor: https://stackoverflow.com/questions/1293330/how-can-i-do-an-update-statement-with-join-in-sql

There is an ansi-standard where exists and Quill sort of supports it:

translate {
  liftQuery(List[Address](a)).foreach { newAddress =>
    (query[Address].filter(a =>
      a.street == newAddress.street &&
        query[Person].filter(p => p.id == a.ownerFk).nonEmpty
    )).update(_.street -> "somethingelse")
  }
}

... and I think even join works inside:

translate {
  liftQuery(List[Address](a)).foreach { newAddress =>
    (query[Address].filter(a =>
      a.street == newAddress.street &&
        query[Person].filter(p => p.id == a.ownerFk)
          .join(query[Organization]).on((p, o) => p.orgFk == o.id)
          .nonEmpty
    )).update(_.street -> "somethingelse")
  }
}
// UPDATE address SET street = 'somethingelse' 
// WHERE street = ? AND EXISTS ( 
//   SELECT p.*, o.* FROM (
//     SELECT p.* FROM person p WHERE p.id = a.owner_fk) AS p 
//     INNER JOIN organization o ON p.org_fk = o.id)

NOTE: This should probably be documented!

This approach is similar to to the insert + join one and could probably emulate it but, being able to use the join directly inside the insert would take us to a whole new level!

deusaquilus commented 5 years ago

@mdedetrich I think I want the syntax to be something like this and have the update on the outside of the foreach rather then the inside.

liftQuery(people)
  .join(query[Address]).on(_.id == _.ownerFk)
  .foreach { case (p, a) => query[Person].update(p.name, p.age, a.somethingElse) }

Or monadically like this:

{
  for {
    p <- liftQuery(people)
    a <- query[Address].join(_.ownerFk == p.id)
  } yield (a, b)
}.foreach { case (p, a) => query[Person].update(p.name, p.age, a.somethingElse) }

It's a bit difficult to find a very simple syntax for joins-with-actions that captures all the power we need to have. For instance, tables that are inside the join should be useable inside the update clause. That makes doing things like mapping from the original table and back (after a join) untenable.

query[Person]
  .join(query[Address]).on(_.id == _.ownerFk)
  .map(_._1).update(/* We want to use Address columns here but can't! */)

Also, if you think about it, something like this might be intuitive:

liftQuery[Person].map(p => (p.name, p.age)).update(("Joe", 123))

... however it can get non-sensical when you do something like this:

liftQuery[Person].map(p => (p.name, p.age + 456)).update(("Joe", 123))
// 'age' has an operation, how do we map it back to a table???

Anyway, when you have a set of join operations whose signature looks like this:

Query[A] => Query[(A, B)] => Query[(A, B, C)]...

You always have to map it back to the updating table Query[UT] so that you can do an update query:

Query[A] => Query[(A, B)] => Query[(A, B, C)]... => 
  /* Need to map this back! */ Query[UT].update(....)

The problem here is that effectively, the Last table is now the first one in the actual update query.

UPDATE ut SET (...) VALUES (...)
  from a
  join b
  join c
  join d
  ...

I think this inversion is a bit jarring and the Scala doesn't really convey the right intent.

I think doing something like this:

(Query[A] => Query[(A, B)] => Query[(A, B, C)]... => ...)
   .foreach(r => query[A].update(...))

... conveys the intent a bit better.

... and then gives you the power to do this:

(Query[A] => Query[(A, B)] => Query[(A, B, C)]... => ...)
   .foreach { case(a, b, c, ...) => query[A].update(...) }

Of course, we still have to verify that the the first table of the outside sequence has the same record ad the update query inside the foreach i.e. A.

We should probably also get @fwbrasil's opinion on this.

deusaquilus commented 5 years ago

It's interesting to note that the equivalent of this kind of update query is INSERT INTO. For instance:

liftQuery(people)
  .join(query[Address]).on(_.id == _.ownerFk).filter { case (p, a) => a.state == "CA" }
  .foreach { case (p, a) => query[Person].update(_.yearsResidency = a.years) }

Could become:

UPDATE People AS p SET (status = a.yearsResidence)
FROM Address a
WHERE a.id == p.ownerFk

Which is really an optimization of:

UPDATE People SET (status = sub.yearsResidence)      --TODO Need to add an alias if we add RETURNING support
FROM (
  SELECT p.name, p.age, a. ownerFk, a.street, a.yearsResidence
  FROM Person p
  JOIN Address a on p.id = a.ownerFk
)

If the case class representing this was:

case class SqlJoinedAction(assignments: List[Assignment], from: FromContext)
// or maybe from: List[FromContext] // I'm not sure yet

Then we can have a UnfurlJoinedUpdate which would go from the latter back to the former (similar to how ApplyMap works which I think should actually be called UnfurlMap or InvertMap).

Now say that you do insert in the above query as opposed to update:

liftQuery(people)
  .join(query[Address]).on(_.id == _.ownerFk).filter { case (p, a) => a.state == "CA" }
  .foreach { case (p, a) => query[Person].insert(_.name = "Anonymous", _.yearsResidency = a.years) }

This should be INSERT INTO:

INSERT INTO People /*AS pi */ VALUES (name, status)      -- TODO Add synonym here if it's a RETURNING case
SELECT ('Anonymous', a.yearsResidence)
FROM People p
JOIN Address a ON a.id == p.ownerFk

Unfortunately this latter case is impossible to simplify since even if we had the alias pi, we still wound't be able to constrain address records on it in the WHERE/JOIN clause above.