This seems to be fine when I simulate a connection drop with duckdb. Kjell, are you seeing a different behavior when the connection to your real database drops?
drv <- duckdb::duckdb()
pool <- pool::dbPool(drv)
# All is fine
con <- pool::poolCheckout(pool)
con@conn_ref
#> <pointer: 0x600002eca180>
DBI::dbGetQuery(con, "SELECT 1")
#> 1
#> 1 1
# Connection drops
con <- unserialize(serialize(con, NULL))
con@conn_ref
#> <pointer: 0x0>
pool::poolReturn(con)
# Connection seems to be fine when checked out from pool again
con <- pool::poolCheckout(pool)
con@conn_ref
#> <pointer: 0x600002ec9fc0>
DBI::dbGetQuery(con, "SELECT 2")
#> 2
#> 1 2
pool::poolReturn(con)
Created on 2024-08-14 with reprex v2.1.0
In my experience it only happens when a server drops the connection. Here is a reprex:
pool <- pool::dbPool(drv = RMariaDB::MariaDB(),
                     host = "relational.fel.cvut.cz",
                     port = 3306,
                     user = "guest",
                     password = "ctu-relational")
con <- pool::poolCheckout(pool)
DBI::dbExecute(con, "SET SESSION wait_timeout=2;")
# [1] 0
pool::poolReturn(con)
Sys.sleep(3)
con <- pool::poolCheckout(pool)
DBI::dbIsValid(con)
# [1] TRUE
DBI::dbGetQuery(con, "SELECT 1")
# Error: Server has gone away [2006]
Thanks, confirmed.
What happens if you pass a validationInterval = 0 argument to dbPool? And if that works, validationInterval = 1?
This is a public-facing server, so it's easy to try various options. That said, both validationInterval = 0 and validationInterval = 1 seem to work.
I still see a race condition: checkout, then the connection drops, then a query is issued. It seems like it would be easy for the pool package to just retry on query failure, perhaps optionally, and perhaps only if the error text (🙀🙀🙀) matches a pattern.
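For illustration, a user-side sketch of what such an opt-in retry could look like (a hypothetical helper, not part of pool; the error pattern is an assumption based on the message above):

# Hypothetical helper (not part of pool): retry a query once when the error text
# suggests a dropped connection. Assumes the pool object is used as the
# connection, so the retry goes through a fresh checkout.
query_with_retry <- function(pool, sql, pattern = "gone away|lost connection") {
  tryCatch(
    DBI::dbGetQuery(pool, sql),
    error = function(e) {
      if (!grepl(pattern, conditionMessage(e), ignore.case = TRUE)) {
        stop(e)
      }
      # Depending on validationInterval, the next checkout may validate and
      # replace the broken connection, so a single retry is often enough.
      DBI::dbGetQuery(pool, sql)
    }
  )
}
# query_with_retry(pool, "SELECT 1")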
pool <- pool::dbPool(
  drv = RMariaDB::MariaDB(),
  validationInterval = 0,
  host = "relational.fel.cvut.cz",
  port = 3306,
  user = "guest",
  password = "ctu-relational"
)
con <- pool::poolCheckout(pool)
DBI::dbExecute(con, "SET SESSION wait_timeout=2;")
#> [1] 0
pool::poolReturn(con)
Sys.sleep(3)
con <- pool::poolCheckout(pool)
DBI::dbIsValid(con)
#> [1] TRUE
DBI::dbGetQuery(con, "SELECT 1")
#> 1
#> 1 1
Created on 2024-08-21 with reprex v2.1.0
pool <- pool::dbPool(
  drv = RMariaDB::MariaDB(),
  validationInterval = 1,
  host = "relational.fel.cvut.cz",
  port = 3306,
  user = "guest",
  password = "ctu-relational"
)
con <- pool::poolCheckout(pool)
DBI::dbExecute(con, "SET SESSION wait_timeout=2;")
#> [1] 0
pool::poolReturn(con)
Sys.sleep(3)
con <- pool::poolCheckout(pool)
DBI::dbIsValid(con)
#> [1] TRUE
DBI::dbGetQuery(con, "SELECT 1")
#> 1
#> 1 1
Created on 2024-08-21 with reprex v2.1.0
pool <- pool::dbPool(
  drv = RMariaDB::MariaDB(),
  validationInterval = 60,
  host = "relational.fel.cvut.cz",
  port = 3306,
  user = "guest",
  password = "ctu-relational"
)
con <- pool::poolCheckout(pool)
DBI::dbExecute(con, "SET SESSION wait_timeout=2;")
#> [1] 0
pool::poolReturn(con)
Sys.sleep(3)
con <- pool::poolCheckout(pool)
DBI::dbIsValid(con)
#> [1] TRUE
DBI::dbGetQuery(con, "SELECT 1")
#> Error: Server has gone away [2006]
Created on 2024-08-21 with reprex v2.1.0
Example of the race condition with validationInterval = 1:
pool <- pool::dbPool(
  drv = RMariaDB::MariaDB(),
  validationInterval = 1,
  host = "relational.fel.cvut.cz",
  port = 3306,
  user = "guest",
  password = "ctu-relational"
)
con <- pool::poolCheckout(pool)
DBI::dbExecute(con, "SET SESSION wait_timeout=2;")
#> [1] 0
pool::poolReturn(con)
Sys.sleep(1)
con <- pool::poolCheckout(pool)
Sys.sleep(2)
DBI::dbIsValid(con)
#> [1] TRUE
DBI::dbGetQuery(con, "SELECT 1")
#> Error: Server has gone away [2006]
Created on 2024-08-21 with reprex v2.1.0
My understanding of the way we intend people to use pool with databases is to treat the pool object itself as the DBI connection for most cases, in which case the checkout and return are performed automatically just before and after the query. The only(?) reason to do an explicit poolCheckout is if you need transactions, in which case you absolutely do NOT want to respond to a closed connection by opening a new one and retrying.
(It has been quite a while since I’ve thought about this package so please excuse me if I’ve forgotten some important use case!)
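For illustration, a sketch of the two patterns under this model (the table and statements are placeholders):

# Typical usage: treat the pool itself as the connection; checkout and return
# happen automatically around each call.
DBI::dbGetQuery(pool, "SELECT 1")

# Transactions are the exception: they need a single physical connection for
# their whole duration, which poolWithTransaction() manages for you.
pool::poolWithTransaction(pool, function(conn) {
  DBI::dbExecute(conn, "UPDATE t SET x = x + 1")  # placeholder statement
  DBI::dbExecute(conn, "UPDATE t SET y = y - 1")  # placeholder statement
})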
A classic case of "you're holding it the wrong way", it seems? 🙃
Do you think it's worth stressing this point in ?poolCheckout?
pool <- pool::dbPool(
  drv = RMariaDB::MariaDB(),
  validationInterval = 1,
  host = "relational.fel.cvut.cz",
  port = 3306,
  user = "guest",
  password = "ctu-relational"
)
DBI::dbExecute(pool, "SET SESSION wait_timeout=2;")
#> [1] 0
Sys.sleep(3)
DBI::dbIsValid(pool)
#> [1] TRUE
DBI::dbGetQuery(pool, "SELECT 1")
#> 1
#> 1 1
Created on 2024-08-21 with reprex v2.1.0
We totally could have a documentation problem. IIRC, when Barbara and I originally wrote this package, the intention was to have everyone do poolCheckout/poolReturn; it was Hadley's idea to have the pool itself act like a connection. I don't remember what the docs emphasized (and I'm about to board a plane, so I won't look right now).
Is this all happening because there’s a real database out there with a session timeout that’s shorter than 60 seconds?
No, the timeout could be arbitrarily long. I did assume that pool works this way, but never got the chance to try it in production; the first attempt wasn't successful. We picked shorter timeouts to be able to reprex.
Happy to help review a documentation PR.
After reviewing the docs, I think we might want to slightly reframe the use cases.
In Shiny (shiny?) apps with database access, querying the data from a database will block not only the UI of the app currently carrying out the database operation, but also the UIs of all other apps managed by the same process. (We learned that the hard way.) It's rare that we need to serve multiple connections (because all database accesses are short-lived anyway, with implicit checkout + read + return in close succession), but more important to keep the connections stable in the presence of server settings we can't control.
Async DB access is something that ADBC/adbi might be able to solve, but a few more things will need to fall into place before this happens. (Also, should it be async DB, or async DB + computation + ... in a separate R process handled by mirai?) Happy to discuss.
all other apps managed by the same process
I think you mean "session" instead of "app"? IIRC the relationship between R process and Shiny app is still only 1:1, i.e., you can't run more than one Shiny app simultaneously, but you can have more than one user simultaneously connected to a single app/process.
It's rare that we need to serve multiple connections (because all database accesses are short-lived anyway, with implicit checkout + read + return in close succession), but more important to keep the connections stable in the presence of server settings we can't control.
This is all true, but I'm not sure what change in framing you're suggesting? If you use dbPool the way we suggest in the docs (with the exception of not waving you away if you come upon ?poolCheckout), then you get exactly the right behavior, right? The one exception: I never expected any database server to have a connection idle timeout of less than a minute.
(Also, should it be async DB, or async DB + computation + ... in a separate R process handled by mirai?)
Unfortunately I think the answer is "it depends"... 😄
Added a note to ?poolCheckout: https://github.com/rstudio/pool/pull/189
Thanks, the doc extension looks good. I have added my vision to https://github.com/rstudio/pool/pull/190; it covers what I would add to make sure the reconnection part is covered too. Other than that, yes, the behavior is what I'd be looking for. Thank you for your support!
Yes, session => app 🤦
It depends, yes.
@krlmlr as we discussed last week, if the DB connection is known to block, then a solution would be to use one dedicated mirai process to host the DBI / pool connection for the app (can even be ADBC) - something along the lines of daemons(1, dispatcher = FALSE, .compute = "special_db_uid"). Then make all DBI queries via mirai using the "special_db_uid" compute profile. This is cheap and should work well if you indeed rarely need to serve multiple connections at once.
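A minimal sketch of that setup, reusing the public test server from the reprexes above (the profile name and query are placeholders; assumes the mirai and DBI packages are installed):

library(mirai)

# One dedicated daemon hosts the database connection for the app.
daemons(1, dispatcher = FALSE, .compute = "special_db_uid")

# Open the connection once inside that process.
everywhere({
  con <<- DBI::dbConnect(RMariaDB::MariaDB(),
                         host = "relational.fel.cvut.cz", port = 3306,
                         user = "guest", password = "ctu-relational")
}, .compute = "special_db_uid")

# Queries run in the daemon, so the app's own process doesn't block on the database.
m <- mirai(DBI::dbGetQuery(con, "SELECT 1"), .compute = "special_db_uid")
call_mirai(m)  # block until the result is available
m$data

daemons(0, .compute = "special_db_uid")  # shut the daemon down when done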
This would operate independently of other async computations you might (or might not) want to run for the Shiny app using the default or another profile.
Just in relation to the case you mentioned, not talking about broader implications (yet 😄).
As Joe said, it depends.
If we move just the database into one process, but have other time-consuming computations and move them to another process, a lot of time will be spent on communication. That's my primary concern and the reason I'm not pushing too hard for "only" database async at this time. I'd like to understand this space better before moving.
If we move just the database into one process, but have other time-consuming computations and move them to another process, a lot of time will be spent on communication.
I think this also depends on what we're trying to solve. (i) for keeping the Shiny app responsive, both will be async and not block, and (ii) if we're not moving around huge amounts of data, the pure communications overhead is likely to be minimal (and 1/100 of what you might have in mind if your frame of reference is pre-mirai).
That's my primary concern and the reason I'm not pushing too hard for "only" database async at this time. I'd like to understand this space better before moving.
I see it as an opportunity to solve both cases rather than either/or, as I agree that it probably 'depends'!
If I follow my original idea, pool may actually be the natural place for it to live, as it's already managing a pool of connections. There could be a user option to have some of these in a background (or remote) process, and return 'mirai' from a pool connection if subsequent requests are wrapped with an async() keyword (for example). Alternatively, we could have a special type of 'async pool' from which only 'mirai' are returned.
Just floating some thoughts.
In some setups, the connection to the database server will be closed by the server or the client library, e.g., after a period of time. Such a connection still gives TRUE for dbIsValid(), but running a query will fail.
Can we extend dbSendQuery() and dbSendStatement() to do a "smoke test" with SELECT 1 after failure, to see if the connection is still alive, and perhaps automatically attempt a reconnect? CC @kjellpk.
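For illustration, a rough user-side sketch of the idea around a plain DBI connection (a hypothetical helper; the proposal itself is about dbSendQuery()/dbSendStatement()):

# Hypothetical helper: on failure, probe the connection with SELECT 1 and, if the
# probe also fails, reconnect and retry once. `connect` is a zero-argument
# function that returns a fresh DBI connection.
query_with_reconnect <- function(con, sql, connect) {
  tryCatch(
    list(con = con, result = DBI::dbGetQuery(con, sql)),
    error = function(e) {
      alive <- tryCatch(
        { DBI::dbGetQuery(con, "SELECT 1"); TRUE },
        error = function(e2) FALSE
      )
      if (alive) stop(e)  # connection is fine; the query itself is the problem
      try(DBI::dbDisconnect(con), silent = TRUE)
      con <- connect()
      list(con = con, result = DBI::dbGetQuery(con, sql))
    }
  )
}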