AlexLaviolette opened this issue 6 years ago
@AlexLaviolette can you share your slick configuration?
The number of threads used by Slick must be equal to the number of connections in the pool.
I was just using the default configuration for slick. But after reading through https://github.com/slick/slick/issues/1614 and some other discussions and trying several of the suggestions in there (including max_conns = num_threads) I am still running into this issue.
My configuration before was:
slick.dbs {
  default {
    profile="slick.jdbc.MySQLProfile$"
    db {
      driver="com.mysql.jdbc.Driver"
      url=${PLAY_DB_URL}
      user=${PLAY_DB_USER}
      password=${PLAY_DB_PASSWORD}
      leakDetectionThreshold=60000
    }
  }
}
I have since upgraded play-slick from 3.0.1 -> 3.0.3, which is the latest version that I believe supports Play 2.6.x (I am using 2.6.12). I also upgraded slick (from the version bundled with play-slick) from 3.2.1 -> 3.2.3. Unfortunately, it seems like the relevant bug fixes were already present in 3.2.1, but I gave it a shot regardless. This did provide a nice warning message which basically echoes what is said in the discussion:
[warn] slick.util.AsyncExecutor - Having maxConnection > maxThreads can result in deadlocks if transactions or database locks are used.
I configured slick as follows, which stifled the warning message:
slick.dbs {
  default {
    profile="slick.jdbc.MySQLProfile$"
    db {
      driver="com.mysql.jdbc.Driver"
      url=${PLAY_DB_URL}
      user=${PLAY_DB_USER}
      password=${PLAY_DB_PASSWORD}
      leakDetectionThreshold=60000
      numThreads=20
      maxConnections=20
    }
  }
}
However, I am still encountering the error. In the discussion it seemed that the fix only applied to Sequence and AndThen, while I was using flatMap. I am unsure whether the fix didn't cover flatMap because it didn't need fixing or because it simply wasn't fixed.
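(For reference, a minimal sketch of the difference between the two composition styles; the table, column, and query names below are made up purely for illustration:)

import slick.jdbc.MySQLProfile.api._
import scala.concurrent.ExecutionContext.Implicits.global

// andThen composes two actions in order without passing a value between them.
val sequenced: DBIO[Int] =
  sqlu"UPDATE accounts SET balance = balance - 10 WHERE id = 1" andThen
    sqlu"UPDATE accounts SET balance = balance + 10 WHERE id = 2"

// flatMap makes the second action depend on the first's result; a
// for-comprehension over DBIOs desugars to flatMap.
val dependent: DBIO[Int] = for {
  id   <- sql"SELECT id FROM accounts WHERE name = 'alice'".as[Long].head
  rows <- sqlu"UPDATE accounts SET balance = balance - 10 WHERE id = $id"
} yield rows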
So as a test, I switched to using an AndThen query - unfortunately, I still encountered the issue. However, this did eventually surface a new exception I had not seen before, although I'm not sure how relevant it is:
Exception in thread "db-10" java.lang.IllegalArgumentException: requirement failed: count cannot be decreased
at scala.Predef$.require(Predef.scala:277)
at slick.util.ManagedArrayBlockingQueue.$anonfun$decreaseInUseCount$1(ManagedArrayBlockingQueue.scala:54)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at slick.util.ManagedArrayBlockingQueue.locked(ManagedArrayBlockingQueue.scala:201)
at slick.util.ManagedArrayBlockingQueue.decreaseInUseCount(ManagedArrayBlockingQueue.scala:53)
at slick.util.AsyncExecutor$$anon$2$$anon$1.afterExecute(AsyncExecutor.scala:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Please let me know if anything sticks out to you, and thanks for responding!
@AlexLaviolette thanks for the super-thorough explanations and tests. If you could provide a reproducer it'd be a great help to narrow down the issue.
I've put together a demo application here that replicates the issue https://github.com/AlexLaviolette/slick-demo. It's basically a stripped down version of what we are actually running that only includes an example of the problematic code.
I have that deployed to a single machine (2GB RAM VM), connected to a single mysql db (1GB RAM VM).
The script I run to reproduce it is in the repo under /test/api_load_test.py. I am able to reproduce it running that from a single VPS (8GB RAM, 4 CPUs).
Hopefully this helps, let me know if you need anything else.
I forked this and added the change referenced here to version 3.0.3. Unfortunately, this did not seem to fix my issue.
We are experiencing this same issue. Dropping transactionally resolves the issue for us as well.
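(For clarity, a minimal sketch of what that change looks like; the db handle, table, and statements below are placeholders, not our actual code:)

import slick.jdbc.MySQLProfile.api._

val db = Database.forConfig("slick.dbs.default.db")
val composedAction: DBIO[Unit] = DBIO.seq(
  sqlu"UPDATE t SET x = 1",
  sqlu"UPDATE t SET y = 2"
)

// With transactionally, the whole composition is pinned to one connection
// until commit or rollback:
db.run(composedAction.transactionally)

// Dropping it lets each statement auto-commit, so a connection is not held
// across the entire composition:
db.run(composedAction)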
@AlexLaviolette did you find a resolution for this in the end?
For future searchers, we got to the bottom of this issue. Ours was actually caused by an error on our side.
We'd flatMapped DBIOs and Futures using DBIO.from. The implementation of the method that was returning the Future was changed to use the database, meaning that it had its own transaction. So it looked something like:
val dbio = for {
  a <- getAFromDatabase      // a DBIO
  b <- getBFromDatabase      // a DBIO
  c <- DBIO.from(calculateC) // a Future, but needing to run a DBIO to perform the calculation
  d <- doIt(a, b, c)         // a DBIO
} yield d

x.run(dbio.transactionally)
Because the outer transaction (where the DBIO and Future were combined) was still active while the Future was running, in some cases the inner transaction run to produce the Future result was not able to acquire a connection for that transaction.

Under load, this quickly led to a deadlock - all connections were consumed by the outer transactions, and none of the inner transactions (used by calculateC) were able to acquire a connection from the pool.

This is unlikely to be related to the problem faced by @AlexLaviolette but did have the same symptoms.
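A sketch of one way to avoid that shape, assuming the calculation can be expressed as a DBIO rather than a Future (every name and query below is a placeholder, not the real code):

import slick.jdbc.MySQLProfile.api._
import scala.concurrent.ExecutionContext.Implicits.global

// Placeholder actions standing in for the real methods above.
def getAFromDatabase: DBIO[Int] = sql"SELECT 1".as[Int].head
def getBFromDatabase: DBIO[Int] = sql"SELECT 2".as[Int].head

// Instead of a Future that opens its own transaction (wrapped with DBIO.from),
// express the calculation as a DBIO so it runs on the outer transaction's
// connection and cannot deadlock against it.
def calculateC(a: Int, b: Int): DBIO[Int] = DBIO.successful(a + b)

def doIt(a: Int, b: Int, c: Int): DBIO[Int] =
  sqlu"INSERT INTO results (value) VALUES ($c)"

val dbio = for {
  a <- getAFromDatabase
  b <- getBFromDatabase
  c <- calculateC(a, b) // joins the outer transaction instead of starting its own
  d <- doIt(a, b, c)
} yield d

val db = Database.forConfig("slick.dbs.default.db")
db.run(dbio.transactionally)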
Thanks a lot! I'd been chasing a hang for 8 hours or so and slowly started to understand that it was related to nested transactions. What you describe is exactly what I have, and after your description, working around it was trivial. Thanks mate! Peter.
In my case this was caused by the New Relic Java agent (using Play 2.8.7, Slick 3.3.3 with numThreads = 5, HikariCP 4.0.3, MySQL Connector/J 8.0.22). [Screenshots comparing the behaviour with and without New Relic were attached.]
Currently on Play 2.6 with play-slick 3.0.1, I've been running into an issue during load testing where the app is consuming tons of db connections from the pool and not returning them. I have narrowed it down to the use of transactionally, but have still not been able to pinpoint the exact cause.

As proof of this, when I increase the db connection pool to 340 (our max_connections is 350), I can hit up to 28k rpm on one server. But with a pool of only 100, it almost instantly crashes under the same load, with requests timing out waiting for a connection from the pool. I can confirm that the database transactions themselves are completing quickly. When I enable Hikari connection leak detection, I can see that some connections are taking > 60s to be returned, yet the database is showing nothing over 5 seconds. From everything I've read, we should be able to configure this to run with a much smaller pool. When I remove the use of transactionally, the behaviour of the pool returns to normal.

[Logs were attached: with a pool of 100 and transactionally enabled, with a pool of 340 and transactionally enabled, and some errors seen with a pool of 100 and transactionally enabled.]

New Relic is also showing these requests spending the majority of their time waiting on com.zaxxer.hikari.HikariDataSource.getConnection().
Here is an example of the code that is causing the issue:
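(The snippet below is only a representative sketch of that pattern; the table, query, and method names are placeholders rather than the actual application code.)

import slick.jdbc.MySQLProfile.api._
import scala.concurrent.ExecutionContext.Implicits.global

def findUser(id: Long): DBIO[Option[Long]] =
  sql"SELECT id FROM users WHERE id = $id".as[Long].headOption

def recordEvent(userId: Long): DBIO[Int] =
  sqlu"INSERT INTO events (user_id) VALUES ($userId)"

def handleRequest(db: Database, id: Long) = {
  val action = for {
    user <- findUser(id)
    rows <- user.map(recordEvent).getOrElse(DBIO.successful(0))
  } yield rows

  // Wrapping the composed action in a transaction pins it to one connection
  // for its whole duration; under load this is where connections pile up.
  db.run(action.transactionally)
}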
I am unable to reproduce this at low load, or locally by manually causing the queries to fail. But something about using transactionally here is causing connections to not be returned to the pool for a very long time. There are log entries reporting "Previously reported leaked connection com.mysql.jdbc.JDBC4Connection@795acb1e was returned to the pool (unleaked)", but often only after several minutes.