Closed gfredericks closed 9 years ago
Hi Gary,
What makes you think that atomic
's using nested wcar
calls? I'm looking at the macro now (pre-coffee, granted) - but not seeing any nested connections?
May help to see what change you've made.
Also, have you seen issue #127? Hoping to cut a release for that later today. Seems like the problem you're seeing may be related?
Atomic itself does not used nested wcar
calls; when I said that I was assuming there was no reason to use atomic
unless you intended to call wcar
in the body, which would therefore be nested (inside the wcar
that atomic
itself sets up).
I saw #127 and therefore did a few tests with the 2.4
release of the commons lib, and saw the same behavior.
atomic
takes a body which is executed using the top-level connection. Have you seen the example in the atomic
docstring? Maybe it'd help if you could show me some code so I can see what you're trying to do?
The body supplied to atomic
can (and does, in the case of handle1
) have calls to wcar
, which open a new connection.
To demonstrate the problem I created this commit which logs connection pool activity, and then added a scratch.clj
on this branch which tries to send a single message over a queue with a specified connection pool size.
In the results below, you can see two connections being acquired in the case where the pool is large enough (2), and the execution hangs trying to acquire the second connection in the case where the pool is not large enough (1).
$ CARMINE_POOL_SIZE=2 lein do clean, run -m clojure.main scratch.clj 2>/dev/null
BORROW :scratch
RETURN :scratch
2015-Jun-04 12:19:44 +0000 gary1 INFO [taoensso.carmine.message-queue] - Message queue worker starting: :my-queue
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
BORROW :start-polling-loop-2
RETURN :start-polling-loop-2
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
Received :my-msg
2015-Jun-04 12:19:45 +0000 gary1 INFO [taoensso.carmine.message-queue] - Message queue worker stopped: :my-queue
BORROW :atomic-outer
BORROW :handle1
RETURN :handle1
RETURN :atomic-outer
$ CARMINE_POOL_SIZE=1 lein do clean, run -m clojure.main scratch.clj 2>/dev/null
BORROW :scratch
RETURN :scratch
2015-Jun-04 12:18:52 +0000 gary1 INFO [taoensso.carmine.message-queue] - Message queue worker starting: :my-queue
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
Received :my-msg
BORROW :atomic-outer
2015-Jun-04 12:18:53 +0000 gary1 INFO [taoensso.carmine.message-queue] - Message queue worker stopped: :my-queue
BORROW :handle1
The potential fix I mentioned uses a dynamic var to keep track of the connection for the scope of the wcar
call, so that nested calls can reuse it. This commit is my proof-of-concept for that idea.
My issue-135-patched contains that fixed merged into the issue-135
branch from my previous comment. Here is the new output for pool size 1 (which is the same as the output for pool size 2):
$ CARMINE_POOL_SIZE=1 lein do clean, run -m clojure.main scratch.clj 2>/dev/null
BORROW :scratch
RETURN :scratch
2015-Jun-04 15:22:11 +0000 gary1 INFO [taoensso.carmine.message-queue] - Message queue worker starting: :my-queue
BORROW :start-polling-loop-1
RETURN :start-polling-loop-1
Received :my-msg
2015-Jun-04 15:22:11 +0000 gary1 INFO [taoensso.carmine.message-queue] - Message queue worker stopped: :my-queue
BORROW :atomic-outer
RETURN :atomic-outer
Hey Gary, thanks a lot for the additional info - that helps!
Will try clarify a few things:
Intentionally or unintentionally (I can't tell) the way that the atomic macro ends up expanding to nested wcar calls means that any use of atomic will require at least two connections from the pool.
The atomic
macro doesn't expand to nested wcar
calls unless you specifically ask it to (i.e. unless you include wcar
calls in the atomic
macro's body.
Atomic itself does not used nested wcar calls; when I said that I was assuming there was no reason to use atomic unless you intended to call wcar in the body, which would therefore be nested (inside the wcar that atomic itself sets up).
There are indeed reasons to use atomic
without wcar
calls in the body; that's actually the commonest use case. The body is executed within the context of the atomic
macro's (implicit) connection. The atomic
docstring illustrates this, but might not be clear enough? I'll try beef up the explanation.
The body supplied to atomic can (and does, in the case of handle1) have calls to wcar, which open a new connection.
So this is actually the fault of handle1
, not atomic
. It is indeed a fault btw; you'll sometimes want a nested wcar
call - but you don't need one here (in handle1
). The correct solution is to switch handle1
's nested wcar
to a with-replies
call - I'll do that momentarily.
as many workers as possible acquire their first connection until the pool is exhausted, and then they all block trying to acquire their second connection
This is something I hadn't considered, thanks for bringing it to my attention. It's definitely a bug.
I might also suggest bumping your pool size btw. Even without the 2 conns/thread, worker threads poll (block) by design - so you'll too easily end up connection starved unless your pool limit is >> your worker thread count.
Thanks Peter! I think all of this makes sense. I hadn't studied atomic
too closely since I wasn't myself writing code that used it, I just took for granted that handle1
was using it correctly. So the documentation may well be sufficient for someone looking to use it directly.
It's definitely also helpful to know that the pool size should be greater than the thread count. I had always figured that was probably a good idea, but not strictly necessary.
Thanks again.
I just took for granted that handle1 was using it correctly.
That should have been a safe assumption ;-)
Have cut a v2.11.1 release: https://github.com/ptaoussanis/carmine/releases - would you mind confirming that it solves your issue?
Thanks again for the report, cheers! :-)
2.11.1 looks to have fixed the problem, thanks!
Great, thanks for the confirmation.
Intentionally or unintentionally (I can't tell) the way that the
atomic
macro ends up expanding to nestedwcar
calls means that any use ofatomic
will require at least two connections from the pool.This doesn't sound terrible, but it can have racey effects when using message queue workers whenever there are more workers than the connections in the pool: as many workers as possible acquire their first connection until the pool is exhausted, and then they all block trying to acquire their second connection.
I tweaked the
wcar
macro to reuse its connection for nested calls, and this caused my tests to pass. I can't seem to successfully run the carmine test suite so I can't verify that it passes as well.I don't understand the carmine codebase well enough to guess whether this is a satisfactory solution (e.g., is it important for some reason that there be a separate connection used for atomic things?). If it is, I can prepare a PR.
Otherwise, should we add a warning to the message queue docs about the pool size? Or is there another approach to solving the problem?