Open jeromegn opened 3 years ago
Hey, just checking if anyone has looked into this yet. Perhaps it was addressed in the 2.0 release?
@jsierles There were some fixes in the v2.0.0 release, but I haven't been able to validate whether this is still happening. A test setup that reproduces the issue would help...
We'll give it a spin!
it seems like nats-pure.rb can sometimes stall the application. I've encountered this during an ActiveRecord transaction. I can't be sure what's happening, but none of the DB queries coming from a nats.subscribe block would get through.
Well, that's a complex thing. ActiveRecord implicitly checks out a database connection from its connection pool for every thread that tries to execute a query (so every thread is given its own connection). This means that many subscriptions will consume many database connections: right now nats-pure creates 1 thread for every subscription (yes, the subscription callback is executed in a different thread from the application one).
But in my understanding it shouldn't stall the app: there should be a lot of ActiveRecord::ConnectionTimeoutError errors if the connection pool capacity is significantly smaller than the number of requesting threads (both the puma/sidekiq ones and the nats subscription ones). Performance will be degraded too.
Probably a database connection taken by a subscription callback isn't checked back into the connection pool and is thus leaked. So maybe we will need to automagically integrate with the Rails Executor to return all resources that were implicitly taken during subscription callback execution.
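To illustrate the "always check the connection back in" idea, here is a minimal, self-contained sketch. `TinyPool` and `wrap_callback` are hypothetical names invented for this example; they are not part of nats-pure or ActiveRecord. In a real Rails app the equivalent would more likely be `ActiveRecord::Base.connection_pool.with_connection` or `Rails.application.executor.wrap` around the callback body, but that is an assumption about how an integration could look, not what nats-pure does today.

```ruby
# Hypothetical stand-in for a fixed-size connection pool (NOT ActiveRecord's pool).
class TinyPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << "conn-#{i}" }
  end

  # Number of connections currently available for checkout.
  def available
    @free.size
  end

  # Borrow a connection for the duration of the block.
  # The ensure clause returns it even if the block raises.
  def with_connection
    conn = @free.pop # blocks until a connection is free
    yield conn
  ensure
    @free << conn if conn
  end
end

# Hypothetical wrapper: runs a subscription callback with a pooled connection
# and guarantees the connection is checked back in afterwards.
def wrap_callback(pool, &callback)
  lambda do |msg|
    pool.with_connection { |conn| callback.call(conn, msg) }
  end
end

pool = TinyPool.new(2)
handled = Queue.new
handler = wrap_callback(pool) { |conn, msg| handled << "#{msg} via #{conn}" }

# Simulate five subscription threads sharing a pool of two connections.
threads = 5.times.map { |i| Thread.new { handler.call("msg-#{i}") } }
threads.each(&:join)
```

Because every checkout is paired with a check-in inside `ensure`, five callback threads can share two connections without leaking any, even when a callback raises.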
But yeah, unrelated to connection issues
We use nats-pure.rb in a Rails environment, spawning multiple threads (because: puma, for example), and it seems to be wreaking havoc in certain scenarios.
(We've already contacted a maintainer of this project, this issue merely serves as a more public channel of communication in the spirit that others facing the same issues can find solutions)
Even with 0.7.2, we've been having messages going to the wrong callbacks. Essentially, connecting to a NATS cluster may trigger its callback with the wrong response (for example: expecting a PONG, but receiving a message originally destined as a response to a request sent from another thread). This causes errors like:
or
Our temporary solution is to use thread-local variables for memoizing NATS connections like:
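A minimal sketch of that kind of per-thread memoization, assuming a hypothetical `PerThreadNats` helper (the name, the key, and the block-based factory are all illustrative). The block is where a real app would build its connection, for example with `NATS.connect`; a plain object stands in for it here so the snippet runs without a NATS server.

```ruby
# Hypothetical helper: keeps one connection per thread.
module PerThreadNats
  KEY = :per_thread_nats_connection

  # Returns the calling thread's connection, building it with the given
  # block on first use. thread_variable_get/set are true thread-locals
  # (Thread.current[:key] would be fiber-local instead).
  def self.connection
    existing = Thread.current.thread_variable_get(KEY)
    return existing if existing

    conn = yield # in a real app this might be NATS.connect(...)
    Thread.current.thread_variable_set(KEY, conn)
    conn
  end
end

# Usage with a placeholder object instead of a real NATS client.
first  = PerThreadNats.connection { Object.new }
second = PerThreadNats.connection { Object.new }
other  = Thread.new { PerThreadNats.connection { Object.new } }.value
```

Within one thread the same object is reused, while each new thread builds its own, so replies can no longer be delivered to a connection shared with another thread. The trade-off is one connection per thread.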
Unrelated to connection issues, it seems like nats-pure.rb can sometimes stall the application. I've encountered this during an ActiveRecord transaction. I can't be sure what's happening, but none of the DB queries coming from a nats.subscribe block would get through. Listing the various current threads showed something like:
I'm not savvy enough about nats-pure.rb to do much about this, unfortunately.