mperham / connection_pool

Generic connection pooling for Ruby
MIT License

Connection Pool leaking connection object. #137

Closed: gr790 closed this issue 3 years ago

gr790 commented 3 years ago

Hello,

This might be a duplicate of https://github.com/mperham/connection_pool/issues/136, or a new issue.

I have a similar setup, and very rarely I see this exception in the Rails production.log. Since it is a production setup, I don't have enough details about what was going on in that node at the time. As far as I know, it was not handling more than 20 requests per second, and the exception appeared 10-15 minutes after some other process sent a USR1 signal to the Puma master process.

I am forking Puma workers in preload_app! mode, with no additional changes to any fork hook in puma.conf or to the on_worker_boot method (the defaults that come with the puma gem). In puma.conf I have 4 processes, each holding 40 threads. I couldn't work out how USR1 and preload_app! behave together, since the two are incompatible, and I don't know the exact outcome.
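For reference, this is roughly what the Puma setup looks like; a sketch only, since I'm not pasting the actual puma.conf here (the worker and thread counts are the ones I mentioned, everything else is just illustrative):

# puma.conf (sketch)
workers 4          # 4 child processes
threads 40, 40     # 40 threads in each worker
preload_app!       # app is loaded in the master, then forked

# No custom before_fork / on_worker_boot code beyond the puma defaults.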

To prove this exception exists, I had to add the following code in rails/config/initializers/fake_leak.rb, which spawns additional threads.

# Deliberately leak every connection from the master process: each thread
# checks a connection out and never checks it back in.
40.times do
  Thread.new do
    Rails.cache.redis.checkout
    puts "checkout connection...#{Rails.cache.redis.inspect}"
    puts "-------------------------"
    # Uncommenting the checkin below returns the connection to the pool
    # and the leak goes away.
    #Rails.cache.redis.checkin
    #puts "checkin connection...#{Rails.cache.redis.inspect}"
  end
end
puts "All the connections leaked !!!!!!!!!!!!!!!!!!!"

In the code above I am leaking 40 connections in the master process. When the parent forks a child process on receiving USR1, the child process cannot get any connection and starts throwing exceptions. You can easily verify this by comparing the connection objects (#{Rails.cache.redis.inspect}) against "netstat -antp | grep 6379": the number of physical connections is much lower than the number of connection objects the processes believe they have.

Without the changes above I do not get any exception in my production-like lab setup, yet the same exception happened in production without those changes, which is making me crazy.

I have investigated further in connection_pool's timed_stack.rb class and added some logging, but I am not getting any clue about where the issue is. rbtrace is not helping either, and it required a lot of changes to OS packages.

After spending two weeks on this, I have migrated from redis_cache_store to memory_store. I hope it will work fine, since it doesn't need a connection to fetch entries from memory and connection_pool will not be used, but that is still in progress.

Thanks, Rahul.

mperham commented 3 years ago

So to summarize if I understand: you have a connection pool in the parent process, you are checking out one or more connections and then you are forking and finding that the connections are not checked back in by the child process? This scenario makes sense to me but I don’t see how connection pool could fix it. This is more of a high-level software design issue where you would need to fix the code to reclaim any connections before forking.
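For example, one way to "reclaim" is to never leave a connection checked out in the parent in the first place: use the block form so every checkout is paired with a checkin. A rough sketch, assuming Rails.cache.redis returns the ConnectionPool as in the reproduction above:

# Instead of a bare checkout that is never returned...
conn = Rails.cache.redis.checkout

# ...use the block form, which checks the connection back in even if the
# block raises, so nothing is still held when Puma forks.
Rails.cache.redis.with do |redis|
  redis.set("some_key", "some_value")
end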

gr790 commented 3 years ago

Hello,

Thanks for the quick reply. In config/environments/production.rb I have set up config.cache_store in much the same way as described in https://github.com/mperham/connection_pool/issues/136. In config/initializers/init.rb I create a thread (Thread.new) that repeatedly accesses the Redis cache and writes some information every minute; this runs in the master process. Once that is done, Puma forks the child processes, which start processing the actual HTTP traffic. While processing requests, the child processes talk to the Redis cache, fetch the information written by the master process, and do the actual work. During one of those requests I get an exception similar to the one mentioned in https://github.com/mperham/connection_pool/issues/136.
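Roughly, the setup looks like this (a sketch only; the real cache_store options follow #136, so the values below are placeholders, not my actual config):

# config/environments/production.rb (sketch, placeholder values)
config.cache_store = :redis_cache_store, {
  url:                ENV["REDIS_URL"],
  pool_size:          40,  # one connection per Puma thread
  pool_timeout:       5,
  reconnect_attempts: 1
}

# config/initializers/init.rb (sketch): background writer in the master
Thread.new do
  loop do
    Rails.cache.write("shared_state", Time.now.to_i)  # placeholder key/value
    sleep 60
  end
end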

> So to summarize if I understand: you have a connection pool in the parent process, you are checking out one or more connections and then you are forking and finding that the connections are not checked back in by the child process? This scenario makes sense to me but I don’t see how connection pool could fix it.

My dilemma is that I can't prove it with my existing source code. The other thing is that, as I understand it, connection_pool is per process as claimed (after the fork, each of the 4 child processes should have its own 40 connections), so why should the child processes care about connections acquired in the master process?
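As far as I can tell, a standalone script along these lines would show the problem (a sketch, not from my app): fork copies the pool's in-memory state, so connections checked out in the parent still count as checked out in the child's copy and are never returned there.

require "connection_pool"

pool = ConnectionPool.new(size: 2, timeout: 1) { Object.new }

# Leak both slots in the parent: each thread checks a connection out and
# exits without checking it back in.
2.times { Thread.new { pool.checkout }.join }

pid = fork do
  begin
    # The child inherits the pool state: both slots still look checked out,
    # so this checkout times out.
    pool.with { |conn| puts "child got #{conn.inspect}" }
  rescue ConnectionPool::TimeoutError => e
    puts "child starved: #{e.class}"
  end
end

Process.wait(pid)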

> This is more of a high-level software design issue where you would need to fix the code to reclaim any connections before forking.

I am not using Active Record, which has connect/disconnect methods, but I don't see any method available in redis_cache_store or connection_pool to reclaim connections. If you notice, I have set reconnect_attempts to 1, so even if a child process fails to get a connection on the first attempt, it should be retried, right?

Thanks, Rahul.

mperham commented 3 years ago

connection_pool master now has a reload method for after_fork callbacks that will recycle any connections.
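Something along these lines in a fork callback should work; a sketch, assuming Rails.cache.redis returns the ConnectionPool as in the reproduction above (Puma's on_worker_boot runs in each child right after the fork):

# puma.conf (sketch)
on_worker_boot do
  # ConnectionPool#reload yields each pooled connection so it can be closed,
  # then discards them all, so the child starts with a fresh, fully
  # available pool.
  Rails.cache.redis.reload { |conn| conn.close }
end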