legumeinfo / gcv-docker-compose

Resources for running the Genome Context Viewer with Docker Compose
Apache License 2.0

setting scale has some unanticipated side effects #14

Open adf-ncgr opened 1 year ago

adf-ncgr commented 1 year ago

@alancleary I mentioned in email that I was trying to scale up one of the instances, and I just noticed that the services on that one are now in a bad state. The relevant error appears to be: `medicago-micro_synteny_search-1 | redis.exceptions.ConnectionError: max number of clients reached` (note that this was not the service I scaled up, but since they all share the same Redis database, it's a tragedy-of-the-commons situation, I suppose).

there may be more relevant info needed, but I'm submitting this before getting distracted

adf-ncgr commented 1 year ago

A bit more information, which may just demonstrate my ignorance of Redis (among other things). I looked a bit at the Redis docs and noticed:

> **Maximum Concurrent Connected Clients.** In Redis 2.4 there was a hard-coded limit for the maximum number of clients that could be handled simultaneously. In Redis 2.6 and newer, this limit is dynamic: by default it is set to 10000 clients, unless otherwise stated by the maxclients directive in redis.conf.

It surprised me that we could have exceeded this seemingly generous number, so I started monitoring the Redis in the container with:

`sudo docker exec -ti medicago-redisearch-1 redis-cli client list | wc -l`

The count started at 1095, already higher than I would have expected, and has since risen to 1100. The increase seemed to correspond to some requests I made to the services using this Redis instance, but that may have been a coincidence. It was up to 1101 at one point and then decreased by one, so it's not monotonic. I'm wondering whether there may be some sort of client connection leak going on, though. From elsewhere in the Redis docs I gleaned that Redis clients are not timed out by default, which seems like a good thing overall but could exacerbate leaks if they are present. I may be completely misconstruing the observed behaviors.
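One way to narrow down where the connections come from: each line of `redis-cli client list` output is a run of space-separated `key=value` fields (`addr`, `name`, `age`, `idle`, ...), so grouping the lines by the host part of `addr` (or by `name`, if the services ever set one via `CLIENT SETNAME`) would show which container is accumulating connections. A stdlib-only sketch; the sample lines below are made up for illustration, not real output from this deployment:

```python
from collections import Counter

def count_clients_by_field(client_list_output: str, field: str = "addr") -> Counter:
    """Group `redis-cli client list` output by a key=value field.

    Each CLIENT LIST line is space-separated key=value pairs; grouping by
    the host part of `addr` shows which client IP holds how many connections.
    """
    counts = Counter()
    for line in client_list_output.strip().splitlines():
        fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
        value = fields.get(field, "")
        if field == "addr":
            value = value.rsplit(":", 1)[0]  # drop the ephemeral source port
        counts[value] += 1
    return counts

# Made-up sample output, loosely following the CLIENT LIST field layout.
sample = """\
id=5 addr=172.18.0.7:51234 name= age=100 idle=90 cmd=get
id=6 addr=172.18.0.7:51240 name= age=95 idle=95 cmd=get
id=7 addr=172.18.0.9:40022 name= age=10 idle=0 cmd=client|list
"""

print(count_clients_by_field(sample))  # 172.18.0.7 -> 2, 172.18.0.9 -> 1
```

A rising count concentrated on one or two source hosts would point at specific services rather than a pool-wide problem.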

alancleary commented 1 year ago

This is interesting. I haven't encountered this before.

It certainly seems like there's some kind of connection leak. I'm not sure what the source would be, though. All of the microservices use a connection pool, which should manage the connections for us. Additionally, every microservice now has a shutdown function that implicitly closes the Redis connections when the service crashes or is stopped.

Unless this is a high priority I don't really have time to dig into it now. My one suggestion would be to update the microservices to use single-connection Redis clients instead of the default connection pool, then monitor the number of connections as you previously described. This should reveal whether the problem is caused by the connection pool, by new connections being unintentionally created by the microservices, or by connections not closing when a service is shut down.
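To illustrate why that experiment is informative, here's a stdlib-only toy (all names are hypothetical, not the actual microservice code): a server with a `maxclients`-style cap, a pooled client that opens one connection up front and reuses it, and a leaky client that opens a fresh connection per request and never closes it. Only the leaky pattern drives the server's client count toward the cap:

```python
class TinyServer:
    """Stand-in for Redis with a maxclients-style cap."""
    def __init__(self, maxclients: int):
        self.maxclients = maxclients
        self.connections = 0

    def connect(self):
        if self.connections >= self.maxclients:
            raise ConnectionError("max number of clients reached")
        self.connections += 1

    def close(self):
        self.connections -= 1

class PooledClient:
    """Opens one connection up front and reuses it for every request."""
    def __init__(self, server: TinyServer):
        self.server = server
        self.server.connect()

    def request(self):
        pass  # reuses the existing connection; count stays flat

class LeakyClient:
    """Opens a new connection per request and never closes it."""
    def __init__(self, server: TinyServer):
        self.server = server

    def request(self):
        self.server.connect()  # leaked: no matching close()

server = TinyServer(maxclients=10)

pooled = PooledClient(server)
for _ in range(100):
    pooled.request()
print(server.connections)  # 1: pooled usage doesn't grow the count

leaky = LeakyClient(server)
try:
    for _ in range(100):
        leaky.request()
except ConnectionError as e:
    print(e)  # the leak eventually hits the cap
```

Under the suggested single-connection setup, a connection count that still climbs would implicate the services themselves; a flat count would point back at how the pool is being used.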

adf-ncgr commented 1 year ago

Not a high priority. I bumped the scaling down a bit on the medicago instance and it seems to be working better. Just wanted to note the issue while it was fresh in my mind.
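Scaling down reduces pressure on the shared limit; another stopgap, while the leak is investigated, would be to raise `maxclients` on the shared Redis. A hedged sketch of how that could look in `docker-compose.yml`; the service name, image, and limit here are assumptions, not taken from this repo's actual compose file, and any existing `command` options (e.g. module loading for RediSearch) would need to be preserved:

```yaml
# Hypothetical fragment; names and values are illustrative only.
services:
  redisearch:
    image: redislabs/redisearch
    # Raise the client limit from the default 10000. Note that the
    # process file-descriptor limit must also be high enough, or Redis
    # will lower maxclients at startup to fit.
    command: ["redis-server", "--maxclients", "20000"]
```

This only buys headroom, of course; if connections leak, any fixed limit is eventually reached.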