passiomatic / coldsweat

Web RSS aggregator and reader compatible with the Fever API
MIT License

Close db connections after fetchers #94

Closed · SSheldon closed this 8 years ago

SSheldon commented 8 years ago

I was experimenting with the web app fetching feeds and hit the connection limit for my database. I tracked it down and realized that the feed workers in the pool were connecting to the database but not closing the connection when they were done, resulting in leaked connections.
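Roughly, the shape of the change in this PR is to close the connection when each worker task finishes, as in the sketch below (illustrative only: fetch_feed and the connection settings are placeholders, and db stands in for coldsweat's actual peewee database object):

```python
# Illustrative sketch only: db, fetch_feed and the connection settings are
# placeholders standing in for coldsweat's actual code.
from peewee import PostgresqlDatabase

db = PostgresqlDatabase('coldsweat')  # assumed settings

def fetch_feed(feed):
    ...  # placeholder for the real per-feed fetch/parse/store logic

def feed_worker(feed):
    db.connect()          # each task opens its own connection...
    try:
        fetch_feed(feed)  # ...does its work against the database...
    finally:
        db.close()        # ...and always closes it, so nothing leaks
```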

I've tested this out with and without multiprocessing and all seems well. I'm a little surprised it works without multiprocessing, because it seems like a connection would already be open and this would close it?

passiomatic commented 8 years ago

> I've tested this out with and without multiprocessing and all seems well. I'm a little surprised it works without multiprocessing, because it seems like a connection would already be open and this would close it?

Thank you for tracking this down. What database are you using? How many processes? I've never seen a connection limit error before (I've tested with SQLite and MySQL, and others have tested Postgres). However, it makes a lot of sense to close connections while doing a fetch in multiprocessing mode.

Without multiprocessing, though, I now realize it is less than ideal to keep opening and closing a connection for every feed (100 feeds -> 100 open/close cycles). So I'm wondering if it makes sense to refactor fetch_feeds to handle that case directly, without delegating to the feed_worker function.
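In other words, something like the following sketch, where the connection is opened once around the whole loop rather than once per feed (fetch_feeds here is illustrative, not the actual implementation, and it reuses the db and fetch_feed placeholders from the sketch above):

```python
# Illustrative sketch of the single-process path: one connection for the
# whole run instead of one per feed. Names are placeholders.
def fetch_feeds(feeds):
    db.connect()
    try:
        for feed in feeds:
            fetch_feed(feed)  # no per-feed connect/close churn
    finally:
        db.close()
```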

SSheldon commented 8 years ago

I'm using Postgres, but I'm running on the free tier of Heroku, which only allows 20 connections :) I was also starting the fetch from inside my web process, and since that process is long-running, that might explain why I was able to observe the leaks accumulating.

Interestingly, I had multiprocessing set to 4 and I was only seeing 4 connections leaked per fetch. So despite feed_worker being called for each feed, I only lost one connection per process. Perhaps peewee does some sort of caching for Postgres where it shares one connection per process no matter how many times you connect? Hmm, but then I don't understand how closing it would work...

SSheldon commented 8 years ago

@passiomatic I've got a new approach here that opens and closes fewer connections :)

I found that connections are closed when the worker process exits, so if we close the pool it won't leak connections. Presumably I was previously seeing zombie processes live on and eat up my connection limit.

I've made connect the initializer for the multiprocessing pool; this means that each worker connects once when it is initialized rather than connecting before each feed.

So now fetching 100 feeds with 4 workers will only open 4 connections, and then they'll be closed when those worker processes exit!
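For reference, the overall shape is roughly this sketch (illustrative only: db, fetch_feed and fetch_feeds are placeholders, while connect and feed_worker mirror the names used above):

```python
# Illustrative sketch of the pool-initializer approach; not coldsweat's
# actual code. The initializer opens one connection per worker process.
from multiprocessing import Pool

from peewee import PostgresqlDatabase

db = PostgresqlDatabase('coldsweat')  # assumed settings

def connect():
    db.connect()  # runs once in each worker process, right after it starts

def fetch_feed(feed):
    ...  # placeholder for the real per-feed fetch/parse/store logic

def feed_worker(feed):
    fetch_feed(feed)  # reuses the connection opened by the initializer

def fetch_feeds(feeds, workers=4):
    pool = Pool(processes=workers, initializer=connect)
    try:
        pool.map(feed_worker, feeds)
    finally:
        pool.close()  # no more work to submit...
        pool.join()   # ...workers exit, and their connections go away with them
```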

I've tested this on Postgres, both with 4 processes and with multiprocessing disabled, as well as on SQLite with multiprocessing disabled. I can't use SQLite with multiprocessing because of the peewee.OperationalError: locking protocol issue.

Let me know what you think!

passiomatic commented 8 years ago

@SSheldon: I've merged the change into the 0.9.6-wip branch and done several fetches. It seems to work very well. I've also updated the code on my "production" server on Dreamhost, which runs MySQL. Let's see how it performs in the wild.