michaelhelmick / django-balancer

A set of tools for using Django's multi-db feature.
BSD 3-Clause "New" or "Revised" License
110 stars 5 forks source link

Querysets joins #8

Open sebzur opened 13 years ago

sebzur commented 13 years ago

I've got some problems with the balancer. In general, it is working as I need (I have site with +15 000 users, a few PostgreSQL replicated DBs, I prefer to use native Python solution like the balancer than e.g PGPool)

There is a problem with the balancer when the querysets results are joined, namely (simple yet self-describing example):

young_people = Person.objects.filter(age__lt=10)
old_people = Person.objects.filter(age__gt=30)

young_and_old = young_people | old_people

Yes, one could of course give (Q(agelt=10)|Q(agegt=30)) to filter but I wanted just to explicitly show what I mean: if You have multiple DB setup, one query (young_people) goes to one DB, then the router can switch to the other DB on next query (with old_people), and Django can not join the two, because they origin from the two different sources.

Any suggestions what to do (beside the trivial answer to try to rethink the task to get the one queryset from single DB, i.e. (Q(agelt=10)|Q(agegt=30)) in the example case)? Can one think of a router that would pin the dbs within a transaction (i.e. within a view)?

Thanks! Sebastian

bkonkle commented 13 years ago

This is definitely an interesting problem, and it's something I want to spend more time thinking about. It makes sense to me to simply pin all the queries from a single request to one database, but I'd like to test it out and see what kind of side effects this would introduce. In the meantime, you may need to adjust your code to work around this limitation.

sebzur commented 13 years ago

Thank You for this fast comment! Pinning all the queries from a single request to one DB I also find out (at least as a first thought) to be a reasonable solution.

I hope You will find some time to provide the additional router to handle the issue - I can't wait to test it!

bkonkle commented 12 years ago

I've finished a first pass at this feature, but I don't have an active project right now that I can do some real testing with. Can you test it out and let me know what you think?

Checkout the "feature/request-pinning" branch and set REQUEST_PINNING to True in your settings.py.

Thanks!

wolph commented 12 years ago

Since you usually want to spread the load as much as possible, I would vote against pinning within a transaction unless really needed (i.e. with writes and replication lag).

In this case I would go for a solution where you get a database from the router (i.e. db = db_for_read(Person)) and do a young_people, old_people = young_people.using(db), old_people.using(db) to use the same database for this specific case.

sebzur commented 12 years ago

Okay, that's good to see that. I'll try to switch from pgpool to the django-router and test this update. Thanks a lot for Your time!

Of course, specifying the common db with .using in queryset is a solution that one should use for the merges. However, in my rather big project I'll have to localize all occurrences of these and switch them.

The update that bkonkle have entered handles the cases before the switch.

I'll give You the info how it works soon.