vlucas / Spot

[DEPRECATED - use v2] Simple DataMapper ORM for PHP 5.3+
75 stars 14 forks source link

Solve N+1 Query Problem #23

Open vlucas opened 12 years ago

vlucas commented 12 years ago

The classic N+1 query problem should be solved - specifically, for relations.

// Loop and print posts and comments
foreach($posts as $post) {
    echo "<h2>" . $post->title . "</h2>";
    echo "<p>" . $post->body . "</p>";
    echo "<hr />";
    // Comments
    foreach($post->comments as $comment) {
        echo "<p>" . $comment->body . "</p>";
    }
}

Currently, the above ‘HasMany’ example will incur N+1 queries. That is, one query will be executed for each set of comments requested for each post. So if 20 posts are returned and you print the comments for all posts, 20+1 queries will be executed.

This problem should be solved so that in the above situation, the maximum number of queries executed will be 2. One for all the posts, and one for all the comments related to all the posts in the result set. This would require some sort of intelligent cache of all the query conditions of the comments and all the primary keys of a collection once iteration begins.

marvelade commented 12 years ago

You could even do that in one query, no?

$sql = "SELECT post_id,body FROM comments AS c JOIN posts AS p ON p.id=c.post_id WHERE p.id IN (" . implode(',', $array_with_post_ids_from_search_result) . ")";

On Sat, Nov 17, 2012 at 11:16 PM, Vance Lucas notifications@github.comwrote:

The classic N+1 query problem should be solved - specifically, for relations.

// Loop and print posts and comments foreach($posts as $post) { echo "

" . $post->title . "

"; echo "

" . $post->body . "

"; echo "
"; // Comments foreach($post->comments as $comment) { echo "

" . $comment->body . "

"; } }

Currently, the above ‘HasMany’ example will incur N+1 queries. That is, one query will be executed for each set of comments requested for each post. So if 20 posts are returned and you print the comments for all posts, 20+1 queries will be executed.

This problem should be solved so that in the above situation, the maximum number of queries executed will be 2. One for all the posts, and one for all the comments related to all the posts in the result set. This would require some sort of intelligent cache of all the query conditions of the comments and all the primary keys of a collection once iteration begins.

— Reply to this email directly or view it on GitHubhttps://github.com/vlucas/Spot/issues/23.

vlucas commented 12 years ago

Yes, if you eager-load them. The idea though is to lazy-load them on-demand, which does require 2 queries.

marvelade commented 12 years ago

Isn't lazy loading why you'd put the query inside a (fictional, of course) $post -> getComments() method instead of calling $post -> comments? For them to be inside a property you'd have to perform the query in the constructor, regardless of the fact if you call the property or not.

IMHO both ways have their advantages and disadvantages. If you need the comments several times, querying once in the constructor is more processor economic than calling the getter several times. If not sure you need them, but if you do, you only need them once or twice, a getter method would be better.

Can you predict which approach is best if you don't know the application it'll be used in (which is the case for a general ORM) ?

On Sun, Nov 18, 2012 at 4:08 PM, Vance Lucas notifications@github.comwrote:

Yes, if you eager-load them. The idea though is to lazy-load them on-demand, which does require 2 queries.

— Reply to this email directly or view it on GitHubhttps://github.com/vlucas/Spot/issues/23#issuecomment-10486998.

pnomolos commented 12 years ago

@vlucas I'm not sure what you were thinking, but have a pass through getIterator on the relationship object that checks to see if it's already inside a getIterator loop, and pull the ids from that? Might require each getIterator query to also do a "get primary key in conditions" before it returns an iterator, though. I'm just coming up with stuff as I'm on my phone, though. I'm sure you've already thought of "look at how the ruby dm project" does it :grin:

vlucas commented 12 years ago

@marvelade These are not the only two options :). Spot solves this problem quite elegantly by pre-loading a Spot\Relation object on the property. When you access the property, you are actually accessing this proxy object. So then when you iterate over the object or perform a count, The relation query is executed and the collection is loaded (via SPL interfaces that provide getIterator and count methods). If you iterate over the same property multiple times, it just uses the same already-loaded collection, and doesn't execute the query again. So right now, it's all setup to be lazy-loaded on demand with fully customizable relation queries that you can even modify before execution (calling methods returns a Spot\Query object so you can refine the relation query further, also on the fly). It's a pretty nice system once you realize what it's doing and how much power it gives you.

As for the primary keys - I am not quire sure yet how to achieve this, but it may very well involve passing them down to the relation objects to handle. Loading all the results back on the original collection of Entities may be a challenge as well.

marvelade commented 11 years ago

Never really thought of it that way but that sounds like a solid plan. (also, I never really got the use in having an Iterator SPL class until now :) )

On Mon, Nov 19, 2012 at 4:46 PM, Vance Lucas notifications@github.comwrote:

@marvelade https://github.com/marvelade These are not the only two options :). Spot solves this problem quite elegantly by pre-loading a Spot\Relation object on the property. When you access the property, you are actually accessing this proxy object. So then when you iterate over the object or perform a count, The relation query is executed and the collection is loaded (via SPL interfaces that provide getIterator and count methods). If you iterate over the same property multiple times, it just uses the same already-loaded collection, and doesn't execute the query again. So right now, it's all setup to be lazy-loaded on demand with fully customizable relation queries that you can even modify before execution (calling methods returns a Spot\Query object so you can refine the relation query further, also on the fly). It's a pretty nice system once you realize what it's doing an d how much power it gives you.

As for the primary keys - I am not quire sure yet how to achieve this, but it may very well involve passing them down to the relation objects to handle. Loading all the results back on the original collection of Entities may be a challenge as well.

— Reply to this email directly or view it on GitHubhttps://github.com/vlucas/Spot/issues/23#issuecomment-10518287.

pnomolos commented 11 years ago

So I was thinking about this last night when I went to bed and woke up this morning with what isn't a perfect solution but will hopefully help significantly. What if we add the option to add 'lazy' => false to a relationship definition, and then, rather than calling $mapper->loadRelations on each entity, you call it across the entire collection? It can continue to lazy-load as it is right now, but for relations that are marked as not lazy-load it could do a bulk load by querying on the primary key field? Might require some additional syntax to indicate what the PK => FK relationship is but it should cut down on the number of queries (which is killing me right now, so I'm looking for a good solution for this :))

vlucas commented 11 years ago

We're closer to this now with the work @pnomolos did by adding with to the Query object. Now at least we can solve the excessive queries issue by eager-loading relations. I would still like to pursue the possibility of doing this completely transparently and automatically, but it may not be possible.