statamic / eloquent-driver

Provides support for storing your Statamic data in a database, rather than flat files.
https://statamic.dev/tips/storing-content-in-a-database
MIT License
108 stars 76 forks source link

Optimize taxonomy queries. #373

Open TheBnl opened 1 day ago

TheBnl commented 1 day ago

Description

Writing this here to check if this is wanted behaviour or not. The issue I was facing is that on a project with 100+ tags I saw the total amount of page queries reflecting that amount. Diving deeper into where these queries where coming from I traced it back to the TermQueryBuilder. In the method applyCollectionAndTaxonomyWheres that is used when retrieving page taxonomies, a map is run over all taxonomies to check when they are used on the page. This creates, in my case, 134 count queries being fired on every page request. The result is only cached in the blink request cache.

The current query:

$query = TermModel::where('taxonomy', $taxonomy)
    ->select('slug')
    ->get()
    ->map(function ($term) use ($taxonomy) {
        return Entry::query()
            ->whereIn('collection', $this->collections)
            ->whereJsonContains('data->'.$taxonomy, [$term->slug])
            ->count() > 0 ? $term->slug : null;
    })
    ->filter()
    ->values();

The reason is, probably, that someone can configure the Entries to be stored with the file driver.

My suggestion would be to check what entry driver is configured and based on that opt for a join to come to the result reducing the 133 queries to 1.

Suggestion:

if (config('statamic.eloquent-driver.entries.driver', 'file') == 'eloquent') {
    $query = TermModel::where('taxonomy', $taxonomy)
        ->join('entries', function (JoinClause $join) use ($taxonomy) {
            $join->on('entries.collection', '=', 'entries.collection')
                ->where('entries.collection', $this->collections)
                ->whereRaw("json_contains(`entries`.`data`, concat('[\"',`taxonomy_terms`.`slug`,'\"]'), '$.\"$taxonomy\"')");
        })
        ->select('taxonomy_terms.slug')
        ->get();
} else {
    $query = TermModel::where('taxonomy', $taxonomy)
        ->select('slug')
        ->get()
        ->map(function ($term) use ($taxonomy) {
            return Entry::query()
                ->whereIn('collection', $this->collections)
                ->whereJsonContains('data->'.$taxonomy, [$term->slug])
                ->count() > 0 ? $term->slug : null;
        })
        ->filter()
        ->values();
}

I must admit that this join is a bit funky, especially the on statement 'entries.collection', '=', 'entries.collection' but it gets the job done. Definitely open to suggestions to optimize or replace it with something better. But in the end the N+1 query issue could definitely be solved.

ryanmitchell commented 1 day ago

I agree the query could be improved but I'm not sure this is the right approach, as it's too opinionated. For example, what if someone has their own repository they are using thats not files or eloquent, or if they have bound their own models.

I would like to see pivot table between terms and entries - that would be the right way to fix this I feel.

TheBnl commented 3 hours ago

True, it's definitely an opinionated take. And I also agree that a pivot would be the way to approach this, but that would probably mean a bigger overhaul of how the taxonomies are handled.

In defence of this approach tho, it does only activate when the driver is set to eloquent, so other custom drivers would still be using the current setup. And as eloquent is the driver supplied by this module, it makes sense to include it.

Would be nice if a custom driver would be able to hook into this method, or the whole class, so they can create the fetching methods best fitting there setup.

For people who stumble on the same situation i have a patch file: optimize-taxonomies-query.patch

ryanmitchell commented 3 hours ago

Your current code only works for your specific taxonomy names, it would need to be made generic. It's possible to do, but as I said it's not the right approach here.