statamic / eloquent-driver

Provides support for storing your Statamic data in a database, rather than flat files.
https://statamic.dev/tips/storing-content-in-a-database
MIT License
104 stars 74 forks source link

Poor performance caused by updateEntryOrder #105

Closed tomhelmer closed 1 year ago

tomhelmer commented 1 year ago

The updateEntryOrder in src/Collections/CollectionRepository.php causes the site to save all items in the collection every time a page is updated.

This really hurts performance to a point where creating/saving/deleting entries does not work at all.

This is especially bad if the site updates a search index or purges a vanish cache on save.

For example a site may have 100 pages. Every time a page is saved the search index in meilli is updated and the tag is purged from varnish. So creating/saving a page will result in 200 http requests and in practice not end before the workers timelimit is reached.

ryanmitchell commented 1 year ago

Hmm maybe we could turn that into a job that you can queue and specify a queue for?

ryanmitchell commented 1 year ago

Opened a PR here: https://github.com/statamic/eloquent-driver/pull/107 - let me know if this fixes the performance issues for you

ryanmitchell commented 1 year ago

@tomhelmer can you let me know if the PR solves the issue for you?

K3CK commented 1 year ago

Just came across this as we are facing a problem with the queue solution. We have a structured page tree with ~3000 pages. Whenever one site is moved to a different parent, thousands of jobs are created effectively clogging the queue. No idea how to solve this issue right now, but maybe there is a more efficient way to do this...?

ryanmitchell commented 1 year ago

@K3CK is this not the point of a queue in that it works through the entry order updates according to your queue config? I would feel like this is already the most 'optimised' way of handling this.

K3CK commented 1 year ago

@ryanmitchell Yes, sure, that's the point of the queue. But I don't understand why all pages of a page tree need to be saved when e.g. only the parent of one page has changed. Because that's what's happening in updateEntryOrder in \Statamic\Eloquent\Collections\CollectionRepository. All of the collection's entries are loaded and put into the queue for saving no matter if they or their parents or children have changed at all.

So imagine an editor who is doing a little restructuring of a page tree and let's say moves 3 pages. In our case with 2000-3000 pages in the tree, we end up with at least 9000 UpdateCollectionEntryOrder jobs. I refuse to accept that this is intended 😉

For me this looks like a bug. Especially because the method's $ids parameter isn't used at all in the method body.

Right now the method looks like this, which loads all the collections entries no matter what:

    public function updateEntryOrder(CollectionContract $collection, $ids = null)
    {
        $collection->queryEntries()
            ->get(['id'])
            ->each(function ($entry) {
                UpdateCollectionEntryOrder::dispatch($entry->id())
                    ->onQueue(config('statamic.eloquent-driver.collections.update_entry_order_queue', 'default'));
            });
    }

I'd suggest to change it to the following utilizing the $ids parameter which I suppose was intended in the first place, although I don't know if NOT saving all pages all the time has some side effects I'm not aware of:

    public function updateEntryOrder(CollectionContract $collection, $ids = null)
    {
        if ($ids) {
            collect($ids)
                ->each(function ($id) {
                    UpdateCollectionEntryOrder::dispatch($id)
                        ->onQueue(config('statamic.eloquent-driver.collections.update_entry_order_queue', 'default'));
                });
        }
    }

That way only the pages which were affected by the latest change are put into the queue for saving (I assume the $ids array only contains the ids of the affected pages).

ryanmitchell commented 1 year ago

The reason we ignore ids is because the Stache store does and we are always reaching for replication of the same behaviour with this driver. See:

https://github.com/statamic/cms/blob/1c287bc5dd89169b19cc550bf2fd94f8aab25969/src/Stache/Stores/CollectionsStore.php#L90C13-L96

I'd happily review a PR for using $ids in there query if it can be shown theres no difference in the data saved.

K3CK commented 1 year ago

Ok, got it. So maybe the first question to clarify would be why the Stache Driver doesn't use the $ids. Maybe re-creating the whole page tree via indexes is faster than doing it for an collection of ids. So I'll see how far I'll get with this...

K3CK commented 1 year ago

Just for the record if someone comes across this issue: Updating only the entries in the $ids is not working like I was expecting. It doesn't take child entries into account, so that you can end up with completely mixed up page structures when moving pages with children.

One thing I realized, though: The UpdateCollectionEntryOrder is doing a save() operation for each entry in the collection, triggering all attached events for every entry. In our case that was the actual cause for the performance issues as the Meilisearch index gets updated (with an HTTP request) on every save. Doing this for 3000 entries adds up, even if it's in the queue.

So I was wondering if it might make sense to provide an option to saveQuietly() the entries, leaving out the attached events. In most cases this should be enough I guess, because effectively the save operation here is only needed to update the entry's order value. So updating the search index should not be needed. And cache clearing could take place once after all saves.

@ryanmitchell Does this makes sense for you? Should I create a PR for that?

ryanmitchell commented 1 year ago

I think a save() is correct as the data has changed and you may want to do something in response. You could bind your own collection repository and overwrite that method and job so it implements save quietly instead?

or we could add a config to allow the job to be specified so you'd only have to change that?

K3CK commented 12 months ago

Thanks @ryanmitchell, we went with the "bind your own repository" solution for now.