studioespresso / craft-scout

Craft Scout provides a simple solution for adding full-text search to your entries. Scout will automatically keep your search indexes in sync with your entries.
MIT License
81 stars 54 forks source link

[STU-85] Entry deindexing soft failing #281

Closed ten1seven closed 9 months ago

ten1seven commented 1 year ago

I recently reported on https://github.com/studioespresso/craft-scout/issues/276 and now that I have that fix in place I'm observing a new issue:

Running the queue in the command line, I see this output that doesn't seem to indicate that anything has gone wrong.

Site stats PHP: 8.1.22 Craft: 4.4.17 (recently upgraded from Craft 3) Scout: 3.2.1 Running multi-site

Is it possible that this is related to multi-site similar to https://github.com/studioespresso/craft-scout/issues/276? What other debugging information can I provide?

One of my developers looked into this and noticed

One of my developers noticed $this->siteId in [IndexElement](https://github.com/studioespresso/craft-scout/blob/master/src/jobs/IndexElement.php#L21):

$element = Craft::$app->getElements()->getElementById($this->id, null, $this->siteId);

But in [DeindexElement](https://github.com/studioespresso/craft-scout/blob/master/src/jobs/DeindexElement.php#L16) it's missing:

$element = Craft::$app->getElements()->getElementById($this->id, null, null, [
   'trashed' => null,
]);

From SyncLinear.com | STU-85

janhenckens commented 1 year ago

Thanks for reporting that @ten1seven, I completely forgot to add the siteId parameter to the DeindexJob as well. Fix for this should be other later today!

janhenckens commented 1 year ago

The fix for this is released in 3.3.1

ten1seven commented 1 year ago

Hi @janhenckens, I pulled 3.3.1 into my application and I'm still having issues with entries not getting removed from my Algolia index when they're deleted from Craft. Are there some additional things you'd suggest I take a look at? Or is there some debugging data I could send your way? Thanks!

janhenckens commented 1 year ago

@ten1seven Could you share your scout.php config file, and the general setup of the craft install in questions (which section/index has the issue? in a specific site, etc)

ten1seven commented 1 year ago

Sure thing @janhenckens

I have been seeing the issue and doing my testing on the News entry type but it's not isolated to just that entry type (I haven't tested them all but could if that would be helpful).

We're running 46 sites on this multi-site install but only two (the main one and the Spanish-translated version--hrc and hrcES) have indexes at Algolia. We have the propagation method set to "Save entries to other sites in the same site group" (screenshot below).

Since the Craft 4 upgrade, we had been having issues with saving and deleting entries not propagating over to Algolia. After your fix on https://github.com/studioespresso/craft-scout/issues/276 saving changes seems to now be working but deleting an entry and it not getting removed from Algolia is still the issue.

Craft: 4.4.17 PHP: 8.1.21 Database: MySQL 8.0.33

Plugins: Amazon S3 2.0.3 Asset Rev 7.0.0 Expanded Singles 2.0.5 Feed Me 5.2.0 Imgix 3.0.0 Maps 4.0.4 MatrixMate 2.1.3 Redactor 3.0.4 Related v2.x-dev SEO 4.2.0 Scout 3.3.1 Sentry SDK 2.0.1 Super Table 3.0.9 Table Maker 4.0.6 Template Comments 4.0.0

config/scout.php

<?php

use modules\hrc\web\services\Search;

return [
    /*
     * Scout listens to numerous Element events to keep them updated in
     * their respective indices. You can disable these and update
     * your indices manually using the commands.
     */
    'sync' => true,

    /*
     *
     * @depcretio
     * By default Scout handles all indexing in a queued job, you can disable
     * this so the indices are updated as soon as the elements are updated
     *
     * Disabling the `queue` option will no longer be supported in the next version of Scout
     *
     */
    'queue' => true,

    /*
     * The connection timeout (in seconds), increase this only if necessary
     */
    'connect_timeout' => 1,

    /*
     * The batch size Scout uses when importing a large amount of elements
     */
    'batch_size' => 1000,

    /*
     * By default Scout will index elements related to the element being save (that are in the same index).
     * Disabling this can improve performance on larger sites that have lots of relations.
     */
    'indexRelations' => true,

    /*
     * The Algolia Application ID, this id can be found in your Algolia Account
     * https://www.algolia.com/api-keys. This id is used to update records.
     */
    'application_id' => '###',

    /*
     * The Algolia Admin API key, this key can be found in your Algolia Account
     * https://www.algolia.com/api-keys. This key is used to update records.
     */
    'admin_api_key' => '###',

    /*
     * The Algolia search API key, this key can be found in your Algolia Account
     * https://www.algolia.com/api-keys. This search key is not used in Scout
     * but can be used through the Scout variable in your template files.
     */
    'search_api_key' => '###',

    /*
     * A collection of indices that Scout should sync to, these can be configured
     * by using the \rias\scout\ScoutIndex::create('IndexName') command. Each
     * index should define an ElementType, criteria and a transformer.
     */
    'indices' => Search::getIndices(),
];

The Search class with the getIndices method

<?php

namespace modules\hrc\web\services;

use craft\elements\db\EntryQuery;
use craft\elements\Entry;
use Exception;
use modules\hrc\web\models\SearchTransformer;
use modules\hrc\web\models\SearchTransformerEvents;
use modules\hrc\web\models\SearchTransformerNewsPress;
use rias\scout\IndexSettings;
use rias\scout\ScoutIndex;

class Search
{
    public const SECTION_NEWS = 'news';
    public const SECTION_PRESS_RELEASES = 'pressReleases';
    public const SECTION_RESOURCES = 'resources';
    public const SECTION_EVENTS = 'events';
    public const SECTION_PAGES = 'pages';
    public const SECTION_CAMPAIGNS = 'campaigns';
    public const SECTION_STORIES = 'stories';
    public const SECTION_EQUALITY_INDEXES = 'equalityIndexes';
    public const SECTION_DATA_EXPLORERS = 'dataExplorers';
    public const SECTION_STAFF = 'staff';
    public const SECTION_RESOURCE_TOPICS = 'resourceTopics';
    public const SECTION_MAGAZINE_ARTICLES = 'magazineArticles';

    public const SECTION_FOUNDATION_PAGES = 'foundationPages';
    public const SECTION_FOUNDATION_PROGRAMS = 'programs';
    public const SECTION_FOUNDATION_PROFESSIONAL_RESOURCES = 'professionalResources';

    /**
     * Creates an index name based on the current CRAFT_ENVIRONMENT
     *
     * @param string $name
     * @return string
     */
    public static function getIndexName(string $name): string
    {
        return implode('_', [
            CRAFT_ENVIRONMENT,
            $name,
        ]);
    }

    /**
     * Configure the Scout indices and data for Algolia
     *
     * @return array
     * @throws Exception
     */
    public static function getIndices(): array
    {
        $indices = [];

        $siteSearchConfig = self::_getSiteSearchConfig();

        $hrcIndex = self::getIndexName('hrcSearch');
        $indices[] = ScoutIndex::create($hrcIndex)
            ->elementType(Entry::class)
            ->criteria(function (EntryQuery $query) use ($siteSearchConfig) {
                return $query->section([
                    array_keys($siteSearchConfig)
                ])
                    ->site([
                        'hrc',
                        'hrcES'
                    ]);
            })
            ->transformer(function (Entry $entry) use ($siteSearchConfig) {
                return self::_getTransformer($siteSearchConfig, $entry);
            })
            ->splitElementsOn([
                'content',
            ])
            ->indexSettings(IndexSettings::create()
                ->hitsPerPage(10)
                ->attributeForDistinct('distinctID')
                ->distinct(true)
                ->attributesForFaceting([
                    'distinctID',
                    'type',
                    'topics',
                    'locations',
                    'year',
                ])
                ->replicas([
                    $hrcIndex . '_dateAsc',
                    $hrcIndex . '_dateDesc',
                    $hrcIndex . '_popularity',
                ])
            );

        $foundationSearchConfig = self::_getFoundationSearchConfig();

        $hrcFoundationIndex = self::getIndexName('hrcFoundationSearch');
        $indices[] = ScoutIndex::create($hrcFoundationIndex)
            ->elementType(Entry::class)
            ->criteria(function (EntryQuery $query) use ($foundationSearchConfig) {
                return $query->section(
                    array_keys($foundationSearchConfig)
                )
                    ->site([
                        'hrcFoundation',
                    ]);
            })
            ->transformer(function (Entry $entry) use ($foundationSearchConfig) {
                return self::_getTransformer($foundationSearchConfig, $entry);
            })
            ->splitElementsOn([
                'content',
            ])
            ->indexSettings(IndexSettings::create()
                ->hitsPerPage(10)
                ->attributeForDistinct('distinctID')
                ->distinct(true)
                ->attributesForFaceting([
                    'distinctID',
                    'type',
                    'topics',
                    'locations',
                    'year',
                ])
                ->replicas([
                    $hrcFoundationIndex . '_dateAsc',
                    $hrcFoundationIndex . '_dateDesc',
                ])
            );

        return $indices;
    }

    /**
     * @param array $config
     * @param Entry $entry
     * @return array
     * @throws Exception
     */
    private static function _getTransformer(array $config, Entry $entry): array
    {
        $transformer = $config[$entry->section->handle];

        if ($transformer instanceof SearchTransformer === false) {
            throw new Exception('Transformer config must be of type SearchTransformer');
        }

        return $transformer->getEntryData($entry);
    }

    private static function _getSiteSearchConfig(): array
    {
        return [
            self::SECTION_NEWS => new SearchTransformerNewsPress([
                'type' => 'News',
                'topicFieldHandle' => 'newsTopics',
                'useTopicFacet' => true,
                'useLocationFacet' => true,
                'useYearFacet' => true,
            ]),
            self::SECTION_PRESS_RELEASES => new SearchTransformerNewsPress([
                'type' => 'Press Releases',
                'topicFieldHandle' => 'newsTopics',
                'useTopicFacet' => true,
                'useLocationFacet' => true,
                'useYearFacet' => true,
            ]),
            self::SECTION_RESOURCES => new SearchTransformer([
                'type' => 'Resources',
                'topicFieldHandle' => 'resourceTopics',
                'useTopicFacet' => true,
            ]),
            self::SECTION_EVENTS => new SearchTransformerEvents([
                'type' => 'Events',
                'useLocationFacet' => true,
                'useYearFacet' => true,
            ]),
            self::SECTION_CAMPAIGNS => new SearchTransformer([
                'type' => 'Campaigns',
                'useYearFacet' => true,
            ]),
            self::SECTION_STORIES => new SearchTransformer([
                'type' => 'Campaigns',
                'useYearFacet' => true,
            ]),
            self::SECTION_MAGAZINE_ARTICLES => new SearchTransformer([
                'type' => 'Magazine',
                'useYearFacet' => true,
            ]),
            self::SECTION_EQUALITY_INDEXES => new SearchTransformer([
                'eyebrow' => 'Equality Index'
            ]),
            self::SECTION_DATA_EXPLORERS => new SearchTransformer(),
            self::SECTION_STAFF => new SearchTransformer(),
            self::SECTION_RESOURCE_TOPICS => new SearchTransformer(),
            self::SECTION_PAGES => new SearchTransformer(),
        ];
    }

    private static function _getFoundationSearchConfig(): array
    {
        return [
            self::SECTION_FOUNDATION_PAGES => new SearchTransformer([
                'type' => 'Pages',
            ]),
            self::SECTION_FOUNDATION_PROGRAMS => new SearchTransformer([
                'type' => 'Programs',
            ]),
            self::SECTION_FOUNDATION_PROFESSIONAL_RESOURCES => new SearchTransformer([
                'topicFieldHandle' => 'professionalAudiences',
                'type' => 'Professional Resources',
            ]),
        ];
    }
}

CleanShot 2023-11-16 at 13 39 57@2x

janhenckens commented 11 months ago

Hey @ten1seven,

I have a first attempt for a fix out but since I'm not entirely sure how this will behave on live Craft installs, I tagged it as a beta. If you're willing to give the fix a try, could you update to 3.3.3-beta.1 and see if that fixes the issue?

ten1seven commented 11 months ago

Hey @janhenckens I installed the beta and tested creating an entry (synced to Algolia), editing the entry (synced to Algolia), and deleting the entry--unfortunately, it still did not get removed from the Algolia index 😢 Thanks for digging into this. Let me know if I can help with additional details.

janhenckens commented 11 months ago

I see you're also using splitElementsOn, let me try with that enabled.

janhenckens commented 11 months ago

Hey @ten1seven, could you double check that you've taken this note from the docs into account?

Important - distinctID (available after indexing) must be set up as an attribute for faceting for deletion of objects to work when using splitElementsOn.

I've tested without the facet (deletion fails) and with the facet (deletion works) on my end.

ten1seven commented 11 months ago

@janhenckens up in the getIndices() function that I shared we've got distinctID as part of attributesForFaceting. Is that what you're referring to?

->indexSettings(IndexSettings::create()
    ->hitsPerPage(10)
    ->attributeForDistinct('distinctID')
    ->distinct(true)
    ->attributesForFaceting([
        'distinctID',
        'type',
        'topics',
        'locations',
        'year',
    ])
    ->replicas([
        $hrcIndex . '_dateAsc',
        $hrcIndex . '_dateDesc',
        $hrcIndex . '_popularity',
    ])
);
janhenckens commented 11 months ago

@janhenckens up in the getIndices() function that I shared we've got distinctID as part of attributesForFaceting. Is that what you're referring to?

Yep, missed that in your config. Could you make sure those are being synced to Algolia? (that doesn't happen when saving entries)

ten1seven commented 11 months ago

@janhenckens is this what you were looking for?

CleanShot 2023-12-08 at 14 40 34

janhenckens commented 11 months ago

@ten1seven Yep, and that seems correct too.

Are you removing the entry from hrcFoundation (which is in an index by itself) or from one of the other 2?

ten1seven commented 11 months ago

There is no shared content between the sites and I can confirm that entries for the foundation index have the same behavior (creating/editing syncs to Algolia but deleting does not). Can you confirm what you mean by "are you removing"? We don't have any code doing anything related to deleting entries in Scout/Algolia--we rely on Scout for that. Are you seeing anything in the code I shared that could be interacting with that?

janhenckens commented 11 months ago

Hey @ten1seven, by "removing" I was referring to deleting the entry in Craft.

Your config seems ok from my end thus far. Before the change I made in the 3.3.3-beta.1 I have seeing the same issue of entries not being removed from the index, but with that fix in place I'm unable to reproduce it. With splitElementsOn on or not, both work on my end right now.

Since this sounds like a pretty large website so I assume you're not able to share the database & composer files?

janhenckens commented 9 months ago

After merging https://github.com/studioespresso/craft-scout/pull/289, I tagged 3.3.3-beta.2 to include an extra fix - @ten1seven could you give this a try and report back if that fixes the issue for you?

ten1seven commented 9 months ago

Thanks @janhenckens I'll pass this along to our developer who picked up this task on our end.

maxdmyers commented 9 months ago

@janhenckens we were able to solve this issue on our end. It was due to the objectID format that we are using. We use {elementID}_{siteHandle} and when deleting, Scout is only looking for $element->id. As a workaround, I've added the below code to a module and it's working great.

Event::on(
    Elements::class,
    Elements::EVENT_AFTER_DELETE_ELEMENT,
    function (ElementEvent $event) {
        $element = $event->element;
        $index = Craft::$container->get(SearchClient::class)
            ->initIndex(App::env('CRAFT_ENVIRONMENT') . '_' . 'hrcSearch');
        $objectIDs = [
            $element->id . '_' . 'hrc',
            $element->id . '_' . 'hrcES',
        ];

        $index->deleteObjects($objectIDs);
    }
);
janhenckens commented 9 months ago

@ten1seven @maxdmyers Yeah, scout doesn't take custom objectID's into account at the moment. I'll close this issue for now and release the object out of beta.

Feel free to reach out should you run into any other issues!