studioespresso / craft-scout

Craft Scout provides a simple solution for adding full-text search to your entries. Scout will automatically keep your search indexes in sync with your entries.
MIT License

[STU-130] Split elements not deleted on content update #259

Closed. AlexPagnotta closed this 8 months ago

AlexPagnotta commented 1 year ago

Hello,

It seems that when updating an entry that is split into multiple rows (splitElementsOn), the previous rows are only deleted if the new value for the property has 2 or more elements. If the transformer returns null, an empty array, or an array with 1 item for the property, the previous rows are not deleted and are left as orphans.

I'm currently using the split elements functionality for a rich text field in Craft, where each paragraph is split into a new row. If I have a text with 4 paragraphs (so 4 rows in Algolia), then edit the text and remove all the paragraphs, I'm left with 5 items: a correct one with an empty text, plus the previous 4 rows.

STU-130

jasonlav commented 1 year ago

https://github.com/studioespresso/craft-scout/blob/master/README.md#-splitelementsonarray-keys Note the "Important" callout below the method description about facets.

kbergha commented 10 months ago

I'm able to reproduce this issue.

Say an entry with an ID of 100 is split into 3 parts when it is saved for the first time.

I get three records in the index:

objectID "100_0"
objectID "100_1"
objectID "100_2"

All of them with a distinctID of 100.

Going back to the entry and removing a lot of text, so that it is just one part, I get 4 records in the index:

objectID "100" (with the updated content)
objectID "100_0" (with the old content)
objectID "100_1" (with the old content)
objectID "100_2" (with the old content)

All of them with a distinctID of 100, and all searchable.

_0, _1 and _2 should have been deleted before adding the updated data/record.

I think this is related to the logic on lines 37 to 48 in Engine.php.

Due to the continue statement in there, the $objectsToDelete array will always be empty if the current number of "splits" is <= 1, leading to the orphans in the index.
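To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is not Scout's actual Engine.php code: the in-memory dict standing in for the index and the function names (build_records, update_buggy, update_fixed) are hypothetical, and only the objectID and distinctID scheme is taken from the records listed earlier in this comment.

```python
from typing import Dict, List

Index = Dict[str, dict]  # objectID -> record; a stand-in for the real search index


def build_records(entry_id: int, parts: List[str]) -> List[dict]:
    """Build one record per part when split, otherwise a single unsplit record."""
    if len(parts) <= 1:
        text = parts[0] if parts else ""
        return [{"objectID": str(entry_id), "distinctID": entry_id, "text": text}]
    return [
        {"objectID": f"{entry_id}_{i}", "distinctID": entry_id, "text": part}
        for i, part in enumerate(parts)
    ]


def delete_by_distinct_id(index: Index, entry_id: int) -> None:
    """Remove every record that belongs to the given entry."""
    stale = [oid for oid, rec in index.items() if rec["distinctID"] == entry_id]
    for oid in stale:
        del index[oid]


def update_buggy(index: Index, entry_id: int, parts: List[str]) -> None:
    # Assumed reading of the reported behaviour: old rows are only
    # cleaned up when the new value still splits into 2 or more parts.
    if len(parts) > 1:
        delete_by_distinct_id(index, entry_id)
    for record in build_records(entry_id, parts):
        index[record["objectID"]] = record


def update_fixed(index: Index, entry_id: int, parts: List[str]) -> None:
    # Always clear the entry's previous rows before writing the new ones.
    delete_by_distinct_id(index, entry_id)
    for record in build_records(entry_id, parts):
        index[record["objectID"]] = record


if __name__ == "__main__":
    for update in (update_buggy, update_fixed):
        index: Index = {}
        update(index, 100, ["first", "second", "third"])  # initial save, 3 parts
        update(index, 100, ["only part"])                 # edited down to 1 part
        print(update.__name__, sorted(index))
```

With the buggy variant the three "100_N" rows survive next to the new "100" record. The fixed variant deletes by distinctID first, which is one possible approach and not necessarily how the actual fix was implemented.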

The index I use has:

distinct: true
attributeForDistinct: distinctID
attributesForFaceting: distinctID (+ others)

janhenckens commented 9 months ago

I've tagged this in 3.3.3-beta.2, since we already had a beta running to fix ongoing issues with deletions not going through in some cases (with split elements as well).

Could you give this a try and see if it fixes things?

kbergha commented 9 months ago

Looks to be working as expected for me with 3.3.3-beta.2 (and Craft 4.7.3).

Do you want to wait for @AlexPagnotta to test as well before resolving/closing?