silverstripe / silverstripe-search-service

A service-agnostic search module for Silverstripe CMS
BSD 3-Clause "New" or "Revised" License
5 stars 18 forks source link

Unpublishing the parent page does not include the children #88

Closed ssmarco closed 7 months ago

ssmarco commented 1 year ago

In the CMS, unpublishing the parent page also unpublishes the children automatically. However, the same behaviour does not seem to sync when it comes to removing them from Elastic. In this situation, the search results from Elastic will include the supposedly unpublished content. This looks like it affects version 2 and 3 but not tried on v1

Here are steps below to replicate this issue.

image

image

image

chrispenny commented 1 year ago

Thanks, @ssmarco!

Adding a note for when we implement a fix for this:

SiteTree has a config enforce_strict_hierarchy, which is what controls the unpublishing action on the child pages when a parent is unpublished. We should check that this is still set to true before we purge any child pages during this parent page unpublishing action.

GuySartorelli commented 9 months ago

Can confirm that this is an issue - though probably one that should be fixed in framework or versioned instead. I think this module is doing the right thing by performing the "removeFromIndexes" behaviour in its onAfterUnpublish() extension hook.

The problem is really that child pages which get unpublished don't get that extension hook called on them, which is most likely down to the convoluted way cascading publish/unpublish events end up getting performed.

ssmarco commented 9 months ago

Thanks @GuySartorelli Agree with your point. I did some investigation as well during the time of this PR, and I thought that the approach of cascade unpublishing of children pages was for performance reasons hence onAfterUnPublish() was not called on each child page.

michalkleiner commented 9 months ago

The content of the index should basically contain the same pages as a public user can see/navigate to, so we should keep them in sync. I guess during query time there's a canView check done for each result so we won't present unwanted content, but it'd be best if it even wasn't in the index in the first place.

chrispenny commented 9 months ago

I guess during query time there's a canView check done

It depends which search implementation you're using. If you're using (say) the JS libraries supplied by Elastic, then your client side is communicating directly with your Elastic engine, so there's no canView (or any other Silverstripe framework) interaction going on.

We had a similar issue to this with the static publisher module (child pages not having their cache purged if the parent is unpublished). The solution there was to fetch and purge child pages as part of the un-publish that was actioned on the parent. Noting again that this only happens if enforce_strict_hierarchy is set to true.

GuySartorelli commented 9 months ago

Even if you do the search request from PHP and then perform a canView() check before sending results to the frontend, you'll end up with the same problems with pagination that you get elsewhere with Silverstripe CMS (i.e. your page 1 shows results "1-10 of 20", but only 3 pages can be seen because the others fail permission checks)

GuySartorelli commented 9 months ago

I did some investigation as well during the time of this PR, and I thought that the approach of cascade unpublishing of children pages was for performance reasons hence onAfterUnPublish() was not called on each child page.

That's probably accurate, yeah. It'd probably be worth checking what sort of performance hit you'd be taking by calling the hook on each child, but I can't imagine it'd be too bad in most cases.

blueo commented 7 months ago

fixed by https://github.com/silverstripe/silverstripe-search-service/pull/89