silverstripe / silverstripe-fulltextsearch

Adds external full text search engine support to Silverstripe
BSD 3-Clause "New" or "Revised" License

Results not showing after editing pages (no soft-commit nor core reload issued) #274

Closed mateusz closed 2 years ago

mateusz commented 4 years ago

We are seeing people having problems with their results not showing in search after they have updated their content. The root cause is that the Solr index must be reloaded OR a soft-commit must happen for the results to show up. The module seems to do neither, nor does it give recommendations on how to deal with that.

Historically, soft-commit would have been triggered from queuedjobs (or core reload, can't remember). Because queuedjobs were unreliable, CWP was switched behind the scenes to hijack solrconfig.xml and force autoSoftCommit to 60s, which meant the results showed reliably after saving.

I think this knowledge might have been forgotten: the module no longer triggers the soft commit or core reload, and doesn't even document the necessity of committing. This means it works in CWP, but seemingly doesn't on other platforms.

I think the default autoSoftCommit should perhaps be changed away from -1 to something like 15000.

Maybe some documentation around handling committing could be supplied too? I can't figure out how this is supposed to work with soft-commits off by default - is there a job to run? :-) some help would be appreciated.

We can try fixing this magically on the platform side by setting the solr.autoSoftCommit.maxTime java var, but I think this will still perhaps confuse people who don't use that particular platform, so some mention in the gotchas, or docs around it would be wonderful.
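For context, the stock Solr example solrconfig.xml exposes that Java variable through property substitution. The fragment below is a sketch only: it assumes the module's solrconfig.xml follows the same pattern (suggested by the -1 default mentioned above, but not verified here), and it uses the 15000 value proposed earlier as the fallback.

```xml
<!-- Sketch only: assumes the module's solrconfig.xml follows the stock Solr template. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Soft commit: opens a new searcher so recent updates become visible.
       The value after the colon is the fallback used when the
       solr.autoSoftCommit.maxTime system property is not set.
       -1 disables automatic soft commits; 15000 means 15 seconds. -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a fragment like this, starting Solr with -Dsolr.autoSoftCommit.maxTime=60000 would override the fallback without touching the file, which is the platform-side fix described above.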

adrexia commented 4 years ago

I don't personally know of a reason the default for autoSoftCommit should stay at -1. There are jobs ( 1 & 2 ) that can be run, but they seemingly also do not successfully commit the changes.

unclecheese commented 4 years ago

Agree. I think bumping the autoSoftCommit time up (or enabling it in some other way) makes sense. The only reason it would be disabled in my mind is if its responsibility was being handled by some other means. We should assign a proper value to that setting, and document what that time is, so that consumers of the module have realistic expectations of when they'll see their changes.

chillu commented 4 years ago

Just tracing back steps a bit, here's what the configuration docs say:

Publish a page in the CMS [...] This tracks changes to the database, so any alterations will trigger a reindex. In order to minimise delays to those users, the index update is deferred until after the actual request returns to the user, through PHP's register_shutdown_function() functionality. [...]

Queued jobs: If the Queued Jobs module is installed, updates are queued up instead of executed in the same request. Queued jobs are usually processed every minute. Large index updates will be batched into multiple queued jobs to ensure a job can run to completion within common constraints, such as memory and execution time limits.

Solr Reindex: [...] If you have the Queued Jobs module installed, then this task will create multiple reindex jobs that are processed asynchronously; unless you are in dev mode, in which case the index will be processed immediately (see processor.yml). Otherwise, it will run in one process. Often, if you are running it via the web, the request will time out. Usually this means the actual process is still running in the background, but it can be alarming to the user, so bear that in mind.

CWP docs say:

CWP's Solr server ignores all search index commit requests, and instead relies on auto-commits to update indexes. This preserves stability for all users of the shared service. This will manifest as index updates taking a minute or two to appear in the search results, while on local development environment they are immediate.

So following the docs, we should create jobs for both update and commit by default when the Queued Jobs module is installed. That's broken because a SearchUpdateProcessor instance was replaced during the Silverstripe 3 to 4 upgrade, effectively hardwiring it to SearchUpdateImmediateProcessor instead of using Injector to optionally use SearchUpdateQueuedJobProcessor.

Here's a post explaining autoSoftCommit.maxTime=-1. And one explaining the difference between soft and hard commits.

Constraints from my perspective:

My gut feel is to restore the intended solution here (run jobs for update and commit), which seems like it would be achieved through Naomi's PR. If we change the commit configuration, let's validate that against the constraints above - predominantly in the platforms where we have that level of visibility.

adrexia commented 4 years ago

@chillu unfortunately, my PR alone does not fix this problem on Platform. We have it set up and running there - jobs are created and look to be successful - but we still have the issue of the indexes not being properly committed until a full reindex is run.

mateusz commented 4 years ago

@adrexia @chillu there is a difference between a (hard) commit, a soft commit and a core reload. The first does not get the results updated; only the latter two do. Platform had hard-commit configured, but that just flushes to disk. You need to reload the core (which is what Solr_Configure does) or soft-commit (not sure if there is an API for that?).

From a platform performance perspective, soft commits are probably the best of both worlds - setting those to 15-60s doesn't have any visible impact, and can even be a net positive if it helps limit hard commits (which flush to disk) and core restarts (which can be resource-intensive for big cores, or so I think).

I'm not sure if soft-commits can be triggered via API. Solrconfig.xml allows you to make those commits automatic (so you don't have to make an API call). Pretty much means the ticker starts at the point of index update, and triggers commit at timeout.
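On the API question: Solr's XML update handler accepts commit parameters as attributes, so an explicit soft commit or a per-update commitWithin hint can be sent as an update message. The two messages below are a minimal sketch, assuming Solr 4 or later; the field names, document id and 15000 value are hypothetical and not taken from the module.

```xml
<!-- Message 1: explicit soft commit, POSTed to the core's /update handler.
     Makes pending updates searchable without forcing a flush to disk. -->
<commit softCommit="true" waitSearcher="false"/>

<!-- Message 2: ask Solr to commit this update within 15 seconds.
     Field names (id, title) and values are hypothetical. -->
<add commitWithin="15000">
  <doc>
    <field name="id">page-42</field>
    <field name="title">Example page</field>
  </doc>
</add>
```

Whether commitWithin results in a soft or a hard commit depends on the Solr version and configuration, so it is worth checking against the server in use.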

CWP currently has autoSoftCommit=60000 (60s) and autoCommit=300000 (300s).

mateusz commented 4 years ago

I guess one more thing to keep in mind is soft-commits might result in different index contents compared to core reload and also compared to full reindexes. I haven't heard anything specific around that though from CWP perspective, and that has been using autoSoftCommits for ~5yrs, so should be fine for casual use?

mateusz commented 4 years ago

So could someone maybe at least suggest in the docs how to customise solrconfig.xml?

adrexia commented 4 years ago

I'm keen to get the default changed, as it's basically broken from the perspective of (I think) most of this module's users outside a CWP environment. I could document the how of customising solrconfig.xml, but I'm still not entirely clear on the reasons why you might want to customise the autoSoftCommit [1] if we change the default (other than the more general desire to customise the extras configurations).

I think both the SearchUpdateImmediateProcessor and the SearchUpdateQueuedJobProcessor rely on autoSoftCommits not being disabled. At the very least, changing the autoSoftCommit value appears to be the way to get the queued jobs working properly. I'm unsure if the functionality around publish object->update index has ever worked with Solr 4? It's the sort of thing that people might not notice straight away [2].

@chillu, @unclecheese - what are your thoughts?


[1] What are the effects on the server if it's 1 minute, 5 minutes, or 30 seconds? Are there any? What are the reasons to disable?
[2] Which is apparent from the fact the queued jobs functionality has been broken since the Silverstripe 4 upgrade.

chillu commented 4 years ago

We want fewer devs customising solrconfig.xml rather than more of them.

I'm unsure if the functionality around publish object->update index has ever worked with Solr 4? It's the sort of thing that people might not notice straight away

It does work as long as autoSoftCommit is enabled, although with the delay configured there. I've installed fulltextsearch-localsolr on cwp/installer:2.5.x-dev, with the latest silverstripe/fulltextsearch:3.x-dev (incl. your fix), using the default config of autoSoftCommit.maxTime:-1, so soft commits were effectively disabled. I published a page, ensured the queue ran through, and the new content was available for searching in the index after 15000ms (the "hard commit" threshold). I've stepped my way through with breakpoints, and that's the case after only calling <add> commands in Solr (without any explicit <commit>). So the results were available for new search requests without ever calling commits afterwards, because Solr actually opened a new "searcher", auto-warmed it, and then put it in service for the next search request (see logs). That's mystifying to me, since autoCommit.openSearcher is false, but I think it's somewhere around the behaviour of maxWarmingSearchers.

openSearcher is described as follows:

if false, the commit causes recent index changes to be flushed to stable storage, but does not cause a new searcher to be opened to make those changes visible.
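The configuration described above corresponds roughly to the following fragment. It is a sketch reconstructed from the values quoted in this comment (15000ms hard commit, openSearcher false, soft commits at -1), not a copy of the module's actual solrconfig.xml.

```xml
<!-- Hard commit: flushes to stable storage every 15s; openSearcher=false
     means it should not, by itself, make the changes visible to searches. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: -1 disables the timer entirely. -->
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```

Going by the openSearcher description, updates under this configuration would be expected to stay invisible until something else opens a new searcher, which is why the observed behaviour is surprising.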

I haven't gotten to the bottom of this, but it seems likely that Solr just tries to be helpful here and makes the new results available (see https://issues.apache.org/jira/browse/SOLR-5783 for some insights into how complex that decision making is). In conclusion, I can't reproduce the issue locally, but after reading about "soft commits" I also don't see the harm in setting autoSoftCommit in the module to the same configuration that's worked for us for many years in CWP (and effectively enabling it in SC for anyone updating the module). Even with autoSoftCommit, keeping a separate SearchUpdateCommitJobProcessor job makes sense because that might trigger Solr to commit faster than either its own heuristics or the autoCommit and autoSoftCommit maxTime settings would.

I've created a PR at https://github.com/silverstripe/silverstripe-fulltextsearch/pull/278, haven't succeeded in getting search results on an SC testing box yet though.

emteknetnz commented 2 years ago

Linked PR has been tested and merged and released as 3.11.0, closing now