umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.49k stars 2.69k forks source link

Indexes are emptied before data is populated, affecting live site #14471

Open jrunestone opened 1 year ago

jrunestone commented 1 year ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

11.4.1

Bug summary

When Umbraco calls upon an index populator to rebuild an index it first clears the index and then calls the populator, leaving the front facing site empty of items where it's querying this index. This becomes a big problem if you have an index populator that takes a bit of time fetching and preparing the items to insert, or if the populator fails.

Specifics

Our use case might not fit into the intended behavior of custom search indexes, or maybe it does, this is how we tend to solve things on other platforms:

The problem arises when we need to rebuild the index with new content. Umbraco first clears the index and then calls the populator's sole method to fetch and insert items into the index. If it takes a minute, our page displaying the content from the index is completely blank until the index is rebuilt. If the populator fails, the index is left empty.

image

The API for this particular part of the code consists of a few private methods unfortunately so it's quite hard to roll your own. I feel it's redundant to have a cache layer on top of this index, since that's basically what the index is already.

It would be prudent to include a "pre-fetch" method on the populator, or an option to handle the clearing of the index within the populator itself. That way we could spend a minute, 30s or whatever to fetch our items and then quickly rebuild the index. The built-in content index populator does pagination runs to index the items a few at a time but we're unable to do this. My temporary fix is to override CreateIndex and noop it, and then in my populator call a new method on the index class DeferredCreateIndex (or whatever) right before we index the fetched items which calls base.CreateIndex.

If I submitted a PR for this would it be considered? I'm sure it could be made non-breaking/backwards-compatible (opt-in).

Steps to reproduce

  1. Create a custom index
  2. Create an index populator for this index
  3. Sleep for some time in the PopulateIndexes method before filling it with dummy content
  4. Observe the index being empty while rebuilding

Expected result / actual result

No response

github-actions[bot] commented 1 year ago

Hi there @jrunestone!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:

kjac commented 1 year ago

Hi @jrunestone,

Thanks for reaching out πŸ˜„

Your use case makes perfect sense. And you're absolutely right; importing external content into the content tree for no other reason than "having them available" is definitively not the way to go.

There are a few ways to work around (or limit) the downtime you're currently experiencing:

That being said, I would be very interested in seeing a PR for this. However, it would probably be a good idea to discuss the approach, before you start pouring hours into building something. A "true" zero downtime solution would be to rebuild the index on the side and swap it somehow (like you would with Elastic). I doubt that's doable with Lucene, though.

I'll pass this to the team for internal discussion. We'll get back to you here πŸ‘

jrunestone commented 1 year ago

Thanks for the suggestions!

1: Unfortunately we still need to fetch everything first to be able to tell which items need crudding, the actual rebuilding of the index is blazing fast.

2: Is a good workaround, however I already implemented the workaround I mentioned where I override CreateIndex and call it myself in the populator when it's ready :P Not future proof but less code right now to solve the issue anyway.

Looking forward to hearing what the team has to say. :+1:

bergmania commented 1 year ago

Hi @jrunestone

We discussed it internally and your "pre-fetch" method at least seems better than what we have today. Our biggest concern is memory usage, if you have to save all data in memory while building the index. So we think it will be good as an opt-in feature.