Open jrunestone opened 1 year ago
Hi there @jrunestone!
Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.
We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.
We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.
Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:
Hi @jrunestone,
Thanks for reaching out π
Your use case makes perfect sense. And you're absolutely right; importing external content into the content tree for no other reason than "having them available" is definitively not the way to go.
There are a few ways to work around (or limit) the downtime you're currently experiencing:
That being said, I would be very interested in seeing a PR for this. However, it would probably be a good idea to discuss the approach, before you start pouring hours into building something. A "true" zero downtime solution would be to rebuild the index on the side and swap it somehow (like you would with Elastic). I doubt that's doable with Lucene, though.
I'll pass this to the team for internal discussion. We'll get back to you here π
Thanks for the suggestions!
1: Unfortunately we still need to fetch everything first to be able to tell which items need crudding, the actual rebuilding of the index is blazing fast.
2: Is a good workaround, however I already implemented the workaround I mentioned where I override CreateIndex and call it myself in the populator when it's ready :P Not future proof but less code right now to solve the issue anyway.
Looking forward to hearing what the team has to say. :+1:
Hi @jrunestone
We discussed it internally and your "pre-fetch" method at least seems better than what we have today. Our biggest concern is memory usage, if you have to save all data in memory while building the index. So we think it will be good as an opt-in feature.
Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)
11.4.1
Bug summary
When Umbraco calls upon an index populator to rebuild an index it first clears the index and then calls the populator, leaving the front facing site empty of items where it's querying this index. This becomes a big problem if you have an index populator that takes a bit of time fetching and preparing the items to insert, or if the populator fails.
Specifics
Our use case might not fit into the intended behavior of custom search indexes, or maybe it does, this is how we tend to solve things on other platforms:
The problem arises when we need to rebuild the index with new content. Umbraco first clears the index and then calls the populator's sole method to fetch and insert items into the index. If it takes a minute, our page displaying the content from the index is completely blank until the index is rebuilt. If the populator fails, the index is left empty.
The API for this particular part of the code consists of a few private methods unfortunately so it's quite hard to roll your own. I feel it's redundant to have a cache layer on top of this index, since that's basically what the index is already.
It would be prudent to include a "pre-fetch" method on the populator, or an option to handle the clearing of the index within the populator itself. That way we could spend a minute, 30s or whatever to fetch our items and then quickly rebuild the index. The built-in content index populator does pagination runs to index the items a few at a time but we're unable to do this. My temporary fix is to override
CreateIndex
and noop it, and then in my populator call a new method on the index classDeferredCreateIndex
(or whatever) right before we index the fetched items which callsbase.CreateIndex
.If I submitted a PR for this would it be considered? I'm sure it could be made non-breaking/backwards-compatible (opt-in).
Steps to reproduce
PopulateIndexes
method before filling it with dummy contentExpected result / actual result
No response