umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.49k stars 2.69k forks source link

ExamineIndexRebuilder throws errors sometimes #13804

Open Zeegaan opened 1 year ago

Zeegaan commented 1 year ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

10+

Bug summary

This happens because we try to use scope in background threads, so the scope can sometimes not be the ambient scope when we try and dispose it.

Specifics

Line 93 in ExamineIndexRebuilder.cs

_backgroundTaskQueue.QueueBackgroundWorkItem(
                cancellationToken =>
                {
                    // This is a fire/forget task spawned by the background thread queue (which means we
                    // don't need to worry about ExecutionContext flowing).
                    Task.Run(() => RebuildIndexes(onlyEmptyIndexes, delay.Value, cancellationToken));

                    // immediately return so the queue isn't waiting.
                    return Task.CompletedTask;
                });

This ends up using ContentIndexPopulator which in turn uses the IContentService to GetPagedDescendants which creates a scope on line 789.

Steps to reproduce

This was discovered during hardcore testing, but should be able to reproduce to force index rebuilding, by saving lots of content simultaneously.

Expected result / actual result

To not get errors when the examine index is rebuilding.


_This item has been added to our backlog AB#26699_

devantler commented 1 year ago

@Zeegaan Added the stack error from which we located the issue.

[10:20:02 ERR] Index populating failed for populator Umbraco.Cms.Infrastructure.Examine.PublishedContentIndexPopulator
System.InvalidOperationException: The Scope 86df2890-8624-4b7c-bf60-4ec2bf27b415 being disposed is not the Ambient Scope 01a42a5d-b7aa-4084-a4b5-2d930a76ff2e. This typically indicates that a child Scope was not disposed, or flowed to a child thread that was not awaited, or concurrent threads are accessing the same Scope (Ambient context) which is not supported. If using Task.Run (or similar) as a fire and forget tasks or to run threads in parallel you must suppress execution context flow with ExecutionContext.SuppressFlow() and ExecutionContext.RestoreFlow().
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Cms.Infrastructure.Examine.ContentValueSetBuilder.GetValueSets(IContent[] content)
   at Umbraco.Cms.Infrastructure.Examine.ContentIndexPopulator.IndexPublishedContent(Int32 contentParentId, Int32 pageIndex, Int32 pageSize, IReadOnlyList`1 indexes)
   at Umbraco.Cms.Infrastructure.Examine.ContentIndexPopulator.PopulateIndexes(IReadOnlyList`1 indexes)
   at Umbraco.Cms.Infrastructure.Examine.IndexPopulator.Populate(IIndex[] indexes)
   at Umbraco.Cms.Infrastructure.Examine.ExamineIndexRebuilder.RebuildIndexes(Boolean onlyEmptyIndexes, TimeSpan delay, CancellationToken cancellationToken)
[10:20:02 ERR] Error occurred executing workItem.
System.InvalidOperationException: The Scope f3b51e5b-c1c9-418e-979f-a4856b7733f0 being disposed is not the Ambient Scope 01a42a5d-b7aa-4084-a4b5-2d930a76ff2e. This typically indicates that a child Scope was not disposed, or flowed to a child thread that was not awaited, or concurrent threads are accessing the same Scope (Ambient context) which is not supported. If using Task.Run (or similar) as a fire and forget tasks or to run threads in parallel you must suppress execution context flow with ExecutionContext.SuppressFlow() and ExecutionContext.RestoreFlow().
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Cms.Infrastructure.Examine.ExamineUmbracoIndexingHandler.DeferedReIndexForContent.<>c__DisplayClass6_0.<Execute>b__0(CancellationToken cancellationToken)
   at Umbraco.Cms.Infrastructure.HostedServices.QueuedHostedService.BackgroundProcessing(CancellationToken stoppingToken)
matteus-hemstrom-nltg commented 1 year ago

We are experiencing locking issues and suspect that this might be one of the causes. When the exception above is thrown, it appears to prevent the scope from being disposed properly. As a result, this leaves a read lock on -333. All following attempts to save and publish fails to acquire write lock for -333 (so this may be related to https://github.com/umbraco/Umbraco-CMS/issues/14195). When this happens, we need to stop the faulting Umbraco instance so that the lock is released.

Note our setup: one back office instance and multiple API/subscriber instances. The issue seems to occur through the following steps:

  1. Have back office running
  2. Start an API/subscriber instance
  3. On the subscriber instance: "Queuing background job for RebuildIndexes."
  4. Save & Publish documents (while RebuildIndexes occur) using the back office
  5. On the subscriber instance (sometimes): "The Scope X being disposed is not the Ambient Scope Y"
  6. Save & Publish fails on back office instance until subscriber is stopped

Do you think this might be the issue? And do you have any suggestions/workarounds? Is it possible to turn off Examine indexing on subscriber instances?

bergmania commented 11 months ago

Hi @matteus-hemstrom-nltg..

Thanks for reporting.. We are still searching for a potential fix, but we have a hard time reproducing.

Would you be able to test using the custom code, that basically just wraps the execution in ExecutionContext.SuppressFlow()) and ensures a ask.CompletedTask is returned.

The following code should replace the native ExamineIndexRebuilder with this custom one.

using Examine;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.Runtime;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Infrastructure.Examine;
using Umbraco.Cms.Infrastructure.HostedServices;

namespace WhatEver;

public class TestCodeComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        builder.Services.AddUnique<IIndexRebuilder, MyExamineIndexRebuilder>();
        builder.Services.AddUnique<ExamineIndexRebuilder, MyExamineIndexRebuilder>();
    }
}

public class MyExamineIndexRebuilder : ExamineIndexRebuilder
{
    private readonly IBackgroundTaskQueue _backgroundTaskQueue;
    private readonly IExamineManager _examineManager;
    private readonly ILogger<ExamineIndexRebuilder> _logger;
    private readonly IMainDom _mainDom;
    private readonly IEnumerable<IIndexPopulator> _populators;
    private readonly object _rebuildLocker = new();
    private readonly IRuntimeState _runtimeState;

    /// <summary>
    ///     Initializes a new instance of the <see cref="ExamineIndexRebuilder" /> class.
    /// </summary>
    public MyExamineIndexRebuilder(
        IMainDom mainDom,
        IRuntimeState runtimeState,
        ILogger<ExamineIndexRebuilder> logger,
        IExamineManager examineManager,
        IEnumerable<IIndexPopulator> populators,
        IBackgroundTaskQueue backgroundTaskQueue)
        : base(mainDom, runtimeState, logger, examineManager, populators, backgroundTaskQueue)
    {
        _mainDom = mainDom;
        _runtimeState = runtimeState;
        _logger = logger;
        _examineManager = examineManager;
        _populators = populators;
        _backgroundTaskQueue = backgroundTaskQueue;
    }

    public bool CanRebuild(string indexName)
    {
        if (!_examineManager.TryGetIndex(indexName, out IIndex index))
        {
            throw new InvalidOperationException("No index found by name " + indexName);
        }

        return _populators.Any(x => x.IsRegistered(index));
    }

    public override void RebuildIndex(string indexName, TimeSpan? delay = null, bool useBackgroundThread = true)
    {
        if (delay == null)
        {
            delay = TimeSpan.Zero;
        }

        if (!CanRun())
        {
            return;
        }

        if (useBackgroundThread)
        {
            _logger.LogInformation("Starting async background thread for rebuilding index {indexName}.", indexName);

            _backgroundTaskQueue.QueueBackgroundWorkItem(
                cancellationToken =>
                {
                    // Do not flow AsyncLocal to the child thread
                    using (ExecutionContext.SuppressFlow())
                    {
                        Task.Run(() => RebuildIndex(indexName, delay.Value, cancellationToken));

                        // immediately return so the queue isn't waiting.
                        return Task.CompletedTask;
                    }
                });
        }
        else
        {
            RebuildIndex(indexName, delay.Value, CancellationToken.None);
        }
    }

    public override void RebuildIndexes(bool onlyEmptyIndexes, TimeSpan? delay = null, bool useBackgroundThread = true)
    {
        if (delay == null)
        {
            delay = TimeSpan.Zero;
        }

        if (!CanRun())
        {
            return;
        }

        if (useBackgroundThread)
        {
            _logger.LogDebug($"Queuing background job for {nameof(RebuildIndexes)}.");

            _backgroundTaskQueue.QueueBackgroundWorkItem(
                cancellationToken =>
                {
                    // Do not flow AsyncLocal to the child thread
                    using (ExecutionContext.SuppressFlow())
                    {
                        Task.Run(() => RebuildIndexes(onlyEmptyIndexes, delay.Value, cancellationToken));

                        // immediately return so the queue isn't waiting.
                        return Task.CompletedTask;
                    }
                });
        }
        else
        {
            RebuildIndexes(onlyEmptyIndexes, delay.Value, CancellationToken.None);
        }
    }

    private bool CanRun() => _mainDom.IsMainDom && _runtimeState.Level == RuntimeLevel.Run;

    private void RebuildIndex(string indexName, TimeSpan delay, CancellationToken cancellationToken)
    {
        if (delay > TimeSpan.Zero)
        {
            Thread.Sleep(delay);
        }

        try
        {
            if (!Monitor.TryEnter(_rebuildLocker))
            {
                _logger.LogWarning(
                    "Call was made to RebuildIndexes but the task runner for rebuilding is already running");
            }
            else
            {
                if (!_examineManager.TryGetIndex(indexName, out IIndex index))
                {
                    throw new InvalidOperationException($"No index found with name {indexName}");
                }

                index.CreateIndex(); // clear the index
                foreach (IIndexPopulator populator in _populators)
                {
                    if (cancellationToken.IsCancellationRequested)
                    {
                        return;
                    }

                    populator.Populate(index);
                }
            }
        }
        finally
        {
            if (Monitor.IsEntered(_rebuildLocker))
            {
                Monitor.Exit(_rebuildLocker);
            }
        }
    }

    private void RebuildIndexes(bool onlyEmptyIndexes, TimeSpan delay, CancellationToken cancellationToken)
    {
        if (delay > TimeSpan.Zero)
        {
            Thread.Sleep(delay);
        }

        try
        {
            if (!Monitor.TryEnter(_rebuildLocker))
            {
                _logger.LogWarning(
                    $"Call was made to {nameof(RebuildIndexes)} but the task runner for rebuilding is already running");
            }
            else
            {
                // If an index exists but it has zero docs we'll consider it empty and rebuild
                IIndex[] indexes = (onlyEmptyIndexes
                    ? _examineManager.Indexes.Where(x =>
                        !x.IndexExists() || (x is IIndexStats stats && stats.GetDocumentCount() == 0))
                    : _examineManager.Indexes).ToArray();

                if (indexes.Length == 0)
                {
                    return;
                }

                foreach (IIndex index in indexes)
                {
                    index.CreateIndex(); // clear the index
                }

                // run each populator over the indexes
                foreach (IIndexPopulator populator in _populators)
                {
                    if (cancellationToken.IsCancellationRequested)
                    {
                        return;
                    }

                    try
                    {
                        populator.Populate(indexes);
                    }
                    catch (Exception e)
                    {
                        _logger.LogError(e, "Index populating failed for populator {Populator}", populator.GetType());
                    }
                }
            }
        }
        finally
        {
            if (Monitor.IsEntered(_rebuildLocker))
            {
                Monitor.Exit(_rebuildLocker);
            }
        }
    }
}
matteus-hemstrom-nltg commented 11 months ago

Thanks! We'll give the examine indexer above a try.

I tried a build earlier with something similar, however I just removed the Task.Run so that the indexing executed synchronously in the background thread. That worked well, but I was only running it in a test environment.

bergmania commented 11 months ago

Thanks.. At least that sounds like this area is the smoking gun 🙈

matteus-hemstrom-nltg commented 11 months ago

@bergmania

We are continuing to monitor the issue, but are very hopeful that the patch will solve it.

We've been running the patch in production for almost four days now. We have not had any permanent locks, so the patch seems very promising. However, I should reiterate what you pointed out earlier: this issue is hard to reproduce consistently. As such, it's possible that we've been lucky in not encountering the bug.

I should also mention that temporary lock failures (on -333) still occur. I believe these should be possible to mitigate by increasing DistributedLockingWriteLockDefaultTimeout, correct?

bergmania commented 11 months ago

Correct.. The fact that -333 happens is kind of expected doing long running operations. But definitely no locks should be hanging.

jerpenol commented 10 months ago

Any updates on this?