umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.36k stars 2.64k forks source link

Old .cfs files are not deleted after Examine index rebuild. #15994

Open EssigAttila opened 3 months ago

EssigAttila commented 3 months ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

8.18.5

Bug summary

A running Umbraco 8.18.5 project with custom Examine indexes experiences slowdown after 400-500 index rebuilds, eventually leading to a complete halt without response. The index contains approximately 2000 items. We've tried various solutions, but only restarting the IIS app pool manually every 2-3 weeks resolves the issue.

During monitoring, we noticed that the directory associated with the index was accumulating .cfs files. After each rebuild, a new one was created, and the system was not deleting the old ones.

Since we didn't find any errors related to this in the logs, we ran a deeper monitoring using the Debug Diagnostic Tool (DebugDiag).

Based on the results, we see that Examine tries to delete the old .cfs files, but it fails because they are in use. We examined which process holds the files and found out it's the same process on which Umbraco is running.

image

Since we couldn't determine what we improperly implemented in our own code, we examined the behavior of the indexes in a clean, freshly installed Umbraco 8.18.5 environment. Surprisingly, we found that the issue persisted even in this clean Umbraco installation.

Specifics

DebugDiag Results

Hundreds of similar exceptions:

Exception Type: System.IO.IOException Count: 49 Message: The process cannot access the file 'C:\inetpub\wwwroot\MyUmbracoSite\App_Data\TEMP\ExamineIndexes\ArticleIndex_Scope\_cry.cfs' because it is being used by another process. | System.IO.__Error.WinIOError(Int32, System.String)   System.IO.FileInfo.Delete()   Lucene.Net.Store.FSDirectory.DeleteFile(System.String)

.Net Call Stack:

[[HelperMethodFrame]] mscorlib_ni!System.IO.__Error.WinIOError(Int32, System.String)+37d mscorlib_ni!System.IO.FileInfo.Delete()+dc3a0c Lucene.Net.Store.FSDirectory.DeleteFile(System.String)+5f Lucene.Net.Index.IndexFileDeleter.DeleteFile(System.String)+67 Lucene.Net.Index.IndexFileDeleter.DeletePendingFiles()+99 Lucene.Net.Index.IndexFileDeleter.Checkpoint(Lucene.Net.Index.SegmentInfos, Boolean)+11a Lucene.Net.Index.IndexWriter.DeleteAll()+a5 Examine.LuceneEngine.Providers.LuceneIndex.EnsureIndex(Boolean)+1a6 Umbraco.Examine.IndexRebuilder.RebuildIndex(System.String)+51 BackfieldDigital.Services.IndexRebuildManager+d__7.MoveNext()+923 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+163 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+14 mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()+6c mscorlib_ni!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action, Boolean, System.Threading.Tasks.Task ByRef)+62 mscorlib_ni!System.Threading.Tasks.Task.FinishContinuations()+3d7 mscorlib_ni!System.Threading.Tasks.Task1[[System.Threading.Tasks.VoidTaskResult, mscorlib]].TrySetResult(System.Threading.Tasks.VoidTaskResult)+63 mscorlib_ni!System.Threading.Tasks.Task+DelayPromise.Complete()+a5 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+163 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+14 mscorlib_ni!System.Threading.TimerQueueTimer.CallCallback()+ba mscorlib_ni!System.Threading.TimerQueueTimer.Fire()+87 mscorlib_ni!System.Threading.TimerQueue.FireNextTimers()+75 [[DebuggerU2MCatchHandlerFrame]] [[ContextTransitionFrame]] [[DebuggerU2MCatchHandlerFrame]]

Steps to reproduce

In a clean Umbraco project, we created an index containing all the basic fields of every node. We created an endpoint that triggers the index rebuild. In a Postman collection, we made a GET request to rebuild the index, and we ran this 1000 times with 10-second intervals between each call. While these 1000 calls were running, we waited for 5 minutes for all built-in operations after startup to complete in Umbraco. Then, in Umbraco, we changed the name of a node and published it. After this, with each rebuild, a new .cfs file was generated, and the old .cfs files were not deleted as long as Umbraco was running.

Index related files in clean umbraco 8.18.5:

IndexComposer.cs:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Umbraco.Core.Composing;
using Umbraco.Core;

namespace UmbracoIndexSample.Components.Index
{
    [RuntimeLevel(MinLevel = RuntimeLevel.Run)]
    public class IndexComposer : IUserComposer
    {
        public void Compose(Composition composition)
        {
            composition.RegisterUnique<IndexCreator>();
            composition.Components().Append<IndexComponent>();
            composition.Register<SampleIndexPopulator>(Lifetime.Singleton);
        }
    }
}

IndexComponent.cs:


using Examine.LuceneEngine.Directories;
using Examine;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Umbraco.Core.Composing;
using Umbraco.Core.Logging;
using Umbraco.Examine;

namespace UmbracoIndexSample.Components.Index
{
    public class IndexComponent : IComponent
    {
        private const string LogPrefix = "UmbracoIndexSample";

        private readonly IExamineManager _examineManager;
        private readonly IndexCreator _indexCreator;
        private readonly ILogger _logger;

        public IndexComponent(IExamineManager examineManager, IndexCreator indexCreator, ILogger logger)
        {
            _examineManager = examineManager;
            _indexCreator = indexCreator;
            _logger = logger;
        }

        public void Initialize()
        {
            try
            {
                DirectoryFactory.DefaultLockFactory = d =>
                {
                    var simpleFsLockFactory = new NoPrefixSimpleFsLockFactory(d);
                    return simpleFsLockFactory;
                };

                foreach (var index in _indexCreator.Create())
                {
                    _examineManager.AddIndex(index);
                }
            }
            catch (Exception e)
            {
                _logger.Error(this.GetType(), $"{LogPrefix} {e.Message}", e);
            }
        }

        public void Terminate()
        {
            _logger.Info(this.GetType(), $"{LogPrefix} Terminating: {nameof(IndexComponent)}");
        }
    }
}

IndexCreator.cs:

using Examine.LuceneEngine.Providers;
using Examine;
using Lucene.Net.Analysis.Standard;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Umbraco.Core.Services;
using Umbraco.Examine;

namespace UmbracoIndexSample.Components.Index
{
    public class IndexCreator : LuceneIndexCreator
    {
        private readonly IContentService _contentService;

        public IndexCreator(IContentService contentService)
        {
            _contentService = contentService;
        }

        public override IEnumerable<IIndex> Create()
        {
            var newIndexes = new List<IIndex>();
            var rootItems = _contentService.GetRootContent();
            var fieldDefinitions = GetFieldDefinitionCollection();

            foreach (var rootItem in rootItems)
            {
                var indexName = IndexNameHelper.GetIndexName(rootItem.Name);
                var index = CreateIndex(fieldDefinitions, indexName);
                newIndexes.Add(index);
            }
            return newIndexes;
        }

        private LuceneIndex CreateIndex(FieldDefinitionCollection fieldDefinitions, string indexName)
        {
            return new LuceneIndex(indexName,
                base.CreateFileSystemLuceneDirectory(indexName),
                fieldDefinitions,
                new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30)
                );
        }

        private static FieldDefinitionCollection GetFieldDefinitionCollection()
        {
            return new FieldDefinitionCollection(
                new FieldDefinition("id", FieldDefinitionTypes.FullText),
                 new FieldDefinition("url", FieldDefinitionTypes.FullText),
                 new FieldDefinition("nodeName", FieldDefinitionTypes.FullText),
                 new FieldDefinition("createdDate", FieldDefinitionTypes.DateTime),
                 new FieldDefinition("publishedDate", FieldDefinitionTypes.DateTime)
                );
        }
    }
}

IndexNameHelper.cs:

namespace UmbracoIndexSample.Components.Index
{
    public static class IndexNameHelper
    {

        public static string GetIndexName(string rawSiteName)
        {
            return !string.IsNullOrEmpty(rawSiteName) ? $"Index_{rawSiteName?.Replace(" ", "")}" : null;
        }

        public static string GetSecondaryIndexName(string indexName) => !string.IsNullOrEmpty(indexName) ? $"{indexName}_sec" : null;
    }
}

SampleIndexPopulator.cs:


using Examine;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Umbraco.Core.Logging;
using Umbraco.Core.Models.PublishedContent;
using Umbraco.Core.Services;
using Umbraco.Examine;
using Umbraco.Web;

namespace UmbracoIndexSample.Components.Index
{
    public class SampleIndexPopulator : IndexPopulator
    {
        private const string LogPrefix = "UmbracoIndexSample";

        private readonly IUmbracoContextFactory _umbracoContextFactory;
        private readonly IContentService _contentService;
        private readonly ILogger _logger;

        public SampleIndexPopulator(IUmbracoContextFactory umbracoContextFactory,
            IContentService contentService,
            ILogger logger)
        {
            _umbracoContextFactory = umbracoContextFactory;
            _contentService = contentService;
            _logger = logger;

            var rootItems = _contentService.GetRootContent();

            foreach (var rootItem in rootItems)
            {
                var indexName = IndexNameHelper.GetIndexName(rootItem.Name);
                RegisterIndex(indexName);
            }
        }

        protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes)
        {
            var rootItems = _contentService.GetRootContent().Where(x => x.Published);

            foreach (var rootItem in rootItems)
            {
                var indexName = IndexNameHelper.GetIndexName(rootItem.Name);
                var index = indexes.SingleOrDefault(x => x.Name == indexName) ?? indexes.SingleOrDefault(x => x.Name == IndexNameHelper.GetSecondaryIndexName(indexName));

                if (index == null)
                {
                    continue;
                }

                _logger.Info(this.GetType(), $"{LogPrefix} Populate Index:{index.Name}");

                var publishedRoot = _umbracoContextFactory.EnsureUmbracoContext().UmbracoContext.Content.GetById(rootItem.Id);

                var items = publishedRoot?.Descendants()?.Where(x => x.IsPublished());

                _logger.Info(this.GetType(), $"{LogPrefix} Populating ArticleIndex:{index.Name}, Article count:{items?.Count() ?? 0}");
                if (items?.Any() ?? false)
                {
                    List<ValueSet> valueSets = GetValueSetList(items);
                    index.IndexItems(valueSets);
                }

                _logger.Info(this.GetType(), $"Populating finished. index:{index.Name}");
            }
        }

        private static List<ValueSet> GetValueSetList(IEnumerable<IPublishedContent> items)
        {
            List<ValueSet> valueSets = new List<ValueSet>();

            foreach (var item in items)
            {
                var indexValues = new Dictionary<string, object>
                {
                    ["id"] = WrapMultiselectValue(item.Id),
                    ["nodeName"] = item.Name,
                    ["createdDate"] = item.CreateDate,
                    ["publishedDate"] = item.UpdateDate
                };

                var valueSet = new ValueSet(item.Id.ToString(), "Global", indexValues);

                valueSets.Add(valueSet);
            }
            return valueSets;
        }

        private static string WrapMultiselectValue<T>(T value)
        {
            return $"ß{value}ß";
        }
    }
}

IndexApiController.cs

using Examine;
using System.Web.Http;
using Umbraco.Examine;
using Umbraco.Web.Search;
using Umbraco.Web.WebApi;

namespace UmbracoIndexSample.Controllers
{
    public class IndexApiController : UmbracoApiController
    {
        private const string IndexName = "Index_Home";

        private readonly IExamineManager _examineManager;
        private readonly IndexRebuilder _indexRebuilder;
        private readonly BackgroundIndexRebuilder _backgroundIndexRebuilder;

        public IndexApiController(IExamineManager examineManager, IndexRebuilder indexRebuilder, BackgroundIndexRebuilder backgroundIndexRebuilder)
        {
            _examineManager = examineManager;
            _indexRebuilder = indexRebuilder;
            _backgroundIndexRebuilder = backgroundIndexRebuilder;
        }

        [HttpGet]
        public IHttpActionResult RebuildIndex()
        {
            if (!_examineManager.TryGetIndex(IndexName, out IIndex index))
            {
                return BadRequest($"{IndexName} index not found");
            }

            _indexRebuilder.RebuildIndex(IndexName);
            return Ok();
        }

        [HttpGet]
        public IHttpActionResult FullRebuildIndex()
        {
            _backgroundIndexRebuilder.RebuildIndexes(false, 100);
            return Ok();
        }
    }
}

Environment settings:

Our project to reproduce:

UmbracoIndexSample.zip

Expected result / actual result

Expected result:

The old .cfs files are deleted over time, and the system does not slow down due to the accumulation of .cfs files.

github-actions[bot] commented 3 months ago

Hi there @EssigAttila!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face: