umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.4k stars 2.66k forks source link

Critical Performance Issues Post-Upgrade from Umbraco 11.5.0 to 13.4.1 #16803

Closed piotrbach closed 1 month ago

piotrbach commented 1 month ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

13.4.1

Bug summary

After upgrading Umbraco from version 11.5.0 to 13.4.1, we are experiencing critical issues with basic operations such as publishing and unpublishing content. Additionally, we are often encountering database lock errors. We have noticed significant changes in the database schema and indexes.

Please let us know if this upgrade/change has been performance-tested on a large volume of data similar to ours. We would greatly appreciate any guidance or solutions to resolve this critical issue.

cc: @wtct

Specifics

We are managing a large dataset with ~325 000 documents.

We have attempted various solutions to resolve the issue, including:

Additionally, we observed that the upgrade process takes so long that the https://localhost:44392/install view does not redirect after completion. We can only see from the logs that the process has finished. When accessing https://localhost:44392/ directly and removing /install, the site still shows 'Website is Under Maintenance'. Only after restarting Umbraco does the site function normally.

Steps to reproduce

  1. Start with an Umbraco 11.5.0 instance containing 325K documents.

  2. Perform an upgrade to Umbraco 13.4.1.

  3. Attempt to publish or unpublish a document.

  4. Observe the performance and any errors related to database locks.

Expected result / actual result

umbraco-lock-issue

github-actions[bot] commented 1 month ago

Hi there @piotrbach!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:

pvhees commented 1 month ago

I encountered the same issue (upgrading from 11.5 to 13.4.0). The 'DeleteVersions' job is blocking everything. However once it's done, the issue is resolved and doesn't come back. To help with the database timeouts and locking issues:

Increase the connect timeout of the connectionstring to a big number Add the following in In appsettings.json under Umbraco -> global:

    "DistributedLockingReadLockDefaultTimeout": "00:05:00",
    "DistributedLockingWriteLockDefaultTimeout": "00:00:20"

Then I let it run for a while.

wtct commented 1 month ago

Hello Guys,

Just letting you know, we have just found a critical performance bottleneck which is caused by refactoring in the #14806.

We are going to prepare a new pull request.

If you started investigation please hold on :)

cc: @nul800sebastiaan, @Zeegaan, @elit0451, @nikolajlauridsen, @leekelleher, @mattbrailsford

piotrbach commented 1 month ago

Hi @pvhees and big thanks for your suggestion🤝.

Yes, we initially disabled all background services, including 'DeletedVersions,' plus played around with timeouts, but this did not resolve the issue. We found the problem and PR is waiting for merge https://github.com/umbraco/Umbraco-CMS/pull/16837. It's worth noting that we are managing a dataset of 325,000 documents. I suspect that with fewer than 100,000 documents, the problem might not be noticeable.

@wtct

Zeegaan commented 1 month ago

I am not sure why this is a problem at all ? 🤔 The PR just changed the default behavior. But it might be a documentation issue 🤔 To revert to the old behavior, you have to change the UsePagedSqlQuery option to false in your appSettings like so:

 "Umbraco": {
    "CMS": {
      "NuCache": {
        "UsePagedSqlQuery" : false      },

Let me know if that helps 😁

PerplexDaniel commented 1 month ago

@Zeegaan Looking at PR #16837 it seems your change from earlier (in #14806) removed a WHERE clause from the COUNT query in some cases. It seems you refactored 3 different ones into 1 without a WHERE, but 2 of the original 3 had a WHERE clause.

For example, in #14806 this was changed:

Original (using a .WhereIn): image

Your change (without .WhereIn). This version is used for all 3 cases: image

Likewise for this one, it also uses the version without any WHERE clause. image

wtct commented 1 month ago

I am not sure why this is a problem at all ?

Hi @Zeegaan,

It seems @PerplexDaniel clarified this case even more, but you can still take a look at the critical line you forget during your refactoring:

https://github.com/umbraco/Umbraco-CMS/pull/14806/files#diff-1c5f9167e84fb37137bd239d3fddb0aebe0ac78b01e3b8c51871b227791d8ce0L349

 "Umbraco": {
    "CMS": {
      "NuCache": {
        "UsePagedSqlQuery" : false      },

Let me know if that helps 😁

This is probably only a workaround, but we don't want to go this way, because it could cause another issues.

Honestly, the issue is caused by an obvious bug introduced by refactoring performed in #14806

Therefore, we created a new PR #16837 which keeps logic of your PR #14806, but we restored WHERE clauses as needed.

I recommend debugging the QueryPaged extension method during publishing a node which contains some children:

https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Infrastructure/Persistence/NPocoDatabaseExtensions.cs#L35

BTW, I have to mention that I'm a little bit shocked that the performance of basic CRUD operations are not tested enough before releasing a new version, because everything will work as expected on database with a small amount of nodes even there is a critical performance issue like this.

Zeegaan commented 1 month ago

@PerplexDaniel Thank you for the clarification, that made it really clear 😁 I will take a look at merging the PR ASAP 😄

piotrbach commented 1 month ago

Hi, @PerplexDaniel big thanks for your input and extra clarification🤝.

@Zeegaan I can suggest you to:

  1. Set first breakpoint in ContentService https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Core/Services/ContentService.cs#L1387
  2. Set second breakpoint in https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Infrastructure/Persistence/NPocoDatabaseExtensions.cs#L63 while loop
  3. Publish a node with debugging on. Node without children should be fine I guess.
  4. Debug through cache refreshing code.
  5. Inspect itemCount in public static IEnumerable QueryPaged method, you will notice total number of Umbraco documents (no filtering, WHERE).

image

As a result the while loop takes forever and it's killing the system.

Simply merging PR https://github.com/umbraco/Umbraco-CMS/pull/16837 is a quick solution here.

Zeegaan commented 1 month ago

@piotrbach thanks for the debugging steps, it helped during testing 🚀

Fixed in https://github.com/umbraco/Umbraco-CMS/pull/16837