Closed sbosell closed 3 years ago
This error just occurred right now.
This happened almost daily this week and is being reported by others. Duplicate ticket: https://github.com/umbraco/Umbraco-CMS/issues/9523
Seen several cases of this, too. Could not nail down the cause - seems to happen totally randomly, occurred sometimes after a deployment, sometimes after a content transfer on Cloud, sometimes just after leaving the site be and coming back to it after a few days. This happens regardless of models builder mode (had it happen on sites using PureLive and AppData), and on plain/package-less Umbraco installations.
This continues to happen almost daily. Umbraco randomly unpublishes nodes and the content is 404 not found.
FYI, I've seen this with an Umbraco 8.6.2 install, so it has been happening at least since then.
@Nicholas-Westby Thanks. I updated the ticket description to include 8.6.2.
In the related issues someone noted that they could fix the broken nodes by rebuilding the cache. I wonder if disabling the cache DB (thus forcing it to rebuild upon startup) is a workaround for this?
If anyone should be inclined to test this, try adding the following to an IUserComposer:
    public void Compose(Composition composition)
    {
        composition.Register(factory => new PublishedSnapshotServiceOptions
        {
            IgnoreLocalDb = true
        });
        // ...
    }
@kjac what related issue are you referring to?
I can test the change this week.
I deployed the change this morning and the documentation for it is here. If the only consequence is a slower startup time on a new server I can live with that if it addresses this problem. I should point out we do not have replica servers.
The change made no difference as we experienced the same issue this afternoon.
Do you have multiple root nodes? And what domains do you have assigned to them? If you have multiple root nodes, only one of them can have no domains, all other ones must have domains.
I have the simplest setup of one root and one domain and no language variants.
I'm seeing this problem on Umbraco 8.6.1. There's only one root node and only one domain, but there are multiple language variants. On several pages, only one of the languages is routable, despite all of them being published.
It would be helpful if we can determine if it's caused indirectly by some action in the backoffice, or by some background task or something similar.
Do you know if the issue occurs at any time of the day, does it happen in the middle of the night when no one is using the back office as well?
Would it be possible for you to check the umbracoAudit table in the database as well? Maybe something happens leading up to the bug that might be able to help us.
@nikolajlauridsen
The issue for us happens randomly and there are very few updates in the backoffice (maybe one time per week at the most). We do have a process that runs every 30 minutes that's outlined in the ticket. Here are some of the times of day it has occurred:
Here is the full audit log with the user info removed (don't want to get spammed).
Could this issue please get some attention - we do not dare to upgrade to the next version from 8.5.5. We are kinda stuck - there are issues in 8.5.5 that are fixed in the next versions, but if we do - we risk new and more serious issues :sigh:
We are continuously looking at this issue. Sadly we can't reproduce and we are therefore looking for a needle in a haystack.
@vaags and @sbosell, when you say "one domain" are you talking about domains/hostnames in Umbraco or did you not add any domains in "Culture and Hostnames"?
@bergmania in my case we have English as the only language and we actually have domain.com and www.domain.com set up on that screen. I'm happy to share the backend info privately with someone from the Umbraco team.
We also see nodes randomly losing the url ("This document is published but its url cannot be routed").
In our case we have been seeing it for several months on two separate Umbraco Cloud projects that always run the latest version (so the current latest version has the issue, but so did several earlier versions). Unfortunately I cannot tell in which version exactly this started to occur.
It only seems to happen after a (code) deployment from a lower environment. I haven't seen this occur randomly without a deployment.
Both projects are set up with multiple subsites at the top level, each with their own domain configured in "Culture and Hostnames". For both projects multiple languages are used but in a v7 way (i.e. not making use of v8 variants). One of the projects is a baseline project (where we see this behavior on all child projects) and the other is a plain/single project with two environments. In one project we have jobs running that create content using the content service (similar to some other cases mentioned here). But in the other project we don't create content programmatically.
Rebuilding the cache fixes the issue for us. So unfortunately this is now part of our workflow after each deployment.
Shared info about this with @Jay-umbr via Umbraco Cloud support previously. If needed I can share extra information to find the root cause of this.
@orsinic I can fix the issue by rebuilding the cache or by publishing the node that was derouted. It happens randomly for us (self hosted azure). Derouting shouldn't even be possible.
Good to know there is a workaround if the issue occurs - but I don't think I will upgrade just yet if a part of the deployment routine is to rebuild the cache or republish nodes after each deployment.
For those of you with only a single hostname assigned, could you try to remove the hostname and see if the error goes away? There's not really a reason to have just one hostname with one root node. If that 'fixes' the issue, then we know it's a domain issue. That would help a lot.
I was directed here by Umbraco support after encountering an issue with content suddenly disappearing from our production site the second time this month. The two times we had issues had similar circumstances and resolutions, but presented as different front-end problems, so I'll list them separately:
Earlier instance:
Latest instance:
Info about the site:
Errors I am seeing in the logs:
System.IO.FileNotFoundException: Could not load file or assembly 'Umbraco.Cloud.StorageProviders.AzureBlob' or one of its dependencies. The system cannot find the file specified.
System.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded.
System.InvalidOperationException: Sequence contains no elements
and System.NullReferenceException: Object reference not set to an instance of an object.
Hi @Zweben.. Thanks for the detailed description. Do you have the stacktrace from the following exception? System.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded.
The next time anyone sees this issue, please go to the content in the backoffice and post the response of this call here: {domain}/umbraco/backoffice/UmbracoApi/Content/GetById?id={contentId}. I don't understand how the link from @sbosell can be shown at the same time as the message..
@bergmania
The API call you asked for is attached to the ticket. I anonymized three entries in the file. Owner names and the domain were changed to mydomain.com instead of the actual domain. Also attached is a screenshot of the backend and the GetById call after publishing/fixing the node.
> Hi @Zweben.. Thanks for the detailed description. Do you have the stacktrace from the following exception
> System.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded.
I do not. This has only happened on our Live site recently, where debug mode is off. Is there a way I can get a stack trace without debug mode / detailed error pages enabled?
The stack trace should still be part of the log entry in the log file..
Got it. There are 3 instances of this near the time of the issue, and one of the traces is a little different, so here are the two versions:
Thanks @Zweben.. Both of these seem to come from a user interaction in the backoffice, where content is published. Sadly we don't have info in the stacktrace that could tell us whether it was the same page that later became unavailable.
I think we just experienced this problem in 8.5.5 - we busted the caches and it worked again. We have two root nodes.
Interesting, it looks like it's a problem that has been here super long. Unfortunately, it just does not make it easier to find the root cause.. We are still searching for the issue, without being able to reproduce.
Honestly I don't think it's a problem related to multiple root nodes - but not sure :)
That was just one question to narrow down the issue
@sbosell re: the above question from @bergmania
> For those of you with only a single hostname assigned, could you try to remove the hostname and see if the error goes away? There's not really a reason to have just one hostname with one root node. If that 'fixes' the issue, then we know it's a domain issue. That would help a lot.
Are you able to try without having a hostname assigned to your single root node? We want to see if this has anything to do with domain caches.
@sbosell you mentioned in #9523 that this happens every day for you, which is interesting since for most others it is very sporadic/random. As we need to reproduce this but can't, it seems like you are best placed to assist. For more logging output that may help, you can add this line to your serilog.config file:
<add key="serilog:minimum-level:override:Umbraco.Web.Routing.PublishedRouter" value="Debug" />
This will log A LOT of info, but if this happens every day for you, perhaps you can enable this for a day until it happens and share the log outputs?
@Shazwazza I enabled the serilog setting. As long as it doesn't negatively impact the site it isn't a problem.
@Shazwazza Is there a private way I can share the file with the team?
@sbosell, feel free to send a mail to me, then I will distribute it internally to those to help on this task. bmb@umbraco.dk
@sbosell quick question, how do you get a list of all nodes having the "This document is published but its url cannot be routed" issue?
As noted before (and as far as I know) we do not randomly get this, but only after code deployments. Being able to have a report with the impacted nodes might help digging a bit deeper or trying to get this to be reproduced.
Thanks in advance!
@orsinic I have an uptime monitor on every page on the site but we only see the issue on one page so I immediately know when it is down. You can see that in the graphic posted above in my last message and we get notified via a Clickup Ticket (teams/slack as well) and an email.
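For anyone who wants a similar report without a hosted monitoring service, here is a minimal sketch of the same idea: request a known list of page URLs and collect the ones that come back as 404. It only assumes you already have the list of URLs that should be published; `RouteMonitor`, `FindUnroutableAsync` and `HttpProbe` are hypothetical names for this post, not part of Umbraco.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class RouteMonitor
{
    // Returns the URLs whose status code is 404 Not Found.
    // The probe is passed in so the check can be run without network access.
    public static async Task<List<string>> FindUnroutableAsync(
        IEnumerable<string> urls, Func<string, Task<HttpStatusCode>> probe)
    {
        var broken = new List<string>();
        foreach (var url in urls)
        {
            if (await probe(url) == HttpStatusCode.NotFound)
            {
                broken.Add(url);
            }
        }
        return broken;
    }

    // Default probe: a plain GET request through a shared HttpClient.
    public static Func<string, Task<HttpStatusCode>> HttpProbe(HttpClient client)
    {
        return async url =>
        {
            using (var response = await client.GetAsync(url))
            {
                return response.StatusCode;
            }
        };
    }
}
```

Run on a schedule with `RouteMonitor.HttpProbe(new HttpClient())` as the probe, this gives a timestamped list of derouted pages that could be compared against deployment times and the audit log.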
@bergmania Sent you the log file.
@sbosell thanks for enabling and sending those logs. here you've mentioned you use a custom UmbracoVirtualNodeRouteHandler. Based on your logs I'm guessing that the URL that stops working is based on this custom UmbracoVirtualNodeRouteHandler? If you can confirm - then this is the problem area for your specific issue and will help with investigation.
@Zweben here you mention you use a custom IContentFinder. Do you also have a custom url provider? Is it possible to disable these to see if the error goes away? You also mention you use HangFire - what exactly is this doing, and when? You've also mentioned "Uses multiple root nodes, one with hostnames assigned, three without". In theory there should only ever be one root node without a domain, else there can be problems with inbound routing and naming conflicts, which can end up resulting in routing errors such as these. Can you please ensure there is only one root node without a domain (else you can have all root nodes with domains too) and see if that resolves the issue?
There are a lot of varying setups, questions and issues on this one thread and I don't really think they are all caused by the exact same thing. There's also a few suggestions above for things to try that we haven't heard back about yet. To recap:
@Shazwazza - The node that is being derouted does not have a custom UmbracoVirtualNodeRouteHandler, but some of its children do. For instance the route in my case that is derouted is for the /locations node and all the custom route handlers are for /locations/{id}, etc but not ever for /locations.
We're getting the same issue with many pages on a site we look after. The site is running 8.9.1. We noticed this a couple of months ago and put it down to the cache needing to be rebuilt and/or the page needing to be published.
We also thought it was a conflict between controller names with doc types (even though they were surface controllers) so we changed the name of those classes but it didn't help.
This is now happening several times a day and causing big issues for our client. The site is hosted on a virtual server (not azure or cloud based) and is not load balanced.
Any update would be much appreciated.
Same here - we're on 8.6.6. Originally on 8.6.1 we assumed we just had an issue with the indexes (fixed in either 8.6.2 or 8.6.3), but that was just masking this problem. As per benbrace, this is a real issue for our client as the area of the site that's disappearing is the section where people have paid good money to access the content as members.
It happens sporadically - sometimes multiple times daily, then nothing for a week etc. There's no real discernible pattern - it happens when the site has users and when there's no traffic. With and without content editors changing / uploading content etc.
It's a single root node site, single language with no exotic elements at all. We did try to think laterally to temporarily work around the issue, by installing Hangfire, detecting the page going down and automating an HTTP POST to the /BackOffice/Api/NuCacheStatus/ReloadCache endpoint, but to no avail. The call returns HTTP 200, but I suspect the security context needed to authenticate the Hangfire user as admin is missing. (This issue was happening well before Hangfire was installed btw - we only installed it as a last gasp attempt)
If you need any information (nucache dumps, access to the site etc) please don't hesitate to reach out - the sooner we get this sorted the better for the whole community!
Many thanks
> @Zweben here you mention you use a custom IContentFinder. Do you also have a custom url provider? Is it possible to disable these to see if the error goes away? You also mention you use HangFire - what and when exactly is this doing? You've also mentioned
> Uses multiple root nodes, one with hostnames assigned, three without
> ... in theory there should only ever be one root node without a domain else there can be problems with inbound routing and naming conflicts which can end up resulting in routing errors such as these. Can you please ensure there is only one root node without a domain (else you can have all root nodes with domains too) and see if that resolves the issue?
@Shazwazza: I don't remember exactly, but I think my IContentFinder doesn't have a custom URL provider that corresponds to it (it's a bit of an unusual case), but I believe I do have a custom URL provider used elsewhere. Unfortunately, I'm not able to disable either as they're both being used in production.
Hangfire is running a few tasks: importing JSON records into nodes, and sending out a few email reports. I've checked them fairly recently and they weren't hitting any errors and didn't appear to be causing any issues.
Regarding the multiple root nodes, I wasn't aware of that... there should probably be some warning in the CMS, as it's quite easy to configure things this way. I should be able to move things around so that there is only one root node without a domain assigned. Is it safe to have a second set of domainless nodes in the root if that second set is purely "settings" nodes that have no template assigned and are therefore not routed to directly?
Does anyone know if you can inject an IAppPolicyCache into an UmbracoApiController like the following, and will clearing it cause NuCache to rebuild?
    private readonly IAppPolicyCache _runtimeCache;

    public ResyncController(IAppPolicyCache cache)
    {
        _runtimeCache = cache;
        // This would be called in another method; it is only placed here for this post.
        _runtimeCache.Clear();
    }
No, that is not how nucache works
> Regarding the multiple root nodes, I wasn't aware of that... there should probably be some warning in the CMS, as it's quite easy to configure things this way. I should be able to move things around so that there is only one root node without a domain assigned. Is it safe to have a second set of domainless nodes in the root if that second set is purely "settings" nodes that have no template assigned and are therefore not routed to directly?
Multiple root nodes without domains are not supported because you can easily end up with ambiguous route URLs based on the names of nodes. This isn't guaranteed to happen but it can (I have seen it often), and there is no guarantee that it will pick the 'first' ambiguous URL over the 2nd or 3rd. This has previously resulted in nodes going 'missing' until that URL cache was cleared. You can always assign dummy domains to settings root nodes, etc... This also depends on whether your settings nodes have templates and/or can be routed to, what their names are, etc... My point is that it's impossible to understand everyone's particular website, so the general rule is: do not have more than one root node without a domain = safe.
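To make the ambiguity concrete, here is a small self-contained sketch. It is not Umbraco's routing code. It assumes a simplified model in which a domainless root node is hidden from the path, so every child route starts directly at "/" no matter which root it lives under. The `Node` class and `RouteAmbiguityDemo` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RouteAmbiguityDemo
{
    // Hypothetical, simplified content node. This is not an Umbraco type.
    public sealed class Node
    {
        public string Name { get; }
        public List<Node> Children { get; } = new List<Node>();

        public Node(string name, params Node[] children)
        {
            Name = name;
            Children.AddRange(children);
        }
    }

    // Assumption: the root itself is left out of the path, so routes are
    // built from the root's children downwards.
    public static IEnumerable<(string Route, string Owner)> Routes(Node root)
    {
        foreach (var child in root.Children)
            foreach (var entry in Walk(child, "", root.Name))
                yield return entry;
    }

    private static IEnumerable<(string Route, string Owner)> Walk(Node node, string prefix, string rootName)
    {
        var route = prefix + "/" + node.Name.ToLowerInvariant().Replace(' ', '-');
        yield return (route, rootName + route);
        foreach (var child in node.Children)
            foreach (var entry in Walk(child, route, rootName))
                yield return entry;
    }

    // Groups all routes from all domainless roots and keeps those that are
    // claimed by more than one node.
    public static Dictionary<string, List<string>> FindAmbiguous(IEnumerable<Node> domainlessRoots)
    {
        return domainlessRoots
            .SelectMany(Routes)
            .GroupBy(entry => entry.Route)
            .Where(group => group.Count() > 1)
            .ToDictionary(group => group.Key, group => group.Select(entry => entry.Owner).ToList());
    }

    public static void Main()
    {
        var site = new Node("Site", new Node("About"), new Node("Locations", new Node("Oslo")));
        var settings = new Node("Settings", new Node("About"));

        foreach (var pair in FindAmbiguous(new[] { site, settings }))
            Console.WriteLine($"{pair.Key} <- {string.Join(", ", pair.Value)}");
        // prints: /about <- Site/about, Settings/about
    }
}
```

In this model two different nodes claim `/about`, and nothing decides which one wins. Giving the second root a (dummy) domain takes its routes out of the shared "/" space, which is why the general rule above is safe.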
@MartinThomasCW this point is also interesting:
> As per benbrace, this is a real issue for our client as the area of the site that's disappearing is the section where people have paid good money to access the content as members.
It seems peculiar that only this section does this. I know this thread is super long but as I've mentioned above I suspect that there can be multiple different issues here because each person's site is different. Is there something special about this section? As above, do you have custom IContentFinder, UrlProviders, UmbracoVirtualNodeRouteHandler, etc... ?
Sometimes a node in umbraco will be in a state of published but not routable when this should not be allowed to occur. Umbraco in this case throws a 404 for that route.
There are several cases of this being reported as well as other issues reported here (duplicate)
Related to issue #7575
Umbraco version
I am seeing this issue on Umbraco version: 8.6.2, 8.7.x, and 8.9.1
Reproduction
I have a process that syncs data in the backend every 30 minutes to a node's descendants and then the content api calls SaveAndPublish and/or SaveAndPublishBranch (tested with both in the sync). The sync uses the content api via a recurring task, all standard features of Umbraco. In my case we have no cultures defined and only one root site. The most basic Umbraco setup. Others are reporting the same issue so it may have nothing to do with the Content Api.
About once per day the root node of the sync ends up in this state where the URL can't be routed. It is random in that it only occurs every once in a while, which leads me to believe the cause is a locking problem in how the cache works. This should never happen: there should be no code path in save and publish that unpublishes or deroutes a node like this.
Hosted on Azure in a single site (not load balanced) with all the appropriate settings set via the Umbraco Documentation.
Bug summary
A node becomes not routable when this should never happen.
Specifics
I can privately provide a URL
Steps to reproduce
It happens randomly
Expected result
A valid published node in Umbraco should NEVER be in a state of the URL not being routable.
Actual result
Umbraco backend reports the URL not being routable and the node is effectively unpublished.
_This item has been added to our backlog AB#9840_