Closed JefSchraag closed 6 years ago
I just tried to reproduce the problem in another application instance. I expected this sequence of commands to result in the same exception.
First
New-ServiceFabricApplication -ApplicationName fabric:/IbeMondayBlueAlt -ApplicationTypeName IbeType -ApplicationTypeVersion 1840.0 -ApplicationParameter $params
Then
Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/IbeMondayBlueAlt -ApplicationTypeVersion 1841.0 -UnmonitoredManual -ApplicationParameter $params
...
TargetApplicationTypeVersion : 1841.0
UpgradeKind : Rolling
ForceRestart : False
In this case, the upgrade started as expected. Leaving me confused.
Hello @JefSchraag, Just to make sure, You issued the following two cmdlets and got the exception: (1) New-ServiceFabricApplication -ApplicationName fabric:/IbeMondayBlue -ApplicationTypeName IbeType -ApplicationTypeVersion 1840.0 -ApplicationParameter $params
(2) Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/IbeMondayBlue -ApplicationTypeVersion 1841.0 -UnmonitoredManual -ApplicationParameter $params
Then the only change you made is to the instance name of the application and was able to get the upgrade started with the following two cmdlets
(1) New-ServiceFabricApplication -ApplicationName fabric:/IbeMondayBlueAlt -ApplicationTypeName IbeType -ApplicationTypeVersion 1840.0 -ApplicationParameter $params
(2) Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/IbeMondayBlueAlt -ApplicationTypeVersion 1841.0 -UnmonitoredManual -ApplicationParameter $params
Is that correct, specifically you did not change anything except for the instance name?
Also could you please share the 1840.0 and 1841.0 application manifests and the ApplicationParameters ($params) that you are passing in?
Can you reproduce the exception with the instance name fabric:/IbeMondayBlue, if yes, could you use -Verbose switch in the Start-ServiceFabricApplicationUpgrade cmdlet so that we may get more details on the exception?
Thanks, Tanvir
Hi @motanv,
The fabric:/IbeMondayBlue
application was instantiated many versions ago. It was not newly created at version 1840.0, it was upgraded from another version to get to 1840.0.
And indeed, the fabric:/IbeMondayBlueAlt
version was newly instantiated at version 1840.0.
These are the application manifest versions and parameters file.
And this is the output of the Start-ServiceFabricApplicationUpgrade
command with the -Verbose
flag:
Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/IbeMondayBlue -ApplicationTypeVersion 1841.0 -ApplicationParameter $params -UnmonitoredManual -Verbose
VERBOSE: System.Fabric.FabricException: aka.ms/upgrade-defaultservices ---> System.Runtime.InteropServices.COMException: Exception from HRESULT: 0x80071BDF
at System.Fabric.Interop.NativeClient.IFabricApplicationManagementClient10.EndUpgradeApplication(IFabricAsyncOperationContext context)
at System.Fabric.Interop.Utility.<>c__DisplayClass22_0.<WrapNativeAsyncInvoke>b__0(IFabricAsyncOperationContext context)
at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
--- End of inner exception stack trace ---
VERBOSE: System.Management.Automation.PipelineStoppedException: The pipeline has been stopped.
at System.Management.Automation.MshCommandRuntime.ThrowTerminatingError(ErrorRecord errorRecord)
at System.Management.Automation.Cmdlet.ThrowTerminatingError(ErrorRecord errorRecord)
at Microsoft.ServiceFabric.Powershell.ApplicationCmdletBase.<>c__DisplayClass10_0.<UpgradeApplication>b__0(Exception ae)
at System.AggregateException.Handle(Func`2 predicate)
at Microsoft.ServiceFabric.Powershell.ApplicationCmdletBase.UpgradeApplication(ApplicationUpgradeDescription upgradeDescription)
at Microsoft.ServiceFabric.Powershell.StartApplicationUpgrade.ProcessRecord()
Start-ServiceFabricApplicationUpgrade : aka.ms/upgrade-defaultservices
At line:1 char:5
+ Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/Ib ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFabricApplicationUpgrade], FabricException
+ FullyQualifiedErrorId : UpgradeApplicationErrorId,Microsoft.ServiceFabric.Powershell.StartApplicationUpgrade
Thanks for looking into this,
Jef Schraag
@JefSchraag, thanks for sharing the application manifests. Between the two versions of the application, was there any change in any of the service manifests belonging to one of the default services listed in the application manifest?
@JefSchraag, the reason for asking about the service manifests is that recently there was an issue where customer did not think he changed the default service description because he did not touch the application manifest, but he did modify the placement constraints in the service manifest (for the default service). If you did not change the service manifests either, then I would need the Service Fabric traces to root cause the issue and provide a mitigation.
Hi @motanv , the only service manifest change between 1840 and 1841 is a version number change in the version number on the <ServiceManifest>
element and its child <CodePackage>
.
A summary of what is going on.
case | old | new | outcome |
---|---|---|---|
old instance was upgrade from an even older instance | 1840.0 | 1841.0 | Start-ServiceFabricApplicationUpgrade fails without clear reason |
old instance created from scratch | 1840.0 | 1841.0 | Start-ServiceFabricApplicationUpgrade runs as expected |
Hello @JefSchraag , Could you please share the sf logs with us ? If this is a Windows one-box cluster, you may find them in “Traces” folder like C:\SfDevCluster\Log\Trace and they look like “fabric_traces_6.3.187.9494_131848675483647056_0.etl”
C:\Users\motanv>reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Service Fabric"
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Service Fabric FabricRoot REG_EXPAND_SZ C:\Program Files\Microsoft Service Fabric FabricBinRoot REG_EXPAND_SZ C:\Program Files\Microsoft Service Fabric\bin FabricCodePath REG_EXPAND_SZ C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code FabricDataRoot REG_EXPAND_SZ C:\SfDevCluster\Data FabricLogRoot REG_EXPAND_SZ C:\SfDevCluster\Log FabricVersion REG_EXPAND_SZ 6.3.187.9494 CompatibilityJsonPath REG_EXPAND_SZ C:\Program Files\Microsoft Service Fabric\bin\fabric\fabric.code\SdKRuntimeCompatibility.json UseFabricInstallerSvc REG_DWORD 0x1 EnableCircularTraceSession REG_DWORD 0x1 NodeLastBootUpTime REG_SZ 10/19/2018 5:27:54 PM FabricDnsServerIPAddress REG_SZ 10.31.60.134
What is the Service Fabric runtime version that you are using?
Thanks, Tanvir
Hi Tanvir,
The logs you requested are available here: https://1drv.ms/u/s!Am3hNHZWyp0auvVjCdK1vpNkX1WJEQ
I took the logs from the machine in the cluster that acted as the gateway (GatewayInformation data from Connect-ServiceFabricCluster)
I issued the Start-ServiceFabricApplicationUpgrade command at:
Jef
Thanks @JefSchraag, looks like you are on SF runtime 6.3 CU1 (6.3.176) I am actually more interested in the traces from the node where the system service ClusterManager (CM) primary lives, the traces you shared does not looks like from the node where CM primary was.
If this is a cluster you created from portal.azure.com (or you have deployed via ARM template) you could get the Cluster Resouce ID and region of the cluster from the portal.azure.com and share with me so that I can create a SAS (Shared Access Signature) key.
Hi Tanvir,
The logs from the machine hosting the primary CM are available here: https://1drv.ms/u/s!Am3hNHZWyp0auvV6b2sEZe48o33y9A
I noticed this warning in those logs:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
<EventID>29441</EventID>
<Version>1</Version>
<Level>3</Level>
<Task>115</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000001</Keywords>
<TimeCreated SystemTime="2018-11-02T07:17:57.811704500Z" />
<EventRecordID>170323</EventRecordID>
<Correlation />
<Execution ProcessID="60704" ThreadID="34108" ProcessorID="0" KernelTime="6581" UserTime="90975" />
<Channel>
</Channel>
<Computer>NTA-WS-15.netmatch.local</Computer>
<Security />
</System>
<EventData>
<Data Name="id">
</Data>
<Data Name="type">ApplicationUpgradeContext</Data>
<Data Name="text">[(00000000-0000-0000-0000-000000002000:131611764421385883)+7ca7339e-6cd2-4010-a78f-190e6e5ff734:0] modified default service: fabric:/IbeMondayBlue/CatalogResolver ():IbeType_App390:CatalogResolverPkg:CatalogResolverType@fabric:/IbeMondayBlue [SharedProcess] (1.0:1.0:0) [1]/3/3 SP 0:0 () 0 () 1 () false : (1, -1) (c7fcc222-90c7-43d6-8df0-d42e07aca8a8) - [0,0] () -> fabric:/IbeMondayBlue/CatalogResolver ():IbeType_App390:CatalogResolverPkg:CatalogResolverType@fabric:/IbeMondayBlue [SharedProcess] (1.0:1.0:0) [1]/3/3 SP 0:0 () 0 () 0 () false : (1, -1) (4bbc3a9b-71e4-4be5-9823-1ffc39fb476c) - [0,0] ()</Data>
</EventData>
</Event>
Looking at the log text, it looks like there is something misaligned for the IbeMondayBlue/CatalogResolver
service.
fabric:/IbeMondayBlue/CatalogResolver ():IbeType_App390:CatalogResolverPkg:CatalogResolverType@fabric:/IbeMondayBlue [SharedProcess] (1.0:1.0:0) [1]/3/3 SP 0:0 () 0 () 1 () false : (1, -1) (c7fcc222-90c7-43d6-8df0-d42e07aca8a8) - [0,0] () ->
fabric:/IbeMondayBlue/CatalogResolver ():IbeType_App390:CatalogResolverPkg:CatalogResolverType@fabric:/IbeMondayBlue [SharedProcess] (1.0:1.0:0) [1]/3/3 SP 0:0 () 0 () 0 () false : (1, -1) (4bbc3a9b-71e4-4be5-9823-1ffc39fb476c) - [0,0] ()
I don't know the meaning of the different values.
fabric:/IbeMondayBlue/CatalogResolver ():IbeType_App390:CatalogResolverPkg:CatalogResolverType@fabric:/IbeMondayBlue [SharedProcess] (1.0:1.0:0) [1]/3/3 SP 0:0 () 0 () 0 () false : (1, -1) (4bbc3a9b-71e4-4be5-9823-1ffc39fb476c) - [0,0] ()
Thanks Jef, yes this is the trace I was looking for, it appears that DefaultMoveCost (https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-movement-cost) has changed (from 1 to 0) for the service fabric:/IbeMondayBlue/CatalogResolver, could you check if you are changing (or removing) it either in application/service manifest or programmatically?
Hi Tanvir,
I checked and we have not changed any DefaultMoveCost
setting. I would not even know how to do that in the application manifests default services.
Can it be that this setting was automatically applied by a system service?
Thanks Jef, could you also share the traces from the FM primary node?
Also, if you do not mind the decrease in DefaultMoveCost, you can set EnableDefaultServicesUpgrade to true (https://aka.ms/upgrade-defaultservices) in the cluster, afterwards the app upgrade should not fail. We still need to find why the DefaultMoveCost changed though, so would need traces from the FM primary node.
Hello @JefSchraag, since we have not heard from you for some time, I am closing this issue. If you still face the issue, please open a new issue.
Hi @motanv,
I am still trying to find some time to make the upgrade work.
However, one of the issues that made this problem larger than needed, is that the exception message in the Start-ServiceFabricApplicationUpgrade
command was not helpful at all. This has cost us a lot of time. It would be very helpful if the exception message at least made a clear statement that the upgrade is failing due to default service settings and perhaps refer to the logs for more details.
Thanks,
Jef Schraag
@JefSchraag , we have improved the error message (we had a bug and the message was truncated due to an unsupported character). The change will be in our 6.4 release.
While trying to upgrade my application using the
Start-ServiceFabricApplicationUpgrade
, I get this exception:The URL
aka.ms/upgrade-defaultservices
takes me to a general article mentioning which service parameter changes are supported.The problem is that I have not changed any default service in this upgrade. Also, the application parameters I am passing as the -ApplicationParameter argument, are exactly the same as the current parameters (checked via
Get-ServiceFabricApplication -ApplicationName fabric:/IbeMondayBlue
)