Closed ifle closed 6 years ago
We need more info to help address this issue. A few ideas about what to add to the issue:
In this scenario, SF runs the entry point as specified in the manifest. If for some reason the process does not come up (for SF to monitor), the timeout might occur. The guest exe could fail to start on the node for many reasons, from missing dependencies on the node to permission issues.
Thanks for your response. SF version: 6.3.176.9494. SKU: Datacenter-Core-1709-with-Containers-smalldisk, version: latest. I have 2 node types.
On the primary node type (Standard_DS11_v2), 3 applications are deployed:
What is interesting is that after some time SF shows that everything is fine, but one node has a service that can't load.
On Standard_D2_v2, 2 ASP.NET web applications are deployed as containers.
I observe this behavior with any service or container after deployment.
Thanks for the detailed info :-)
It's probably that the code package is not starting on the node, i.e. the process or container cannot start in a timely fashion. The best troubleshooting step is to remote in to the node and see what causes it to not start (look at the Event Logs, or try starting the process/container manually to get more insight).
Thanks, I will try starting it manually. I couldn't find how to open the event log on Datacenter-Core-1709. There is no UI, only the command line.
Run this from the cmd: eventvwr.msc (I think it will work)
I get: 'eventvwr.msc' is not recognized as an internal or external command
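Since Server Core has no Event Viewer GUI, the logs have to be read from the command line. Here is a minimal sketch, assuming Python happens to be available on the node: it builds a query for the built-in `wevtutil` tool that lists the most recent error-level entries of a log. The helper name `build_error_query` and the default count of 20 are illustrative choices, not anything from Service Fabric.

```python
import subprocess
import sys


def build_error_query(log_name="Application", count=20):
    """Return the wevtutil argv that lists the newest error-level events."""
    return [
        "wevtutil", "qe", log_name,
        "/q:*[System[(Level=2)]]",  # XPath filter: Level 2 = Error
        f"/c:{count}",              # maximum number of events to return
        "/rd:true",                 # reverse direction: newest first
        "/f:text",                  # plain-text output instead of XML
    ]


if __name__ == "__main__":
    cmd = build_error_query()
    print(cmd)
    # Only run the query on Windows, where wevtutil exists.
    if sys.platform == "win32":
        subprocess.run(cmd, check=False)
```

Without Python, the same query can be typed directly at the prompt: `wevtutil qe Application "/q:*[System[(Level=2)]]" /c:20 /rd:true /f:text`.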
Using CopyPackageTimeoutSec fixed this issue. The error message is inaccurate: the timeout occurred while copying the package, not during activation. @MikkelHegn Thanks for your help.
Great, glad you got it working :-)
@MikkelHegn Unfortunately the error is back. I connected remotely to the node and tried starting the process manually. The process started as expected. What is wrong? How do I troubleshoot this issue? Please change the status to Open.
I dumped the event log from a clean cluster with only one application deployed, which experienced the same thing.
https://transfer.sh/820cr/eventlog.xml
I resolved it just now by restarting the node, but that's of course not a viable solution.
I dumped all event log errors. Can you please take a look?
TimeCreated Message
----------- -------
9/11/2018 7:33:09 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0xb58
Faulting application start time: 0x01d4499f54d52d7b
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: aee1b804-050a-4bf6-abf4-d121d2bef91d
Faulting package full name:
Faulting package-relative application ID:
9/11/2018 7:16:15 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x19dc
Faulting application start time: 0x01d4499d00115871
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 2f2105c8-a9d9-44ee-8461-dbcbbb0f610b
Faulting package full name:
Faulting package-relative application ID:
9/11/2018 6:59:34 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0xc58
Faulting application start time: 0x01d4499aacfc0200
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: a51e4638-4fab-4028-805a-826f4385e4a4
Faulting package full name:
Faulting package-relative application ID:
9/11/2018 6:42:56 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0xf48
Faulting application start time: 0x01d4499858400c8e
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: f0863ed7-09f3-49e2-b0f9-dfdde9c4c298
Faulting package full name:
Faulting package-relative application ID:
9/11/2018 6:26:15 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x17bc
Faulting application start time: 0x01d44996050174ee
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: f287439b-c898-491e-be1e-0fcca90e147f
Faulting package full name:
Faulting package-relative application ID:
9/11/2018 6:09:36 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x1274
Faulting application start time: 0x01d44993b0a8acf8
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 779737ce-c57c-4c7c-999e-4245f0413356
Faulting package full name:
Faulting package-relative application ID:
9/11/2018 5:52:54 AM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x92c
Faulting application start time: 0x01d449574652b40b
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: e1d06d08-a024-4871-a3af-210a1aa840d3
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 10:40:30 PM The Open Procedure for service "WmiApRpl" in DLL "C:\Windows\system32\wbem\wmiaprpl.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
9/10/2018 10:40:30 PM Unable to open the Server service performance object. The first four bytes (DWORD) of the Data
section contains the status code.
9/10/2018 10:40:30 PM The Open Procedure for service "Lsa" in DLL "C:\Windows\System32\Secur32.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
9/10/2018 10:40:30 PM The Open Procedure for service ".NETFramework" in DLL "C:\Windows\system32\mscoree.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
9/10/2018 10:40:25 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 10:40:24 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 10:40:23 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 10:39:28 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 10:39:27 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 10:39:26 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 10:23:40 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x1268
Faulting application start time: 0x01d4495299b25dc6
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 2aa1403c-c649-497f-bc8f-9427604b6031
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 10:07:00 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x129c
Faulting application start time: 0x01d4495042a8ec3f
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: a7a02f62-611f-4808-a106-ce8e08ed17ea
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 9:50:15 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x11b4
Faulting application start time: 0x01d4494ded1d9233
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 9467b7e1-9546-4434-a0f1-eb2a19835008
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 9:33:32 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x1300
Faulting application start time: 0x01d4494b961adcc9
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 2f6c4138-2eb0-42ca-a0ec-269f1a79ee15
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 9:16:47 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x1828
Faulting application start time: 0x01d4494944aa4160
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 62902396-c095-4427-9049-1f398bdaf2b8
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 9:00:12 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0xc24
Faulting application start time: 0x01d44946efa4033c
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: f90a5bae-f944-4bd5-82f3-0df158b4212e
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 8:43:30 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x1284
Faulting application start time: 0x01d4494498b76c89
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 255591f0-016c-4066-acec-c3aeb6adc0bf
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 8:26:45 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x14dc
Faulting application start time: 0x01d44942466ee2a9
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: dd25042f-059b-4752-8d9b-5a3935a92ea4
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 8:10:08 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x1410
Faulting application start time: 0x01d4493ff0696a20
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 80f4078e-d561-46d4-bb4b-8ea8accc6a12
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 7:53:25 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x127c
Faulting application start time: 0x01d4493d9cee6f54
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 315580b6-2025-4348-b53a-9cf60bbd1dad
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 7:36:38 PM Faulting application name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Faulting module name: Fabric.exe, version: 6.3.176.9494, time stamp: 0x5b6e5929
Exception code: 0xc0000602
Fault offset: 0x000000000012cded
Faulting process id: 0x12c8
Faulting application start time: 0x01d44934b161c516
Faulting application path: C:\Program Files\Microsoft Service
Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Faulting module path: C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\Fabric.exe
Report Id: 8a30860d-b79f-4746-8627-47e74c89b106
Faulting package full name:
Faulting package-relative application ID:
9/10/2018 6:32:59 PM The Open Procedure for service "WmiApRpl" in DLL "C:\Windows\system32\wbem\wmiaprpl.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
9/10/2018 6:32:59 PM Unable to open the Server service performance object. The first four bytes (DWORD) of the Data
section contains the status code.
9/10/2018 6:32:59 PM The Open Procedure for service "Lsa" in DLL "C:\Windows\System32\Secur32.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
9/10/2018 6:32:59 PM The Open Procedure for service ".NETFramework" in DLL "C:\Windows\system32\mscoree.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
9/10/2018 6:32:12 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 6:32:11 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 6:32:10 PM Resolver Setup/Start failed for container servicefabric_network, "error in opening name server
socket listen udp 10.0.0.1:53: bind: The requested address is not valid in its context."
9/10/2018 6:31:44 PM Windows cannot load the extensible counter DLL ESE. The first four bytes (DWORD) of the Data
section contains the Windows error code.
9/10/2018 6:26:03 PM The Open Procedure for service "BITS" in DLL "C:\Windows\System32\bitsperf.dll" failed.
Performance data for this service will not be available. The first four bytes (DWORD) of the
Data section contains the error code.
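The Fabric.exe crash entries in the dump above recur at a fairly regular cadence. The sketch below makes that pattern visible: it computes the gaps between consecutive crash timestamps and decodes the hexadecimal "Faulting application start time" field, assuming that field is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). The timestamps are copied from the four most recent entries; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# TimeCreated values of the four most recent Fabric.exe crashes in the dump.
CRASH_TIMES = [
    "9/11/2018 7:33:09 AM",
    "9/11/2018 7:16:15 AM",
    "9/11/2018 6:59:34 AM",
    "9/11/2018 6:42:56 AM",
]

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def crash_gaps_seconds(stamps):
    """Seconds between consecutive crashes, given newest-first timestamps."""
    times = [datetime.strptime(s, "%m/%d/%Y %I:%M:%S %p") for s in stamps]
    return [int((newer - older).total_seconds())
            for newer, older in zip(times, times[1:])]


def filetime_to_utc(hex_value):
    """Decode a hex FILETIME (100 ns ticks since 1601-01-01) to UTC."""
    ticks = int(hex_value, 16)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    print(crash_gaps_seconds(CRASH_TIMES))
    print(filetime_to_utc("0x01d4499f54d52d7b"))
```

For these entries the gaps come out at roughly 1000 seconds each (about 17 minutes), which suggests Fabric.exe is failing on a regular retry cycle rather than at random.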
I can add that all my apps are native SF apps, stateless/stateful. I failed to find anything in my event log telling me why they don't start.
Thanks guys. I see the same exception code and error in both cases... Investigating...
Thanks @MikkelHegn
I need one of you guys to raise a support ticket through the Azure Portal, so we can take a closer look at the traces from the runtime. Please let me know once you've done it and I will sync up with the on-call engineers.
I will open one.
@MikkelHegn Support request ID : 118091118978038
Cool, let's wait and see what you find out from @ifle; then if we need more to look at, I can raise one too. I don't have the problem at the moment, but I do have CI/CD set up, so I could probably spin up 10 clusters quickly and assume one would fail :D
Initial triage shows that a reboot should work around the issue for those nodes affected. As we have more confirmation on the issue and fix, I'll keep you posted.
Thanks. I tried a reboot from SF Explorer, but it did not fix the issue.
Restart node from SFX only restarts the Fabric processes. You need to reboot the VM.
How do I reboot the VM safely? Anyway, that is only a workaround.
Use this and make sure to add the instanceId for the vm to restart: https://docs.microsoft.com/en-us/powershell/module/azurerm.compute/restart-azurermvmss?view=azurermps-6.8.1
Thanks
@ifle I normally remote desktop to it, open PowerShell, and run Restart-Computer when that's an option.
One node is affected by this issue again :(
This issue prevents SF from downloading the new version of the docker containers. We can't update our applications.
Fixed by SF team. Thanks
The mitigation is to kill the “FabricHost.exe” process. This issue can get escalated with many deployments or application deletions. We are working on a permanent mitigation in the upcoming CU.
We've also been seeing this, sometimes on a few nodes and sometimes on several, especially when deploying our apps to a fresh cluster.
After some time the bootstrapper server will restart the node and it will work again. Curiously, it only seems to happen the first time we deploy our apps; if we later redeploy, we don't observe the issue.
Great to hear it will be fixed in the next CU :)
I have a guest executable Service Fabric cluster with 4 nodes. 3 nodes work fine and only one node has this error. I tried recreating the SF cluster, without success: one node still has this error.