microsoft / service-fabric-cli

Service Fabric CLI Tools

sfctl partition restart return (FMFailoverUnitNotFound) Null #189

Open sealbus opened 5 years ago

sealbus commented 5 years ago

I'm trying to restart a partition using the sfctl partition restart command. It is not working as expected. Here is the debug output:

Command arguments: ['partition', 'restart', '--restart-partition-mode', 'AllReplicasOrInstances', '--service-id', 'IOTApplication/RedisService', '--partition-id', 'b09fb219-248c-244b-b298-27a0db25c053', '--operation-id', '0ebed82a-e90f-470a-aba9-8caf62ae5840', '--debug']
Event: Cli.PreExecute []
Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x0000017F8E46D1E0>, <function OutputProducer.on_global_arguments at 0x0000017F8E542268>, <function CLIQuery.on_global_arguments at 0x0000017F8E569D08>]
Event: CommandInvoker.OnPreCommandTableCreate []
Event: CommandLoader.OnLoadArguments []
Event: CommandInvoker.OnPostCommandTableCreate []
Event: CommandInvoker.OnCommandTableLoaded []
Event: CommandInvoker.OnPreParseArgs []
Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x0000017F8E5422F0>, <function CLIQuery.handle_query_parameter at 0x0000017F8E569D90>]
msrest.service_client : Accept header absent and forced to application/json
msrest.pipeline : Configuring request: timeout=100, verify=True, cert=None
msrest.pipeline : Configuring proxies: ''
msrest.pipeline : Evaluate proxies against ENV settings: True
msrest.pipeline : Configuring redirects: allow=True, max=30
msrest.pipeline : Configuring retry: max_retries=False, backoff_factor=0.8, max_backoff=90
urllib3.connectionpool : Starting new HTTP connection (1): localhost:19081
urllib3.connectionpool : http://localhost:19081 "POST /Faults/Services/IOTApplication/RedisService/$/GetPartitions/b09fb219-248c-244b-b298-27a0db25c053/$/StartRestart?api-version=6.0&OperationId=0ebed82a-e90f-470a-aba9-8caf62ae5840&RestartPartitionMode=AllReplicasOrInstances&timeout=60 HTTP/1.1" 500 60
msrest.exceptions : (FMFailoverUnitNotFound) Null
(FMFailoverUnitNotFound) Null
Traceback (most recent call last):
  File "c:\users\xxxx\appdata\local\programs\python\python36\lib\site-packages\knack\cli.py", line 206, in invoke
    cmd_result = self.invocation.execute(args)
  File "c:\users\xxxx\appdata\local\programs\python\python36\lib\site-packages\sfctl\entry.py", line 81, in execute
    return super(SFInvoker, self).execute(args)
  File "c:\users\xxxx\appdata\local\programs\python\python36\lib\site-packages\knack\invocation.py", line 188, in execute
    cmd_result = parsed_args.func(params)
  File "c:\users\xxxx\appdata\local\programs\python\python36\lib\site-packages\knack\commands.py", line 105, in __call__
    return self.handler(*args, **kwargs)
  File "c:\users\xxxx\appdata\local\programs\python\python36\lib\site-packages\knack\commands.py", line 212, in _command_handler
    result = op(client, **command_args) if client else op(**command_args)
  File "c:\users\xxxx\appdata\local\programs\python\python36\lib\site-packages\azure\servicefabric\service_fabric_client_ap_is.py", line 11507, in start_partition_restart
    raise models.FabricErrorException(self._deserialize, response)
azure.servicefabric.models.fabric_error_py3.FabricErrorException: (FMFailoverUnitNotFound) Null
Performing cluster version check
msrest.pipeline : Configuring request: timeout=100, verify=True, cert=None
msrest.pipeline : Configuring proxies: ''
msrest.pipeline : Evaluate proxies against ENV settings: True
msrest.pipeline : Configuring redirects: allow=True, max=30
msrest.pipeline : Configuring retry: max_retries=3, backoff_factor=0.8, max_backoff=90
urllib3.connectionpool : Starting new HTTP connection (1): localhost:19081
urllib3.connectionpool : http://localhost:19081 "GET /$/GetClusterVersion?api-version=6.4&timeout=60 HTTP/1.1" 200 23

sfctl version: 7.1.0
Service Fabric runtime: 6.4

Christina-Kang commented 5 years ago

Thank you for reporting this error! We will take a look and get back to you soon. In the meantime, since you are on Windows, you could try the PowerShell client to unblock yourself.

sealbus commented 5 years ago

Thanks for your reply. I run sfctl in a Windows environment, but my Service Fabric cluster is on Linux. According to https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-linux-windows-differences, Restart-ServiceFabricPartition does not work against a Linux Service Fabric cluster.

Christina-Kang commented 5 years ago

Hi @sealbus,

This doesn't look like an issue with sfctl itself. Could you please share the Service Fabric traces for your cluster, along with an approximate time frame of when the operation took place? The traces are located at C:\SfDevCluster\Log\Traces and have names starting with fabric_traces_...

Is this a secure local cluster?

Another thing I wanted to double check is the port. I see that yours is set to 19081 - is this intentionally done? By default, we expect it to be 19080 if you are using all default settings.
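
To make the port question concrete, here is a minimal sketch that extracts the port from a cluster endpoint and compares it with 19080, the default mentioned above. The function and constant names are hypothetical helpers for illustration and are not part of sfctl.

```python
from urllib.parse import urlsplit

# Default management endpoint port as stated in this thread (assumption:
# the cluster uses all default settings).
DEFAULT_HTTP_GATEWAY_PORT = 19080


def endpoint_port(endpoint: str) -> int:
    """Return the explicit port of an endpoint URL, or the scheme default."""
    parts = urlsplit(endpoint)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80


def uses_default_gateway_port(endpoint: str) -> bool:
    """True when the endpoint targets the default port named above."""
    return endpoint_port(endpoint) == DEFAULT_HTTP_GATEWAY_PORT


if __name__ == "__main__":
    # Endpoint taken from the debug output in the original report.
    endpoint = "http://localhost:19081"
    print(endpoint, "uses default port:", uses_default_gateway_port(endpoint))
```

If the check reports a non-default port and that was not intentional, re-selecting the cluster with something like sfctl cluster select --endpoint http://localhost:19080 may be worth trying before digging into the traces.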

Thanks!

MZDN commented 5 years ago

Same issue when running the command: sudo sfctl chaos get

Christina-Kang commented 5 years ago

Thank you, @MZDN, for reporting! Taking a look.

Christina-Kang commented 5 years ago

@MZDN can you share some additional info, please? Are you also running sfctl version 7.1.0 and runtime 6.4? Which Python version are you using? Can you also double check that the Fault Analysis Service is enabled on your cluster? It appears in the cluster manifest as Section Name="FaultAnalysisService", with parameters MinReplicaSetSize and TargetReplicaSetSize. The section is under FabricSettings. If it is not enabled, can you enable it and try the get command again? Thanks!
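
The manifest check described above could be scripted roughly as follows. This is a sketch under assumptions: the sample manifest is a trimmed, made-up fragment that only mirrors the element and attribute names mentioned in this thread (FabricSettings, Section, Name), the Parameter elements with Name/Value attributes and the replica set values are illustrative, and the helper names are hypothetical. A real manifest, for example the one returned by sfctl cluster manifest, contains many more sections.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed manifest used only for illustration.
SAMPLE_MANIFEST = """\
<ClusterManifest xmlns="http://schemas.microsoft.com/2011/01/fabric" Name="Sample">
  <FabricSettings>
    <Section Name="FaultAnalysisService">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
    </Section>
  </FabricSettings>
</ClusterManifest>
"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}Section' -> 'Section'."""
    return tag.rsplit("}", 1)[-1]


def fault_analysis_settings(manifest_xml: str):
    """Return the FaultAnalysisService parameters as a dict, or None if absent."""
    root = ET.fromstring(manifest_xml)
    for settings in root.iter():
        if local_name(settings.tag) != "FabricSettings":
            continue
        for section in settings:
            if (local_name(section.tag) == "Section"
                    and section.get("Name") == "FaultAnalysisService"):
                return {
                    param.get("Name"): param.get("Value")
                    for param in section
                    if local_name(param.tag) == "Parameter"
                }
    return None


if __name__ == "__main__":
    found = fault_analysis_settings(SAMPLE_MANIFEST)
    if found is None:
        print("FaultAnalysisService section not found under FabricSettings")
    else:
        print("FaultAnalysisService parameters:", found)
```

A result of None suggests the section is missing and the service is likely not enabled; a dict lets you confirm that both replica set parameters are present.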