microsoft / service-fabric-issues

This repo is for the reporting of issues found with Azure Service Fabric.

'System.Replicator' reported Warning for property 'RemoteReplicatorConnectionStatus'. Replica 132296199785740536 cannot be reached to start the copy process. Error Code: CannotConnect, Target listen address: localhost:49934/1ca5f6c0-be88-4d1e-8c17-7d6be067800a-132296199785740536;ebdb39d4-b3db-42c0-9349-b3ab4c5b07f7:19e07e4c94e75af5a4be977c55bd4164. Verify that ReplicatorAddress config is valid. #1657

Closed: PTC-JoshuaMatthews closed this issue 4 years ago

PTC-JoshuaMatthews commented 4 years ago

I have a stateful service that deploys without error to my local 1-node or 5-node cluster, but reports the following health warning when I deploy it to an Azure cluster:

'System.Replicator' reported Warning for property 'RemoteReplicatorConnectionStatus'. Replica 132296199785740536 cannot be reached to start the copy process. Error Code: CannotConnect, Target listen address: localhost:49934/1ca5f6c0-be88-4d1e-8c17-7d6be067800a-132296199785740536;ebdb39d4-b3db-42c0-9349-b3ab4c5b07f7:19e07e4c94e75af5a4be977c55bd4164. Verify that ReplicatorAddress config is valid.

My replicator endpoint configuration is not modified from the default template.

The Secondary replicas are stuck InBuild, presumably because they are waiting for the primary to initiate the replication that it is failing to initiate.

Expected Behavior

All replicas should become ready, as happens on my local cluster.

Current Behavior

Secondary replicas are stuck InBuild, Primary replica cannot initiate replication.

Context (Environment)

Service Fabric Runtime and SDK Version: 7.0.470.9590

Operating System: Windows Server

Cluster Size: 5-node Azure production cluster

Possible Workaround

None.

PTC-JoshuaMatthews commented 4 years ago

Figured out what was going on.

While I wasn't intentionally changing the replicator endpoint, I was overriding a replicator setting with code like this:

public MyStateFulService(StatefulServiceContext context)
    : base(context, new ReliableStateManager(context,
        new ReliableStateManagerConfiguration(
            new ReliableStateManagerReplicatorSettings
            {
                MaxReplicationMessageSize = 1073741824
            })))
{
}

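Apparently, passing a ReliableStateManagerReplicatorSettings object in code was replacing all of the default replicator settings, including the endpoint, not just the one value I set. I moved the setting to config and it seems to be working now.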
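
For anyone hitting the same warning, here is a minimal sketch of what the config-based version might look like. It assumes the default template names: a `ReplicatorConfig` section in `PackageRoot/Config/Settings.xml` and an endpoint called `ReplicatorEndpoint` declared in the service manifest. The actual config used in this issue was not posted, so treat this as an illustration rather than the exact fix.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of PackageRoot/Config/Settings.xml; section and endpoint names
     are the default template names and are assumed here. -->
<Settings xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Section Name="ReplicatorConfig">
    <!-- Points the replicator at the endpoint declared in ServiceManifest.xml -->
    <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" />
    <!-- Same value that was previously set in code (1 GiB, in bytes) -->
    <Parameter Name="MaxReplicationMessageSize" Value="1073741824" />
  </Section>
</Settings>
```

With the setting in config, the service constructor can go back to the template default of `base(context)`, so the replicator settings, including the endpoint, are read from the config package rather than from an object built in code.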