pbarron opened 12 years ago
I'm pretty sure both of these methods are assembled server side based on token order since, IIRC, it's held in a BiMap regardless of the replication strategy implementation. @pbarron if you have some cycles, that would be the place to look code-wise in Cassandra (AbstractReplicationStrategy and TokenMetadata in the o.a.c.locator(?) package - again IIRC, but I think that's it). I'll take a look later this evening (GMT-6) if you don't. Thanks again.
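As a toy illustration of why server-side assembly in token order would be deterministic: if the ring is held in a map sorted by token (TokenMetadata keeps a sorted token-to-endpoint BiMap, IIRC), then iterating it always yields endpoints in token order, no matter what order nodes joined. The class name and addresses below are hypothetical, not Cassandra code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap keyed by token iterates in sorted key
// order, so the endpoint list derived from it is deterministic even
// though the entries were inserted out of order.
public class TokenOrderSketch {
    public static List<String> endpointsInTokenOrder(TreeMap<Long, String> tokenToEndpoint) {
        return new ArrayList<>(tokenToEndpoint.values());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(200L, "10.0.0.2");
        ring.put(100L, "10.0.0.1"); // inserted out of token order
        ring.put(300L, "10.0.0.3");
        System.out.println(endpointsInTokenOrder(ring)); // sorted by token
    }
}
```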
The NodeAutoDiscoverService uses the ThriftCluster describeRing method to obtain a list of TokenRanges. This maps back to the o.a.c.t.CassandraServer describe_ring method, which calls the o.a.c.s.StorageService describeRing method that generates the TokenRanges (see below):
```java
public List<TokenRange> describeRing(String keyspace) throws InvalidRequestException
{
    if (keyspace == null || !Schema.instance.getNonSystemTables().contains(keyspace))
        throw new InvalidRequestException("There is no ring for the keyspace: " + keyspace);

    List<TokenRange> ranges = new ArrayList<TokenRange>();
    Token.TokenFactory tf = getPartitioner().getTokenFactory();

    for (Map.Entry<Range<Token>, List<InetAddress>> entry : getRangeToAddressMap(keyspace).entrySet())
    {
        Range range = entry.getKey();
        List<InetAddress> addresses = entry.getValue();
        List<String> endpoints = new ArrayList<String>(addresses.size());
        List<String> rpc_endpoints = new ArrayList<String>(addresses.size());
        List<EndpointDetails> epDetails = new ArrayList<EndpointDetails>(addresses.size());

        for (InetAddress endpoint : addresses)
        {
            EndpointDetails details = new EndpointDetails();
            details.host = endpoint.getHostAddress();
            details.datacenter = DatabaseDescriptor.getEndpointSnitch().getDatacenter(endpoint);
            details.rack = DatabaseDescriptor.getEndpointSnitch().getRack(endpoint);

            endpoints.add(details.host);
            rpc_endpoints.add(getRpcaddress(endpoint));
            epDetails.add(details);
        }

        TokenRange tr = new TokenRange(tf.toString(range.left.getToken()), tf.toString(range.right.getToken()), endpoints)
                            .setEndpoint_details(epDetails)
                            .setRpc_endpoints(rpc_endpoints);
        ranges.add(tr);
    }
    return ranges;
}
```
This would seem to indicate that endpoints, rpc_endpoints and EndpointDetails are built in the same order, and as such we can depend on that ordering to auto-discover nodes.
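The reason the ordering holds is visible in the inner loop above: all three lists are appended to within the same iteration, so element i of each list always describes the same node. A minimal standalone sketch of that pattern (hypothetical class and addresses, not Cassandra code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration: lists populated inside the same loop iteration
// are necessarily index-aligned, mirroring how describeRing fills
// endpoints, rpc_endpoints and epDetails.
public class IndexAlignment {
    // Given rows of {listenAddress, rpcAddress}, build both lists in one
    // loop, as describeRing does.
    public static List<List<String>> build(String[][] nodes) {
        List<String> listenAddrs = new ArrayList<>();
        List<String> rpcAddrs = new ArrayList<>();
        for (String[] n : nodes) {   // one loop fills both lists
            listenAddrs.add(n[0]);
            rpcAddrs.add(n[1]);
        }
        return List.of(listenAddrs, rpcAddrs);
    }

    public static void main(String[] args) {
        String[][] nodes = {
            {"192.168.1.10", "10.0.0.10"},
            {"192.168.1.11", "10.0.0.11"},
        };
        List<List<String>> lists = build(nodes);
        // index 1 in both lists refers to the same (second) node
        System.out.println(lists.get(0).get(1) + " <-> " + lists.get(1).get(1));
    }
}
```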
If you agree I'll submit a change, but it will be monday before I get to do it.
@pbarron I agree with your assessment. Thanks again for looking at this.
Version 1.1-0
Summary: On a Cassandra server with a multi-network-interface setup, where rpc_address and listen_address are configured to use different interfaces, the NodeAutoDiscoverService discovers the listen_address of the Cassandra server instead of the rpc_address. The result is that the NodeAutoDiscoverService fails to discover new nodes.
Steps to reproduce:
1. Configure two Cassandra servers to have rpc_address and listen_address listening on different interfaces.
2. Configure the Hector client with autoDiscoverHosts set to true (the default is false).
3. Configure the Hector client with one of the Cassandra servers.
4. Verify via the Hector JMX bean attribute KnownHosts that the NodeAutoDiscoverService does not discover the 2nd server.
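For reference, a cassandra.yaml fragment for step 1 might look roughly like this (the interface addresses are illustrative only; listen_address is the inter-node gossip/storage interface, rpc_address the client-facing Thrift one):

```yaml
# Illustrative fragment - addresses are hypothetical
listen_address: 192.168.1.10   # inter-node (gossip/storage) interface
rpc_address: 10.0.0.10         # client-facing Thrift interface
rpc_port: 9160
```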
Logs:
We believe the cause is that the NodeAutoDiscoverService uses tokenRange.getEndpoint_details(), which returns the listen_address rather than the rpc_address. This can be verified via the logs above.
Because the listen_address is used, the node is always "discovered": it will never be present in existingHosts, which contains rpc_addresses (see code below). However, the subsequent connection attempt fails because it targets the wrong interface.
Possible solutions:
1. Use tokenRange.getRpc_endpoints() instead of tokenRange.getEndpoint_details(). However, it then won't be possible to filter out nodes from particular DCs.
2. Use a combination of tokenRange.getRpc_endpoints() and tokenRange.getEndpoint_details() to both filter by DC and obtain the rpc_address. However, this assumes that the two methods return entries in the same order, and it is not clear that this is the case.
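If the index alignment from describeRing can be relied upon, solution 2 could be sketched roughly as follows. The types below are hypothetical stand-ins for Hector's TokenRange/EndpointDetails, not the actual Hector API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of solution 2 (hypothetical stand-in types): walk endpoint
// details and rpc endpoints by the same index, filtering on datacenter
// while collecting the client-reachable rpc address. This only works if
// the two lists are index-aligned.
public class DiscoverByRpc {
    public static class EndpointDetails {
        public String host;        // listen_address, usable for DC filtering context
        public String datacenter;  // DC name from the snitch
    }

    public static List<String> discover(List<EndpointDetails> details,
                                        List<String> rpcEndpoints,
                                        String wantedDc) {
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < details.size(); i++) {
            if (wantedDc.equals(details.get(i).datacenter)) {
                hosts.add(rpcEndpoints.get(i)); // rpc address, not listen address
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        EndpointDetails a = new EndpointDetails();
        a.host = "192.168.1.10";
        a.datacenter = "DC1";
        EndpointDetails b = new EndpointDetails();
        b.host = "192.168.1.11";
        b.datacenter = "DC2";
        // Only the DC1 node's rpc address should be returned
        System.out.println(discover(List.of(a, b), List.of("10.0.0.10", "10.0.0.11"), "DC1"));
    }
}
```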