pbarron opened 12 years ago
I'm pretty sure both of these methods are assembled server side based on token order since, IIRC, it's held in a BiMap regardless of the replication strategy implementation. @pbarron if you have some cycles, that would be the place to look code-wise in Cassandra (AbstractReplicationStrategy and TokenMetadata in the o.a.c.locator(?) package - again IIRC, but I think that's it). I'll take a look later this evening (GMT-6) if you don't. Thanks again.
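As a toy illustration of why server-side assembly in token order would be deterministic: if the ring is held in a map sorted by token (TokenMetadata keeps a sorted token-to-endpoint BiMap, IIRC), then iterating it always yields endpoints in token order, no matter what order nodes joined. The class name and addresses below are hypothetical, not Cassandra code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap keyed by token iterates in sorted key
// order, so the endpoint list derived from it is deterministic even
// though the entries were inserted out of order.
public class TokenOrderSketch {
    public static List<String> endpointsInTokenOrder(TreeMap<Long, String> tokenToEndpoint) {
        return new ArrayList<>(tokenToEndpoint.values());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(200L, "10.0.0.2");
        ring.put(100L, "10.0.0.1"); // inserted out of token order
        ring.put(300L, "10.0.0.3");
        System.out.println(endpointsInTokenOrder(ring)); // sorted by token
    }
}
```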
The NodeAutoDiscoverService uses the ThriftCluster describeRing method to obtain a list of TokenRanges. This maps back to the o.a.c.t.CassandraServer describe_ring method, which calls the o.a.c.s.StorageService describeRing method that generates the TokenRanges (see below):
```java
public List<TokenRange> describeRing(String keyspace) throws InvalidRequestException
{
    if (keyspace == null || !Schema.instance.getNonSystemTables().contains(keyspace))
        throw new InvalidRequestException("There is no ring for the keyspace: " + keyspace);

    List<TokenRange> ranges = new ArrayList<TokenRange>();
    Token.TokenFactory tf = getPartitioner().getTokenFactory();

    for (Map.Entry<Range<Token>, List<InetAddress>> entry : getRangeToAddressMap(keyspace).entrySet())
    {
        Range range = entry.getKey();
        List<InetAddress> addresses = entry.getValue();
        List<String> endpoints = new ArrayList<String>(addresses.size());
        List<String> rpc_endpoints = new ArrayList<String>(addresses.size());
        List<EndpointDetails> epDetails = new ArrayList<EndpointDetails>(addresses.size());

        for (InetAddress endpoint : addresses)
        {
            EndpointDetails details = new EndpointDetails();
            details.host = endpoint.getHostAddress();
            details.datacenter = DatabaseDescriptor.getEndpointSnitch().getDatacenter(endpoint);
            details.rack = DatabaseDescriptor.getEndpointSnitch().getRack(endpoint);

            endpoints.add(details.host);
            rpc_endpoints.add(getRpcaddress(endpoint));
            epDetails.add(details);
        }

        TokenRange tr = new TokenRange(tf.toString(range.left.getToken()), tf.toString(range.right.getToken()), endpoints)
                            .setEndpoint_details(epDetails)
                            .setRpc_endpoints(rpc_endpoints);
        ranges.add(tr);
    }
    return ranges;
}
```
This would seem to indicate that endpoints, rpc_endpoints and EndpointDetails are built in the same order, and as such we can depend on that ordering to auto-discover nodes.
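The reason the ordering holds is visible in the inner loop above: all three lists are appended to within the same iteration, so element i of each list always describes the same node. A minimal standalone sketch of that pattern (hypothetical class and addresses, not Cassandra code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration: lists populated inside the same loop iteration
// are necessarily index-aligned, mirroring how describeRing fills
// endpoints, rpc_endpoints and epDetails.
public class IndexAlignment {
    // Given rows of {listenAddress, rpcAddress}, build both lists in one
    // loop, as describeRing does.
    public static List<List<String>> build(String[][] nodes) {
        List<String> listenAddrs = new ArrayList<>();
        List<String> rpcAddrs = new ArrayList<>();
        for (String[] n : nodes) {   // one loop fills both lists
            listenAddrs.add(n[0]);
            rpcAddrs.add(n[1]);
        }
        return List.of(listenAddrs, rpcAddrs);
    }

    public static void main(String[] args) {
        String[][] nodes = {
            {"192.168.1.10", "10.0.0.10"},
            {"192.168.1.11", "10.0.0.11"},
        };
        List<List<String>> lists = build(nodes);
        // index 1 in both lists refers to the same (second) node
        System.out.println(lists.get(0).get(1) + " <-> " + lists.get(1).get(1));
    }
}
```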
If you agree I'll submit a change, but it will be monday before I get to do it.
@pbarron I agree with your assessment. Thanks again for looking at this.
Version 1.1-0
Summary: On a Cassandra server with a multi-network-interface setup, where rpc_address and listen_address are configured to use different interfaces, the NodeAutoDiscoverService discovers the listen_address of the Cassandra server instead of the rpc_address. The result is that the NodeAutoDiscoverService fails to discover new nodes.
Steps to reproduce:
1. Configure two Cassandra servers to have rpc_address and listen_address listening on different interfaces.
2. Configure the Hector client with autoDiscoverHosts set to true (the default is false).
3. Configure the Hector client with one of the Cassandra servers.
4. Verify via the Hector JMX bean attribute KnownHosts that the NodeAutoDiscoverService does not discover the 2nd server.
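For reference, a cassandra.yaml fragment for step 1 might look roughly like this (the interface addresses are illustrative only; listen_address is the inter-node gossip/storage interface, rpc_address the client-facing Thrift one):

```yaml
# Illustrative fragment - addresses are hypothetical
listen_address: 192.168.1.10   # inter-node (gossip/storage) interface
rpc_address: 10.0.0.10         # client-facing Thrift interface
rpc_port: 9160
```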
Logs:
We believe the cause is that the NodeAutoDiscoverService uses tokenRange.getEndpoint_details(), which returns the listen_address rather than the rpc_address. This can be verified via the logs above.
Because the listen_address is used, the node is always "discovered": it will never be present in existingHosts, which contains rpc_addresses (see code below). However, the subsequent connection attempt fails because it targets the wrong interface.
Possible solutions:
1. Use tokenRange.getRpc_endpoints() instead of tokenRange.getEndpoint_details(). However, it then won't be possible to filter out nodes from particular DCs.
2. Use a combination of tokenRange.getRpc_endpoints() and tokenRange.getEndpoint_details() to both filter by DC and obtain the rpc_address. However, this assumes that the two methods return entries in the same order, and it is not clear that this is the case.
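If the index alignment from describeRing can be relied upon, solution 2 could be sketched roughly as follows. The types below are hypothetical stand-ins for Hector's TokenRange/EndpointDetails, not the actual Hector API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of solution 2 (hypothetical stand-in types): walk endpoint
// details and rpc endpoints by the same index, filtering on datacenter
// while collecting the client-reachable rpc address. This only works if
// the two lists are index-aligned.
public class DiscoverByRpc {
    public static class EndpointDetails {
        public String host;        // listen_address, usable for DC filtering context
        public String datacenter;  // DC name from the snitch
    }

    public static List<String> discover(List<EndpointDetails> details,
                                        List<String> rpcEndpoints,
                                        String wantedDc) {
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < details.size(); i++) {
            if (wantedDc.equals(details.get(i).datacenter)) {
                hosts.add(rpcEndpoints.get(i)); // rpc address, not listen address
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        EndpointDetails a = new EndpointDetails();
        a.host = "192.168.1.10";
        a.datacenter = "DC1";
        EndpointDetails b = new EndpointDetails();
        b.host = "192.168.1.11";
        b.datacenter = "DC2";
        // Only the DC1 node's rpc address should be returned
        System.out.println(discover(List.of(a, b), List.of("10.0.0.10", "10.0.0.11"), "DC1"));
    }
}
```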