rabbitmq / rabbitmq-management

RabbitMQ Management UI and HTTP API
https://www.rabbitmq.com/management.html

Management API: Cluster links missing node #559

BjoernT closed this issue 6 years ago

BjoernT commented 6 years ago

When we query the management API for cluster links (JSON below), I notice inconsistencies between the CLI and the API. At this point I'm not sure which tool is correct, given the inconsistent information below. Some help would be greatly appreciated.

# rabbitmqctl cluster_status
Cluster status of node 'rabbit@infra02-rabbit-mq-container-1f518ef3' ...
[{nodes,[{disc,['rabbit@infra01-rabbit-mq-container-24ebfd26',
                'rabbit@infra02-rabbit-mq-container-1f518ef3',
                'rabbit@infra03-rabbit-mq-container-6968817b']}]},
 {running_nodes,['rabbit@infra01-rabbit-mq-container-24ebfd26',
                 'rabbit@infra03-rabbit-mq-container-6968817b',
                 'rabbit@infra02-rabbit-mq-container-1f518ef3']},
 {cluster_name,<<"openstack">>},
 {partitions,[]},
 {alarms,[{'rabbit@infra01-rabbit-mq-container-24ebfd26',[]},
          {'rabbit@infra03-rabbit-mq-container-6968817b',[]},
          {'rabbit@infra02-rabbit-mq-container-1f518ef3',[]}]}]

But the API reports these cluster links, which seem to indicate a missing host:

"cluster_links": [
    {
        "name": "rabbitmq-cli-01@infra02-rabbit-mq-container-1f518ef3",
        "peer_addr": "172.29.239.17",
        "peer_port": 37354,
        "sock_addr": "172.29.239.14",
        "sock_port": 25672,
        "stats": {
            "send_bytes": 591,
            "send_bytes_details": {
                "rate": 0.0
            }
        }
    },
    {
        "name": "rabbit@infra02-rabbit-mq-container-1f518ef3",
        "peer_addr": "172.29.239.17",
        "peer_port": 37150,
        "sock_addr": "172.29.239.14",
        "sock_port": 25672,
        "stats": {
            "send_bytes": 7043628731,
            "send_bytes_details": {
                "rate": 215355.2
            }
        }
    },
    {
        "name": "rabbit@infra01-rabbit-mq-container-24ebfd26",
        "peer_addr": "172.29.239.224",
        "peer_port": 25672,
        "sock_addr": "172.29.239.14",
        "sock_port": 50198,
        "stats": {
            "send_bytes": 20546714762,
            "send_bytes_details": {
                "rate": 1626052.6
            }
        }
    }
],

But from what I can see, the cluster links seem to be established:

root@infra02-rabbit-mq-container-1f518ef3:/# netstat -t |grep 25672 |grep EST
tcp        0     64 infra02-ra:37150 infra03-ra:25672 ESTABLISHED
tcp        0      0 infra02-ra:25672 infra03-ra:46720 ESTABLISHED
tcp        0      0 infra02-ra:44443 infra01-ra:25672 ESTABLISHED

michaelklishin commented 6 years ago

rabbitmq-cli-27 is a CLI tool (which, in some ways, is a cluster peer, albeit a transient one). infra01 isn't listed in the HTTP API response because it is the node you are contacting. So both are correct (or incorrect) in their own way :)

BjoernT commented 6 years ago

@michaelklishin Thanks, but why is there no infra01 cluster link listed via the API?

michaelklishin commented 6 years ago

"Cluster links" are not really the same as "cluster members". A node isn't linked to itself, but it is a cluster member. netstat simply lists TCP connections, which includes the local node; the HTTP API doesn't. I'm not saying it should or shouldn't, but the difference from the netstat output is neither random nor a bug.

michaelklishin commented 6 years ago

Also note that you haven't mentioned which API endpoint you are comparing against, but there is GET /api/nodes, which lists every node there is (and is what the management UI uses to list cluster nodes).
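As a rough illustration of querying that endpoint, here is a minimal Python sketch using only the standard library. The base URL `http://localhost:15672` and the `guest`/`guest` credentials are assumptions (common defaults), and `get_json` and `node_summary` are hypothetical helper names. It relies only on the `name` and `running` fields of each node object.

```python
import base64
import json
import urllib.request

# Assumptions: management plugin reachable on localhost:15672 with guest/guest.
BASE_URL = "http://localhost:15672"
USER, PASSWORD = "guest", "guest"


def get_json(path):
    """GET a management API path with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req = urllib.request.Request(
        BASE_URL + path, headers={"Authorization": "Basic " + token}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def node_summary(nodes):
    """Map each node name from a GET /api/nodes response to its 'running' flag."""
    return {n["name"]: n.get("running", False) for n in nodes}
```

Against a reachable node, `node_summary(get_json("/api/nodes"))` returns one entry per cluster member, including the node being contacted.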

BjoernT commented 6 years ago

Oh, sorry, we used /api/nodes. So, if I understand correctly, cluster links are not suitable for node monitoring. We were under the impression that there is a cluster link for every remote node that has joined. I understand that the local node won't show up, but in this case the listing was done on infra02 and I expected infra01 and infra03 to be listed as links; that is where the confusion came from.

michaelklishin commented 6 years ago

Most content in GET /api/overview is not node-specific, but apparently the links are. GET /api/nodes should work the way you expect (in addition to GET /api/overview, not as a replacement).

seancarlisle commented 6 years ago

Hi @michaelklishin, I have a question about the transient cluster peer rabbitmq-cli-27 you mentioned earlier. We've seen instances where this peer (or a similarly named one) won't go away. In fact, it seems to stay until the cluster member that has it as a cluster link is restarted. Any thoughts on the cause, or on how to remove it without having to restart cluster members?
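A lingering peer like this can at least be spotted from the same API data. The sketch below is a hypothetical helper that takes the decoded GET /api/nodes response and lists the CLI peers in each node's `cluster_links`. It assumes, based only on the names seen in this thread, that CLI tools appear with a `rabbitmq-cli-` name prefix. It detects such peers; it does not remove them.

```python
# Assumption: transient CLI peers are named "rabbitmq-cli-NN@host",
# as in the names seen in this thread.
CLI_PREFIX = "rabbitmq-cli-"


def transient_cli_links(nodes):
    """Map each node name to the CLI-tool peers among its cluster_links."""
    found = {}
    for node in nodes:
        cli = sorted(
            link["name"]
            for link in node.get("cluster_links", [])
            if link["name"].startswith(CLI_PREFIX)
        )
        if cli:
            found[node["name"]] = cli
    return found


if __name__ == "__main__":
    # Hypothetical sample shaped like a GET /api/nodes response.
    sample = [
        {"name": "rabbit@node-a",
         "cluster_links": [{"name": "rabbitmq-cli-27@node-a"},
                           {"name": "rabbit@node-b"}]},
        {"name": "rabbit@node-b",
         "cluster_links": [{"name": "rabbit@node-a"}]},
    ]
    print(transient_cli_links(sample))
```

Running it periodically and seeing the same CLI peer name across runs would point to a peer that is not going away.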

BjoernT commented 6 years ago

I just looked at the code: we used /api/nodes, and I currently get these links:

rabbitmq-cli-01@infra02-rabbit-mq-container-1f518ef3
rabbit@infra02-rabbit-mq-container-1f518ef3
rabbit@infra01-rabbit-mq-container-24ebfd26

but infra03 is missing. I also get the local transient link, which seems to be a stuck CLI command somewhere. I updated the description to reflect the current state of infra02.

BjoernT commented 6 years ago

@michaelklishin I updated the ticket description. Reading it, I still believe a link is missing. Is there a different way to evaluate the links?

michaelklishin commented 6 years ago

We use rabbitmq-users for discussions and questions. Besides GET /api/overview, GET /api/nodes and GET /api/nodes/{node}, I don't think any API endpoints include cluster links. More interesting metrics for monitoring are usually nodes and partitions. You can inspect the links on individual node pages in the management UI to compare. Besides ingress and egress traffic rates, they are not very interesting.
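A minimal sketch of such a node and partition check, assuming the decoded GET /api/nodes response, where each node object carries `name`, `running` and `partitions` (a list of the node names it is partitioned from). `cluster_health` is a hypothetical name.

```python
def cluster_health(nodes):
    """Return a list of problems found in a GET /api/nodes response.

    Flags nodes that are not running and nodes that report a
    network partition (a non-empty 'partitions' list).
    """
    problems = []
    for node in nodes:
        name = node["name"]
        if not node.get("running", False):
            problems.append(f"{name}: not running")
        partitions = node.get("partitions", [])
        if partitions:
            problems.append(f"{name}: partitioned from {', '.join(partitions)}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        {"name": "rabbit@node-a", "running": True, "partitions": []},
        {"name": "rabbit@node-b", "running": False, "partitions": []},
    ]
    for problem in cluster_health(sample):
        print(problem)  # prints: rabbit@node-b: not running
```

An empty list means every node is running and none reports a partition; cluster links play no part in this check.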

BjoernT commented 6 years ago

The output above is from GET /api/nodes, which does report cluster links. I'm fine with taking this to rabbitmq-users, but I think there is an issue in 3.6.x with how cluster links are displayed compared to GET /api/nodes/<node>: when I checked GET /api/nodes/rabbit@infra02-rabbit-mq-container-1f518ef3, that URL correctly displayed the remote nodes. So I assume there is some inconsistency between the two views, or something I don't understand.
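One way to make that comparison systematic is to diff the link names reported by the two endpoints. In this sketch, `nodes_list` is the decoded GET /api/nodes response and `node_detail` the decoded GET /api/nodes/<node> response; all helper names are hypothetical, and the node name is percent-encoded in the path as a precaution, since it contains `@`.

```python
from urllib.parse import quote


def node_path(node_name):
    """Build the GET /api/nodes/{node} path; '@' is encoded as '%40'."""
    return "/api/nodes/" + quote(node_name, safe="")


def link_names(node):
    """Set of peer names in a node object's cluster_links list."""
    return {link["name"] for link in node.get("cluster_links", [])}


def compare_views(nodes_list, node_detail):
    """Diff one node's cluster links between the list and detail endpoints.

    Returns (only_in_list_view, only_in_detail_view) as sorted lists.
    """
    name = node_detail["name"]
    entry = next((n for n in nodes_list if n["name"] == name), None)
    if entry is None:
        raise KeyError(f"{name} not found in GET /api/nodes response")
    listed, detailed = link_names(entry), link_names(node_detail)
    return sorted(listed - detailed), sorted(detailed - listed)
```

Two empty lists mean both endpoints agree for that node; anything else is a concrete, reproducible difference to report.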

michaelklishin commented 6 years ago

We need steps to reproduce against a supported version (3.7.3 or at least 3.6.15). 3.6.6 uses the legacy stats collection architecture and is not getting any more updates.

michaelklishin commented 6 years ago

We have a couple of long-running environments, and one of them has 3.6.16-alpha.5 and 3 nodes. I have inspected the links and they make sense: node 0 is connected to 1 and 2, 1 to 0 and 2, and 2 to 0 and 1. I can provide access to that environment off-list.
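The pattern described here (every node linked to every other node) can be checked mechanically. This hypothetical `mesh_gaps` helper takes the decoded GET /api/nodes response and reports, for each node, the other cluster members missing from its `cluster_links`; links to peers that are not cluster members, such as transient CLI tools, are ignored. It only sees what each node's entry reports, so it inherits any inconsistency in that endpoint.

```python
def mesh_gaps(nodes):
    """For each node, list the other cluster members it has no link to.

    Nodes whose links cover every other member are omitted, so an
    empty dict means the links form a full mesh.
    """
    names = {n["name"] for n in nodes}
    gaps = {}
    for node in nodes:
        linked = {link["name"] for link in node.get("cluster_links", [])}
        # A node is never linked to itself, so exclude it from the expected set.
        missing = sorted(names - {node["name"]} - linked)
        if missing:
            gaps[node["name"]] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical sample: node-2 has a CLI link but none to node-1.
    sample = [
        {"name": "rabbit@node-0",
         "cluster_links": [{"name": "rabbit@node-1"}, {"name": "rabbit@node-2"}]},
        {"name": "rabbit@node-1",
         "cluster_links": [{"name": "rabbit@node-0"}, {"name": "rabbit@node-2"}]},
        {"name": "rabbit@node-2",
         "cluster_links": [{"name": "rabbit@node-0"},
                           {"name": "rabbitmq-cli-27@node-2"}]},
    ]
    print(mesh_gaps(sample))  # prints: {'rabbit@node-2': ['rabbit@node-1']}
```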

BjoernT commented 6 years ago

So you don't see any difference between the cluster links reported by '/api/nodes' and '/api/nodes/hostname'?

michaelklishin commented 6 years ago

Here are some results collected with curl.

This is mailing list material.