taijitao closed this issue 4 years ago
Does this plugin support an IPv6-only stack, or does it support a dual IPv6/IPv4 stack?
This plugin issues requests to the Kubernetes API over HTTP[S]. It is entirely unaware of what IP version is used underneath. nxdomain, as I'm sure you know, means "no domain resolved". This plugin cannot be responsible for that.
For cases when proper hostname resolution configuration is not available, Erlang provides its own resolution configuration file, which should be pointed at using the ERL_INETRC environment variable. You don't need it most of the time, but sometimes it is indispensable.
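For illustration, a minimal inetrc file that switches name resolution to IPv6 might look like this (the file path here is just an example):

```erlang
%% /etc/rabbitmq/erl_inetrc -- each entry is an Erlang term terminated by a period.
%% Enable IPv6 lookups for this node's resolver.
{inet6, true}.
```

The node is then pointed at it via the environment, e.g. ERL_INETRC=/etc/rabbitmq/erl_inetrc.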
Versions of the software from this rabbitmq-users discussion:
rabbitmq_3.7.18-1.el7
erlang_22.0.7-1.el7
I suspect this is due to the httpc library defaulting to inet: see the docs and note the default value for IpFamily.
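For reference, that default can be overridden on the inets HTTP client; a minimal sketch of what a plugin would need to do internally (this is not the plugin's actual code):

```erlang
%% Switch the default httpc profile from inet (IPv4, the default)
%% to IPv6 sockets for all subsequent requests.
ok = httpc:set_options([{ipfamily, inet6}]).
```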
@taijitao since you have access to an IPv6-only environment, I will create a custom build of this plugin for you to test.
@taijitao - here is the custom plugin built from this branch:
rabbitmq_peer_discovery_k8s-3.7.20+rc.1.dirty.ez.zip
To install:

1. Un-zip the attached file (removing the .zip extension).
2. Find your rabbitmq_peer_discovery_k8s-3.7.18.ez file and re-name it or move it out of the way.
3. Copy rabbitmq_peer_discovery_k8s-3.7.20+rc.1.dirty.ez to that location.

Please note that cluster formation only happens the first time RabbitMQ is started. If these nodes have been started before, you will have to reset them (rabbitmqctl reset) or delete their data directory.
@taijitao any chance to test this? ^^^^
Yes, I'll test that. Could you explain what you have changed in the custom build?
@taijitao it configures (unconditionally, at the moment) the HTTP client's socket address family to IPv6.
I have tested it and it worked. The Erlang setting is {inet6, true}. The good news:
2019-10-22 06:10:28.934 [info] <0.274.0> Peer discovery Kubernetes: setting IpFamily to inet6...
2019-10-22 06:10:28.934 [info] <0.274.0> Peer discovery Kubernetes: setting IpFamily to inet6 response: ok
2019-10-22 06:10:28.934 [info] <0.274.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2019-10-22 06:10:28.934 [info] <0.274.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-10-22 06:10:28.934 [info] <0.274.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-10-22 06:10:29.016 [info] <0.274.0> All discovered existing cluster peers: rabbit@zt2-crmq-1, rabbit@zt2-crmq-0
2019-10-22 06:10:29.016 [info] <0.274.0> Peer nodes we can cluster with: rabbit@zt2-crmq-0
2019-10-22 06:10:29.032 [warning] <0.274.0> Could not auto-cluster with node rabbit@zt2-crmq-0: {badrpc,nodedown}
But it failed to form a cluster; I now have two separate nodes. Docker processes:

bash-4.2$ ps -ef
UID PID PPID C STIME TTY TIME CMD
rabbitmq 1 0 0 06:09 ? 00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server start
rabbitmq 197 1 0 06:09 ? 00:00:00 /usr/lib64/erlang/erts-10.4.4/bin/epmd -daemon
rabbitmq 383 1 1 06:09 ? 00:00:18 /usr/lib64/erlang/erts-10.4.4/bin/beam.smp -W w -A 64 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048
rabbitmq 551 383 0 06:10 ? 00:00:00 erl_child_setup 1048576
rabbitmq 1894 551 0 06:10 ? 00:00:00 inet_gethost 4
rabbitmq 1895 1894 0 06:10 ? 00:00:00 inet_gethost 4
rabbitmq 9563 0 35 06:26 ? 00:00:00 /usr/lib64/erlang/erts-10.4.4/bin/beam.smp -B -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot star
rabbitmq 9676 9563 34 06:26 ? 00:00:00 erl_child_setup 1048576
rabbitmq 9697 0 2 06:26 ? 00:00:00 bash
rabbitmq 9706 9697 0 06:26 ? 00:00:00 ps -ef
According to the log, discovery via the Kubernetes API endpoint has succeeded. However, the nodes could not contact and/or authenticate with each other. This is not a responsibility of this plugin. See the rabbit@zt2-crmq-0 logs for more clues. This part of the discussion is mailing list material.
httpc can only use one address family for its sockets, so we have a couple of options: keep the address family hard-coded, or make it configurable so operators can set inet6 (for IPv6). I personally would prefer the latter. @taijitao WDYT?
Hi, I have a k8s cluster configured as pure IPv6 (with Kind). I tried this patch because I need it here as well. It seems to work correctly:
[vagrant@localhost k8s_statefulsets]$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rabbitmq-0 1/1 Running 0 9m59s fd00:10:244::27 kind-control-plane <none> <none>
rabbitmq-1 1/1 Running 0 8m43s fd00:10:244::28 kind-control-plane <none> <none>
rabbitmq-2 1/1 Running 0 7m51s fd00:10:244::29 kind-control-plane <none> <none>
and:
kubectl describe service rabbitmq
Name: rabbitmq
Namespace: default
Labels: app=rabbitmq
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"rabbitmq"},"name":"rabbitmq","namespace":"default"},"spe...
Selector: app=rabbitmq
Type: NodePort
IP: fd00:10:96::99a8
Port: http 15672/TCP
TargetPort: 15672/TCP
NodePort: http 31672/TCP
Endpoints: [fd00:10:244::27]:15672,[fd00:10:244::28]:15672,[fd00:10:244::29]:15672
Port: amqp 5672/TCP
TargetPort: 5672/TCP
NodePort: amqp 30672/TCP
Endpoints: [fd00:10:244::27]:5672,[fd00:10:244::28]:5672,[fd00:10:244::29]:5672
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
also the cluster status:
rabbitmqctl cluster_status
Cluster status of node rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local ...
Basics
Cluster name: rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local
Disk Nodes
rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local
rabbit@rabbitmq-1.rabbitmq.default.svc.cluster.local
rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local
Running Nodes
rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local
rabbit@rabbitmq-1.rabbitmq.default.svc.cluster.local
rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local
I noticed that, for some reason, the check_port_connectivity command does not work correctly on this stack:
rabbitmq-diagnostics check_port_connectivity
Testing TCP connections to all active listeners on node rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local ...
Error:
Connection to ports of the following listeners on node rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local failed:
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 15672, protocol: http, purpose: HTTP API
@michaelklishin working on a PR to fix this in an "auto detect" fashion
Thanks @lukebakken for your help. It's better to 'auto detect' than to switch between different binary plugins. The cluster is now created based on your private build.
Auto-detection has a tendency to fail in ways that are hard to understand. There will be no switching between binary plugins: if we can't get auto-detection to work reliably, there will be an option that lets the operator tell the plugin what address family to use.
That's fine if an option is provided. Would it go in erl_inetrc or in the plugin configuration?
@taijitao @Gsantomaggio if you have time, I would really appreciate you testing the fix in https://github.com/rabbitmq/rabbitmq-peer-discovery-common/pull/11

1. Restore your rabbitmq_peer_discovery_k8s-3.7.18.ez file to the original.
2. Find your rabbitmq_peer_discovery_common*.ez file, and move it or rename it.
3. Copy this file into place, removing the .zip extension: rabbitmq_peer_discovery_common-3.7.20+rc.1.2.gb768f10.ez.zip
4. Ensure {inet6, true} is in your ERL_INETRC file.

The changes in https://github.com/rabbitmq/rabbitmq-peer-discovery-common/pull/11 look for the presence of {inet6, true} in your inetrc file and will set the appropriate httpc option if found.
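In rough terms, the detection could look like the sketch below. The module and function names are illustrative only, not the actual code from that PR:

```erlang
%% Illustrative sketch of inetrc-based address family detection.
%% NOT the actual code from rabbitmq-peer-discovery-common PR #11.
-module(af_detect).
-export([ipfamily/0]).

%% Returns inet6 if the inetrc file named by ERL_INETRC contains
%% {inet6, true}, and inet (the httpc default) otherwise.
ipfamily() ->
    case os:getenv("ERL_INETRC") of
        false -> inet;
        File ->
            case file:consult(File) of
                {ok, Terms} ->
                    case lists:member({inet6, true}, Terms) of
                        true  -> inet6;
                        false -> inet
                    end;
                _Error -> inet
            end
    end.
```

The result would then be applied to the HTTP client with httpc:set_options([{ipfamily, af_detect:ipfamily()}]).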
@taijitao @lukebakken Could you help take a look at my issue? Thanks a lot! I have tried what you mentioned above and other methods. The RabbitMQ pod always fails with the error below in my IPv6 setup:

ERROR: epmd error for host osh-openstack-rabbitmq-rabbitmq-0.rabbitmq.openstack.svc.cluster.local: nxdomain (non-existing domain)
1) I added the below in configmap-etc.yaml:

environment: |-
  RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+A 128 -kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
  RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
erl_inetrc: |-
  {inet6, true}.

2) In my armada manifest, I pull the image: rabbitmq: docker.io/rabbitmq:3.7.24
Thanks! Zhipeng
@lukebakken Do I need your patch? Has it been merged into a release (3.7.24 or later)? Thanks! Zhipeng
@hustlzp1981 see the milestone on this PR and the 3.7.20 release notes.
@hustlzp1981 this is not a support forum. Please post your questions to the mailing list.
nxdomain means that the hostname (osh-openstack-rabbitmq-rabbitmq-0.rabbitmq.openstack.svc.cluster.local) failed to resolve. This PR simply makes the HTTP client use IPv6 if it is configured via ERL_INETRC. There must be an AAAA DNS record in place or the client won't be able to resolve it.
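One way to confirm from the Erlang shell that an AAAA record exists for a hostname (the hostname below is just an example):

```erlang
%% Returns a non-empty list of 8-tuples (IPv6 addresses) when an AAAA
%% record exists, and [] when it does not -- the latter is what an
%% IPv6-only client would ultimately surface as nxdomain.
inet_res:lookup("rabbitmq-0.rabbitmq.default.svc.cluster.local", in, aaaa).
```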
Thanks klishin! Could you tell me which mailing list I should use?
RabbitMQ has only one and it hasn't changed since 2014.
nxdomain is a common problem on k8s. Maybe we should update the documentation to link this document and this document, and add some specific examples for RabbitMQ.
Thanks! I have now fixed the nxdomain issue in my IPv6 k8s setup according to the guide above.

osh-openstack-rabbitmq-cluster-wait-9rw6p 1/1 Running 0 17m
osh-openstack-rabbitmq-rabbitmq-0 1/1 Running 0 17m

However, I still have another issue. The osh-openstack-rabbitmq-cluster-wait pod uses rabbitmqadmin to connect to RabbitMQ but always gets an error. It works in my IPv4 setup.

++ active_rabbit_nodes
2020-03-17T10:31:12.124589385Z stderr F ++ wc -w
2020-03-17T10:31:12.134367271Z stderr F ++ rabbitmqadmin_authed list nodes -f bash
2020-03-17T10:31:12.134427089Z stderr F ++ set +x
2020-03-17T10:31:12.179073378Z stderr F Traceback (most recent call last):
2020-03-17T10:31:12.179644557Z stderr F error: [Errno 111] Connection refused
2020-03-17T10:31:12.17964969Z stderr F *** Could not connect: [Errno 111] Connection refused
Could not connect: [Errno 111] Connection refused is specific enough: a TCP connection (presumably to the HTTP API endpoint) was refused.
This is not a Kubernetes support forum so I will lock this.
Hi, I have a pure IPv6 k8s cluster and I want to install the RabbitMQ Helm chart. I followed the instructions in https://www.rabbitmq.com/networking.html#distribution-ipv6. My parameters (in the Helm chart):
The erl_inetrc file was created under /etc/rabbitmq, and I found an error in the log:
inet can return the IPv6 address.
nslookup returns an IPv6 address when type=aaaa, and an error when type=a.
I don't know why httpc:request returns nxdomain. Is it a bug or a configuration issue?
B.R., Tao