vromero / activemq-artemis-helm

Helm chart for a cluster of ActiveMQ Artemis (Work in progress)

Slave failing to connect #16

Closed. tiagokos closed this issue 6 years ago

tiagokos commented 6 years ago

Hello, I am trying to install the Artemis chart in a private Kubernetes cluster, but the slave node is failing its readiness probe:

Readiness probe failed: dial tcp 10.244.21.14:61616: connect: connection refused

I do not see anything wrong. The pod is running but not ready (0/1). Its log:

User removed.
User added successfully.
Merging input with '/var/lib/artemis/etc-override/broker-10.xml'
Merging input with '/var/lib/artemis/etc-override/broker-11.xml'
Calculating performance journal ... 100000
Apache ActiveMQ Artemis 2.6.2
2018-09-18 11:40:39,548 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2018-09-18 11:40:39,674 INFO [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2018-09-18 11:40:39,692 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /var/lib/artemis/data/journal to /var/lib/artemis/data/journal/oldreplica.1
2018-09-18 11:40:39,733 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2018-09-18 11:40:39,927 INFO [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 4,202,692,608
2018-09-18 11:40:40,080 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2018-09-18 11:40:40,080 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2018-09-18 11:40:40,081 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2018-09-18 11:40:40,081 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2018-09-18 11:40:40,082 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2018-09-18 11:40:40,090 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2018-09-18 11:40:40,345 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2018-09-18 11:40:40,446 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2018-09-18 11:40:40,570 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.6.2 [null] started, waiting live to fail before it gets active
2018-09-18 11:40:41,006 INFO [io.hawt.HawtioContextListener] Initialising hawtio services
2018-09-18 11:40:41,033 INFO [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2018-09-18 11:40:41,036 INFO [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.5 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2018-09-18 11:40:41,039 INFO [io.hawt.jmx.UploadManager] Using file upload directory: /var/lib/artemis/tmp/uploads
2018-09-18 11:40:41,064 INFO [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2018-09-18 11:40:41,113 INFO [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/var/lib/artemis/etc/jolokia-access.xml]
2018-09-18 11:40:41,152 INFO [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2018-09-18 11:40:41,361 INFO [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, 10.244.21.14, activemq-artemis-activemq-artemis-slave-0.activemq-artemis-activemq-artemis-slave.kube-system.svc.cluster.local]
2018-09-18 11:40:41,727 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2018-09-18 11:40:41,727 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2018-09-18 11:40:41,727 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2018-09-18 11:40:46,586 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=cb9170da-bb34-11e8-a42b-0a580af4140d is synchronized with live-server.
2018-09-18 11:40:46,609 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced

tiagokos commented 6 years ago

Oddly, the slave works, but the second node is never started when the replication factor is 2.

DanSalt commented 6 years ago

From what I understand, most of the above is expected behaviour. The slave runs in 'passive' mode, meaning that it won't start its connector (on port 61616) until it becomes 'active' (when the master fails). So the readiness probe, which inspects the live connection, will fail. That is OK, as you don't want clients trying to connect to the slave via the LoadBalancer.
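For illustration, the "dial tcp ...:61616" error in the report is the kind of message a TCP-socket readiness probe produces. A minimal sketch of such a probe follows. It is not copied from the chart's templates, and the timing values are assumptions:

```yaml
# Illustrative container-spec fragment, NOT the chart's actual template.
readinessProbe:
  tcpSocket:
    port: 61616             # port taken from the probe error in this issue
  initialDelaySeconds: 10   # assumed value
  periodSeconds: 10         # assumed value
```

While nothing is listening on that port, a probe like this keeps the pod at 0/1 READY even though its status is Running, which matches what was reported.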

However, we had to add the following line to the spec in the slave StatefulSet to get all slaves to deploy when using more than one replica:

podManagementPolicy: "Parallel"

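This allows the deployment of slave pods to continue even if one isn't ready (with the default OrderedReady policy, a StatefulSet waits for each pod to become Ready before creating the next one). We fixed this and a few other things in our local version of the chart. I'm working on creating a branch/PR for @vromero to review.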
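As a rough sketch of where that setting sits, here is a stripped-down slave StatefulSet. The name and service name are inferred from the pod hostname in the log above. The label, container name, image, and replica count are assumptions for illustration, not the chart's actual values:

```yaml
# Hypothetical, minimal StatefulSet; the real chart template has more fields.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: activemq-artemis-activemq-artemis-slave          # inferred from the log
spec:
  serviceName: activemq-artemis-activemq-artemis-slave   # inferred from the log
  replicas: 2
  podManagementPolicy: Parallel    # create all pods at once, without waiting for Ready
  selector:
    matchLabels:
      app: activemq-artemis-slave  # hypothetical label
  template:
    metadata:
      labels:
        app: activemq-artemis-slave
    spec:
      containers:
        - name: artemis                     # hypothetical container name
          image: vromero/activemq-artemis   # assumed image
          ports:
            - containerPort: 61616
```

Note that podManagementPolicy cannot be changed on an existing StatefulSet, so applying this to a release that is already installed would likely mean deleting and recreating the StatefulSet.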

Hope this helps, Cheers, Dan

vromero commented 6 years ago

@DanSalt is spot on. I never figured out the right balance between a parallel policy (which adds risk when the cluster is scaled down) and a serial one (which has problems with the replicas). I'm looking forward to seeing Dan's PR.