hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul TCP check does not close cleanly, thereby throwing Transport Connection java.io.EOFException #4457

Open hfarooqui opened 6 years ago

hfarooqui commented 6 years ago

I have defined the following TCP check in Consul against the STOMP port (61650).


            {
                "id": "broker_check",
                "name": "check if broker_master is listening on port 61650",
                "tcp": "10.206.24.221:61650",
                "interval": "10s"
            },

However, I see the following exceptions in the broker logs:

2018-07-27 08:18:06,263 | DEBUG | Transport Connection to: tcp://10.206.24.221:58220 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.206.24.221:58220@61650
java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)[:1.8.0_172]
    at org.apache.activemq.transport.stomp.StompWireFormat.readHeaderLine(StompWireFormat.java:174)[activemq-stomp-5.14.5.jar:5.14.5]
    at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:167)[activemq-stomp-5.14.5.jar:5.14.5]
    at org.apache.activemq.transport.stomp.StompWireFormat.parseAction(StompWireFormat.java:200)[activemq-stomp-5.14.5.jar:5.14.5]
    at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:112)[activemq-stomp-5.14.5.jar:5.14.5]
    at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:240)[activemq-client-5.14.5.jar:5.14.5]
    at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:232)[activemq-client-5.14.5.jar:5.14.5]
    at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:215)[activemq-client-5.14.5.jar:5.14.5]
    at java.lang.Thread.run(Thread.java:748)[:1.8.0_172]
2018-07-27 08:18:06,264 | DEBUG | Unregistering MBean com.mobileiron.activemq:type=Broker,brokerName=broker2_master,connector=clientConnectors,connectorName=stomp,connectionViewType=remoteAddress,connectionName=tcp_//10.206.24.221_58220 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ Transport: tcp:///10.206.24.221:58220@61650
2018-07-27 08:18:06,265 | DEBUG | Stopping connection: tcp://10.206.24.221:58220 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[broker2_master] Task-2
2018-07-27 08:18:06,266 | DEBUG | Stopping transport tcp:///10.206.24.221:58220@61650 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ BrokerService[broker2_master] Task-2

Am I missing something here?

mkeeler commented 6 years ago

@hfarooqui The TCP health check works by opening a connection to that IP/port and then immediately closing it. It is operating as expected. However, the other end accepts the connection and then does a read on the socket, which returns an EOF (no data, and the stream is closed).

Both sides are operating correctly, but it has the unfortunate side effect of causing errors in your logs.
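
To make the interaction concrete, here is a minimal Go sketch of a connect-then-close probe and of what the accepting side sees. It is not Consul's actual check code; `checkTCP` and `probeOnce` are hypothetical names, and the throwaway local listener only stands in for the broker.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// checkTCP dials the address and closes the connection right away,
// without sending any bytes. A successful dial counts as "passing".
func checkTCP(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// probeOnce starts a throwaway listener (a stand-in for the broker),
// runs one check against it, and returns the check result together
// with the error the server got from its first read.
func probeOnce() (checkErr, readErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err, nil // could not even set up the stand-in server
	}
	defer ln.Close()

	readResult := make(chan error, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			readResult <- err
			return
		}
		defer conn.Close()
		// Like the STOMP transport, try to read the first byte of a frame.
		_, err = conn.Read(make([]byte, 1))
		readResult <- err
	}()

	if checkErr = checkTCP(ln.Addr().String(), 2*time.Second); checkErr != nil {
		return checkErr, nil
	}
	return nil, <-readResult
}

func main() {
	checkErr, readErr := probeOnce()
	fmt.Println("check error:", checkErr)
	fmt.Println("server read error:", readErr, "| is io.EOF:", readErr == io.EOF)
}
```

The server-side read fails with end-of-file because the peer closed the stream before sending anything, which is the same condition the broker logs as `java.io.EOFException`.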

~There are potential ways to allow this to work without impacting the application but all have an extremely high barrier to getting them working cross platform (especially as all ways I know that they can be used involve a lot of C and CGo which we avoid for cross-platform compatibility reasons).~

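For documentation's sake, if we ever do want to consider this in the future, the basic approach is to use raw sockets/pcap interfaces to manually handle the TCP connection. The network flow would be:

  • Consul sends a TCP SYN to the service.
  • The service sends a SYN + ACK back to Consul (assuming it is bound to the port).
  • Consul receives the SYN + ACK, marks the service as healthy, and then sends a TCP RST.

Sending a RST before the TCP handshake finishes means that anything listening for or accepting connections won't ever see Consul connecting. Instead, this is all handled by the OS's network stack.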

~Unless there becomes a way to do this sort of thing without CGo I can't see it getting implemented.~

It looks like this might be possible using the IPConn/DialIP interfaces in the net package. One thing to note is that it would require running as a privileged user, or at least having the CAP_NET_RAW capability on Linux.
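
As a rough illustration of what the half-open probe discussed above would have to build by hand, the sketch below constructs a bare TCP SYN segment (IPv4, no options) and recognises a SYN + ACK reply. It is only a sketch under stated assumptions: `synSegment`, `pseudoHeader`, `checksum`, and `isSynAck` are hypothetical helpers, not Consul code, and nothing is actually sent. Writing the segment to the wire would still need a raw IP connection (for example from `net.DialIP` with the `"ip4:tcp"` network) and the privileges mentioned above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	flagSYN = 0x02
	flagACK = 0x10
)

// checksum computes the 16-bit one's-complement Internet checksum of b.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// pseudoHeader builds the 12-byte IPv4 pseudo-header that the TCP
// checksum covers in addition to the segment itself.
func pseudoHeader(src, dst [4]byte, tcpLen int) []byte {
	p := make([]byte, 12)
	copy(p[0:4], src[:])
	copy(p[4:8], dst[:])
	p[9] = 6 // IP protocol number for TCP
	binary.BigEndian.PutUint16(p[10:], uint16(tcpLen))
	return p
}

// synSegment returns a 20-byte TCP header with only the SYN flag set.
func synSegment(src, dst [4]byte, srcPort, dstPort uint16, seq uint32) []byte {
	seg := make([]byte, 20)
	binary.BigEndian.PutUint16(seg[0:], srcPort)
	binary.BigEndian.PutUint16(seg[2:], dstPort)
	binary.BigEndian.PutUint32(seg[4:], seq)
	seg[12] = 5 << 4 // data offset: five 32-bit words, no options
	seg[13] = flagSYN
	binary.BigEndian.PutUint16(seg[14:], 65535) // advertised window
	sum := checksum(append(pseudoHeader(src, dst, len(seg)), seg...))
	binary.BigEndian.PutUint16(seg[16:], sum)
	return seg
}

// isSynAck reports whether a received TCP header has both SYN and ACK
// set, which is the "port is bound" answer in the flow above.
func isSynAck(seg []byte) bool {
	return len(seg) >= 20 && seg[13]&(flagSYN|flagACK) == flagSYN|flagACK
}

func main() {
	// Example addresses and ports are placeholders.
	seg := synSegment([4]byte{10, 0, 0, 1}, [4]byte{10, 0, 0, 2}, 40000, 61650, 1)
	fmt.Printf("SYN segment: % x\n", seg)
}
```

A real implementation would also need to pick a source port, match replies to probes, handle IPv6, and time out when nothing comes back, so this covers only the packet-building part.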

hfarooqui commented 6 years ago

Thanks Matt.

For now I am using a script to do the TCP check using a STOMP client. With this I do not see any warnings in the log.
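
For anyone wanting to do something similar, here is a minimal Go sketch of a check that speaks just enough STOMP for the broker to see a clean session. It is not the script referred to above: `stompCheck`, `connectFrame`, and `isConnected` are hypothetical names, and the sketch assumes the broker accepts a CONNECT frame without `login`/`passcode` headers.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// connectFrame builds a STOMP CONNECT frame. Every frame ends with a NUL byte.
func connectFrame(host string) string {
	return "CONNECT\naccept-version:1.0,1.1,1.2\nhost:" + host + "\n\n\x00"
}

// isConnected reports whether the broker's reply is a CONNECTED frame,
// ignoring any end-of-line bytes sent before it.
func isConnected(reply string) bool {
	return strings.HasPrefix(strings.TrimLeft(reply, "\r\n"), "CONNECTED")
}

// stompCheck connects, waits for CONNECTED, then sends DISCONNECT so the
// broker sees an orderly shutdown instead of an abrupt close.
func stompCheck(addr, host string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	if _, err := conn.Write([]byte(connectFrame(host))); err != nil {
		return err
	}
	reply, err := bufio.NewReader(conn).ReadString(0) // read up to the NUL terminator
	if err != nil {
		return fmt.Errorf("reading reply: %w", err)
	}
	if !isConnected(reply) {
		return fmt.Errorf("unexpected reply frame: %q", reply)
	}
	_, err = conn.Write([]byte("DISCONNECT\n\n\x00"))
	return err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: stompcheck host:port")
		os.Exit(2)
	}
	host, _, err := net.SplitHostPort(os.Args[1])
	if err == nil {
		err = stompCheck(os.Args[1], host, 5*time.Second)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "STOMP check failed:", err)
		os.Exit(2) // Consul script checks: 0 passing, 1 warning, anything else critical
	}
	fmt.Println("STOMP handshake OK")
}
```

Compiled and registered as a script check with the broker's `host:port` as its argument, this takes the place of the plain TCP check, at the cost of running an external process on every interval.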

On Thu, Aug 9, 2018 at 12:25 PM Matt Keeler notifications@github.com wrote:

@hfarooqui The way the TCP health check works is to open a connection to that IP/Port and then immediately close it. It is operating as expected. However the other end is accepting the connection and then doing a read on the socket which returns an EOF (no data and the stream is closed).

Both sides are operating correctly but it has the unfortunate side effect of causing errors in your logs.

There are potential ways to allow this to work without impacting the application but all have an extremely high barrier to getting them working cross platform (especially as all ways I know that they can be used involve a lot of C and CGo which we avoid for cross-platform compatibility reasons).

For documentation sake if we ever do want to consider this in the future the basic approach is to use raw sockets/pcap interfaces to manually handle the TCP connection. The network flow would be:

  • Consul sends a TCP SYN to the service.
  • The service sends a SYN + ACK back to Consul (assuming it is bound to the port).
  • Consul receives the SYN + ACK, marks the service as healthy, and then sends a TCP RST.

Sending a RST before the TCP handshake finishes means that anything listening for or accepting connections won't ever see Consul connecting. Instead, this is all handled by the OS's network stack.

Unless there becomes a way to do this sort of thing without CGo I can't see it getting implemented.
