olafz / percona-clustercheck

Script to make a proxy (ie HAProxy) capable of monitoring Percona XtraDB Cluster nodes properly. The clustercheck script is distributed under the BSD license.
BSD 3-Clause "New" or "Revised" License

when requesting via xinetd+curl, "Recv failure" #2

Open ceejayoz opened 11 years ago

ceejayoz commented 11 years ago
$ curl localhost:9200
Percona XtraDB Cluster Node is synced.

curl: (56) Recv failure: Connection reset by peer

This breaks Amazon ELB, as it sees a 200 response of this nature as a failure.

I tweaked the script to add a Content-Length: 0 header, which appears to make Amazon happy, but I'm not entirely clear on the implications of this, or if there's a better way.

olafz commented 11 years ago

Adding Content-Length: 0 is not really a solution, since a client will ignore the content. I modified the script in such a way that it reports the content length correctly (and as such, curl exits gracefully).
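
One way to keep the header and the body in agreement is to compute the length from the body instead of hard-coding it. The following is only a minimal sketch: `send_response` is a hypothetical helper name, not something taken from the clustercheck script, and the status and body strings are simply the ones quoted in this thread.

```shell
#!/bin/bash
# send_response is a hypothetical helper, not part of clustercheck.
# Usage: send_response "<status line text>" "<body text>"
send_response() {
    local status="$1"
    # The body is terminated with CRLF, so those two bytes count as well.
    local body="$2"$'\r\n'
    # Count bytes rather than characters; the arithmetic expansion also
    # strips any padding that some wc implementations print.
    local length=$(( $(printf '%s' "$body" | wc -c) ))
    printf 'HTTP/1.1 %s\r\n' "$status"
    printf 'Content-Type: text/plain\r\n'
    printf 'Connection: close\r\n'
    printf 'Content-Length: %d\r\n' "$length"
    printf '\r\n'
    printf '%s' "$body"
}

send_response "200 OK" "Percona XtraDB Cluster Node is synced."
```

Because the length is derived from the body, changing the message text cannot leave a stale Content-Length behind.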

bradbakerdx commented 11 years ago

I'm experiencing the same issue myself and my /usr/bin/clustercheck contains

echo -en "Content-Length: 40\r\n" and echo -en "Content-Length: 44\r\n"

depending on whether it's a success or a failure. In my case, setting echo -en "Content-Length: 0\r\n" did not help.

See more details here: http://serverfault.com/questions/504756/curl-failure-when-receiving-data-from-peer-using-percona-xtradb-cluster-check

I should clarify what I mean by "the same issue": I get the same error when I use curl to hit clustercheck.

Oddly, it only happens when I hit clustercheck remotely; hitting it locally seems to work.

In my case I'm using hardware load balancers not AWS load balancers.

bradbakerdx commented 11 years ago

Here is a packet capture containing some successes and some failures: https://www.dropbox.com/s/u2b9asn1p5vyh0r/data.pcap

In the successful cases there is an HTTP payload, but in the failed cases there is no HTTP payload.

lucalvr commented 10 years ago

I have exactly the same issue

homeyjd commented 10 years ago

I have exactly the same issue.

bradbakerdx commented 10 years ago

If it helps anyone, here's the solution we ended up using (it's not pretty, but it's been working for us for about a year):

#!/bin/bash
#
# Script to make a proxy (ie HAProxy) capable of monitoring Percona XtraDB Cluster nodes properly
#
# Author: Olaf van Zandwijk 
# Documentation and download: https://github.com/olafz/percona-clustercheck
#
# Based on the original script from Unai Rodriguez
# Modified by Brad Baker 5/7/2013
#
# This cluster check script is provided by the percona packages under
# /usr/bin/clustercheck. I've made a copy of it to /our-custom-location because I had
# to customize it to get it to work reliably  and I don't want YUM overwriting
# our customized version.
#
# For some reason the percona provided version of this script will
# intermittently fail when accessed remotely using curl or our load balancer
# health check. To test this for yourself remotely run the following command
# for i in {1..1000}; do curl http://your-server:9200; sleep 2; date;  done
#
# After extensive debugging one of the Percona devs had me add sleep statements.  
# After doing so the intermittent issue stopped - WHY?! I have no idea. 
# But with those in place it works reliably. 
if [[ $1 == '-h' || $1 == '--help' ]];then
    echo "Usage: $0 <mysql_username> <mysql_password> <available_when_donor=0|1> <err_file>"
    exit
fi
MYSQL_USERNAME="${1:-clustercheckuser}"
MYSQL_PASSWORD="${2:-clustercheckpassword!}"
AVAILABLE_WHEN_DONOR=${3:-0}
ERR_FILE="${4:-/dev/null}"
#Timeout exists for instances where mysqld may be hung
TIMEOUT=10
#
# Perform the query to check the wsrep_local_state
#
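# Quote the credentials and the error file so values containing spaces or
# shell metacharacters are passed through intact.
WSREP_STATUS=$(mysql -nNE --connect-timeout=$TIMEOUT --user="${MYSQL_USERNAME}" --password="${MYSQL_PASSWORD}" \
-e "SHOW STATUS LIKE 'wsrep_local_state';" 2>"${ERR_FILE}" | tail -1 2>>"${ERR_FILE}")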
if [[ "${WSREP_STATUS}" == "4" ]] || [[ "${WSREP_STATUS}" == "2" && ${AVAILABLE_WHEN_DONOR} == 1 ]]
then
    # Percona XtraDB Cluster node local state is 'Synced' => return HTTP 200
    # Shell return-code is 0
    echo -en "HTTP/1.1 200 OK\r\n"
    sleep 0.1
    echo -en "Content-Type: text/plain\r\n"
    sleep 0.1
    echo -en "Connection: close\r\n"
    sleep 0.1
    echo -en "Content-Length: 40\r\n"
    sleep 0.1
    echo -en "\r\n"
    sleep 0.1
    echo -en "Percona XtraDB Cluster Node is synced.\r\n"
    sleep 0.1
    exit 0
else
    # Percona XtraDB Cluster node local state is not 'Synced' => return HTTP 503
    # Shell return-code is 1
    echo -en "HTTP/1.1 503 Service Unavailable\r\n"
    sleep 0.1
    echo -en "Content-Type: text/plain\r\n"
    sleep 0.1
    echo -en "Connection: close\r\n"
    sleep 0.1
    echo -en "Content-Length: 44\r\n"
    sleep 0.1
    echo -en "\r\n"
    sleep 0.1
    echo -en "Percona XtraDB Cluster Node is not synced.\r\n"
    exit 1
fi

leoleovich commented 8 years ago

Hello, my dear friends. Today I ran into the same problem. I spent some time figuring out what causes this issue, so let me explain why it fails (the sleeps do not really help):

1) When curl, a browser, keepalived, or any other proper client sends GET / HTTP/1.1, it expects you to respect the HTTP protocol. That means actually reading the headers and body from the client. The implementation here does not read anything from the client; it just sends the reply. This happens to work for haproxy only because haproxy also ignores the HTTP protocol and sends just GET / HTTP/1.0 without headers. With some configurations it does send headers, but they are always shorter than the reply from the sh script, which gives you a chance that generating the reply takes a bit longer than sending that one line.

So why did the sleeps not help for every client?

2) Another "good" thing is the RST flag. After you do exit 0, xinetd immediately resets the connection without finishing it properly. This is no problem for a browser or curl, but it completely confuses, for example, C++ bufferevent_socket_connect, which expects the connection to be closed properly.

Anyway, the solution is very easy: either you properly read the HTTP headers from stdin, wait for \r\n, and only then send the result with the real Content-Length, or you stop using the outdated xinetd (if you open the xinetd manual, it says the REUSE flag is deprecated) and use an HTTP server plus a MySQL connector, which you can easily write in any language within 2 hours. I did this: https://github.com/innogames/galeraht

I hope this helps people like me who experienced the same problem.
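
The first option, reading the request before replying, could look roughly like this in bash. It is only a sketch under stated assumptions: `drain_request` and `handle_request` are hypothetical names, the 5-second read timeout is an arbitrary choice, and the body is a fixed placeholder rather than a real cluster check.

```shell
#!/bin/bash
# drain_request (hypothetical helper) consumes the request line and headers
# from stdin, stopping at the empty line that ends the header section.
drain_request() {
    local line
    # -r keeps backslashes literal; -t 5 gives up if the client stalls
    # (the 5-second value is an assumption, not taken from the thread).
    while IFS= read -r -t 5 line; do
        line=${line%$'\r'}            # strip the CR of a CRLF line ending
        if [ -z "$line" ]; then
            break                     # blank line: end of request headers
        fi
    done
    return 0
}

# handle_request (hypothetical) replies only after the request was read.
handle_request() {
    drain_request
    local body=$'OK\r\n'              # placeholder body, ASCII only
    printf 'HTTP/1.1 200 OK\r\n'
    printf 'Content-Type: text/plain\r\n'
    printf 'Connection: close\r\n'
    printf 'Content-Length: %d\r\n' "${#body}"   # ASCII: chars equal bytes
    printf '\r\n'
    printf '%s' "$body"
}

# Demo: feed a sample request through the handler instead of a real socket.
printf 'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n' | handle_request
```

When the script is started by xinetd, the client socket is already attached to stdin and stdout, so the demo line would be replaced by a bare call to `handle_request`.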

fspv commented 8 years ago

Got to this ticket from Google. Here is one more solution: we look for \r in the input and only return the response after it.

#!/bin/bash

# Consume the request until the line holding only \r, which ends the headers.
while read -r line
do
  test "$line" = $'\r' && break
done

/bin/echo "HTTP/1.1 200 OK"
/bin/echo "Content-Type: text/plain"
/bin/echo "Connection: close"
/bin/echo "Content-Length: 3"
/bin/echo ""
/bin/echo "OK"

dgeo commented 2 years ago

just use https://github.com/olafz/percona-clustercheck/pull/18