Open jkroonza opened 3 months ago
mysql-check is actually a hack on top of tcp-checks. I mean that internally it is expressed as a list of tcp-check rules with custom handlers to parse the MySQL responses (the initial handshake packet and the OK packet). There is no real support for the MySQL protocol at this stage. Handling authentication and command execution is far more complex and I doubt it is doable with the tcp-checks engine. I guess it is easier to write a custom agent for this purpose.
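For readers unfamiliar with what "parsing the initial handshake packet" involves, here is a minimal Go sketch. It is not HAProxy's code; the function name `parseHandshake` is hypothetical, and the layout assumed is the standard MySQL framing (3-byte little-endian payload length, 1-byte sequence id, then a payload whose first byte is protocol version 10, or 0xff for an error packet).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// parseHandshake inspects the first packet a MySQL server sends after accept.
// It returns the server version string, or an error if the packet is not a
// well-formed protocol-10 handshake. Illustrative only, not HAProxy's parser.
func parseHandshake(pkt []byte) (string, error) {
	if len(pkt) < 5 {
		return "", errors.New("short packet")
	}
	// 3-byte little-endian payload length; pkt[3] is the sequence id.
	n := int(pkt[0]) | int(pkt[1])<<8 | int(pkt[2])<<16
	payload := pkt[4:]
	if n == 0 || len(payload) < n {
		return "", errors.New("truncated payload")
	}
	payload = payload[:n]
	switch payload[0] {
	case 0xff:
		return "", errors.New("server sent an ERR packet")
	case 0x0a:
		// The server version is a NUL-terminated string after the version byte.
		end := bytes.IndexByte(payload[1:], 0)
		if end < 0 {
			return "", errors.New("unterminated server version")
		}
		return string(payload[1 : 1+end]), nil
	default:
		return "", fmt.Errorf("unexpected protocol version %d", payload[0])
	}
}

func main() {
	// Hand-built sample: length 8, sequence 0, version 10, "8.0.36", NUL.
	pkt := append([]byte{0x08, 0x00, 0x00, 0x00, 0x0a}, []byte("8.0.36\x00")...)
	version, err := parseHandshake(pkt)
	fmt.Println(version, err)
}
```

Note that this only proves the server greets clients. It says nothing about authentication or whether queries succeed, which is exactly the gap raised in this issue.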
@capflam that makes sense, and I suspect things like sqlproxy do it that way.
You mention a custom agent - if this is something in haproxy, I'm not familiar with it; or do you mean something like sqlproxy?
I'm just suddenly wondering whether "arbitrary tcp-check" plugins may not actually be possible. I know someone who worked here did some IMAP and POP3 work, but it ended up putting so much load on the backing database servers that we decided the cure was worse than the sporadic failure (at the time at least; that may need to be revisited given an interesting host failure we saw earlier this week). If I recall, he did this using tcp-check send and expect somehow. That won't check whether the remote side correctly negotiates TLS (which was the relevant failure mode: the server failed to negotiate TLS, and no, we don't offload TLS into haproxy).
Point being, in theory it should be possible for haproxy to perform the base tcp check and, if that connection works, take the file descriptor and hand it off to some "plugin" that can do its own checks on the file descriptor and merely returns a boolean to indicate "all good" or not. That way one could "merely" use libmysql to firstly auth and secondly execute a query, and that code can be written in a much simpler way than what I'm seeing in tcpcheck.c
As a suggestion, one could do a basic connect check at interval X, and only hand off for a full check on every Yth check. E.g., run a very basic "can we connect()?" check at a 100ms interval, and on every 20th of those do a full check. A full check is always authoritative, and for recovery from the down state a full check has to pass. I just have no idea how this would interfere with threading, to be honest, and what happens if the full check takes >100ms? So there are a few considerations to be had ... and I don't know enough of the internals of haproxy to really know if this is even remotely viable or not.
Would be amazing if these full check modules can be external to haproxy itself such that others can compile their own modules to be loaded into haproxy at runtime as and when needed. With the obvious caveats.
Anyway, for now we've decided that as a workaround we'll hook our own monitoring tools, which are already in place, into the firewalls on the DB nodes to simply reject incoming connections that would normally be accepted whenever mysql is not healthy.
I'm going to leave this open since this may trigger ideas for others on an actual implementation, assuming I've expressed myself reasonably well.
> @capflam that makes sense, and I suspect things like sqlproxy do it that way.
> You mention a custom agent - if this is something in haproxy, I'm not familiar with it; or do you mean something like sqlproxy?
I mean an external component. I don't know how sqlproxy works. But I had in mind a custom http script requested via an http-check or a tcp one requested via an agent-check.
> I'm just suddenly wondering whether "arbitrary tcp-check" plugins may not actually be possible. I know someone who worked here did some IMAP and POP3 work, but it ended up putting so much load on the backing database servers that we decided the cure was worse than the sporadic failure (at the time at least; that may need to be revisited given an interesting host failure we saw earlier this week). If I recall, he did this using tcp-check send and expect somehow. That won't check whether the remote side correctly negotiates TLS (which was the relevant failure mode: the server failed to negotiate TLS, and no, we don't offload TLS into haproxy).
There are no plugins at this stage. All specific checks (mysql, postgres...) are internally mapped to tcp-check rules, occasionally with custom functions to parse server responses. But nothing is really generic or configurable. This remains pretty simple. This is why I said it is a hack.
> Point being, in theory it should be possible for haproxy to perform the base tcp check and, if that connection works, take the file descriptor and hand it off to some "plugin" that can do its own checks on the file descriptor and merely returns a boolean to indicate "all good" or not. That way one could "merely" use libmysql to firstly auth and secondly execute a query, and that code can be written in a much simpler way than what I'm seeing in tcpcheck.c
Such a plugin must be seen as an agent dedicated to health-checks. It is pretty easy to write one in Go, for instance. The language is especially well suited to this kind of work.
> As a suggestion, one could do a basic connect check at interval X, and only hand off for a full check on every Yth check. E.g., run a very basic "can we connect()?" check at a 100ms interval, and on every 20th of those do a full check. A full check is always authoritative, and for recovery from the down state a full check has to pass. I just have no idea how this would interfere with threading, to be honest, and what happens if the full check takes >100ms? So there are a few considerations to be had ... and I don't know enough of the internals of haproxy to really know if this is even remotely viable or not.
Well, for now, it is not possible to express this kind of scenario with tcp-checks. But it is probably possible to find a tcp-check syntax to add conditions on rules. However the health-checks must still be expressed via tcp-check rules. And handling complex protocols is a mess.
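Since tcp-check rules cannot express this today, the proposed cadence would have to live in an external agent. A minimal Go sketch of that state machine follows. The `checker` type and its fields are hypothetical, and one detail is an assumption on my part: while the server is down, every tick attempts the connect probe and then the full check, rather than waiting for the next scheduled full check.

```go
package main

import "fmt"

// checker interleaves cheap connect probes with periodic full probes.
type checker struct {
	fullEvery int         // run a full check on every Nth tick
	connect   func() bool // cheap probe, e.g. a bare TCP connect
	full      func() bool // expensive probe, e.g. auth plus a query
	tick      int
	up        bool
}

// step runs one check interval and returns the resulting server state.
func (c *checker) step() bool {
	c.tick++
	if !c.up || c.tick%c.fullEvery == 0 {
		// Recovery, and every Nth tick, require the authoritative full check.
		c.up = c.connect() && c.full()
		return c.up
	}
	// Between full checks, a failed connect is enough to mark the server down.
	if !c.connect() {
		c.up = false
	}
	return c.up
}

func main() {
	healthy := true
	c := &checker{
		fullEvery: 3,
		connect:   func() bool { return true }, // the port always accepts
		full:      func() bool { return healthy },
	}
	fmt.Println(c.step()) // first tick runs a full check
	healthy = false       // node still accepts connections but rejects queries
	fmt.Println(c.step()) // cheap check cannot see the problem
	fmt.Println(c.step()) // third tick runs the full check
}
```

The window during which a broken node still looks healthy is bounded by `fullEvery` times the check interval, which is the trade-off against database load described earlier in the thread.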
> Would be amazing if these full check modules can be external to haproxy itself such that others can compile their own modules to be loaded into haproxy at runtime as and when needed. With the obvious caveats.
I doubt it will ever be implemented. Having external modules plugged into HAProxy would be an amazing source of bugs. Many people may be tempted to develop their own plugins, more or less stable, more or less maintained, instead of contributing to the project. This stance may be seen as an obstacle to implementing features. But on our side, it is the guarantee of having a stable and maintainable project over time and, from time to time, contributions. For health-check purposes, I guess having dedicated agents is acceptable.
Your Feature Request
From what I can see, supplying a password for authentication isn't possible, nor is supplying a check query (e.g. select col from db.table where col='val') to verify that the node is actually operational.
What are you trying to do?
In the galera environment it's possible that a node accepts connections, even fully passes authentication but then blocks all queries with "1047 WSREP has not yet prepared node for application use". In this case we should not direct mysql connections at the specific node, we're trying to avoid this case.
I checked the master branch and don't see anything relevant that is merely missing from version 2.9.6, which we currently run in production, but I could just be missing it.
Output of
haproxy -vv