Open t0mcat1337 opened 1 year ago
Hi @t0mcat1337 , your regex is not matching the response from SMTP server
line="220 mailserver.dnsname.com dnsname.com SMTP Server"
will not be matched by
^220 ([^ ]+) SMTP (.+)$
Please try
^220 ([^ ]+) ([^ ]+) SMTP (.+)$
Hi @tr3mor,
you're right, the regex indeed was wrong, just tried your one. But the result is the same.
Config:
smtp:
prober: tcp
timeout: 5s
tcp:
preferred_ip_protocol: "ip4" # defaults to "ip6"
ip_protocol_fallback: false # no fallback to "ip6"
query_response:
- expect: "^220 ([^ ]+) ([^ ]+) SMTP (.+)$"
- send: "QUIT\r"
Curl Debug Result:
# curl http://localhost:9115/probe?target=mailserver.dnsname.com:25\&module=smtp\&debug=true
Logs for the probe:
ts=2023-01-01T10:59:13.149825659Z caller=main.go:181 module=smtp target=mailserver.dnsname.com:25 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2023-01-01T10:59:13.149934741Z caller=tcp.go:40 module=smtp target=mailserver.dnsname.com:25 level=info msg="Resolving target address" target=mailserver.dnsname.com ip_protocol=ip4
ts=2023-01-01T10:59:13.150997485Z caller=tcp.go:40 module=smtp target=mailserver.dnsname.com:25 level=info msg="Resolved target address" target=mailserver.dnsname.com ip=<IP>
ts=2023-01-01T10:59:13.151036804Z caller=tcp.go:112 module=smtp target=mailserver.dnsname.com:25 level=info msg="Dialing TCP without TLS"
ts=2023-01-01T10:59:13.153068518Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Successfully dialed"
ts=2023-01-01T10:59:13.153096002Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Processing query response entry" entry_number=0
ts=2023-01-01T10:59:18.150072034Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=error msg="Error reading from connection" err="read tcp4 172.40.11.4:56042-><IP>:25: i/o timeout"
ts=2023-01-01T10:59:18.150326858Z caller=main.go:181 module=smtp target=mailserver.dnsname.com:25 level=error msg="Probe failed" duration_seconds=5.000459584
Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.00108621
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 5.000459584
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.175365531e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
Module configuration:
prober: tcp
timeout: 5s
http:
ip_protocol_fallback: true
follow_redirects: true
enable_http2: true
tcp:
preferred_ip_protocol: ip4
query_response:
- expect: ^220 ([^ ]+) ([^ ]+) SMTP (.+)$
- send: "QUIT\r"
icmp:
ip_protocol_fallback: true
ttl: 64
dns:
ip_protocol_fallback: true
recursion_desired: true
IMHO the message
msg="Error reading from connection" err="read tcp4 172.40.11.4:56042-><IP>:25: i/o timeout"
doesn't aim to a wrong regex.
And also notice this:
msg="Probe failed" duration_seconds=5.000459584
The probe exactly fails after the configured timeout of 5s, for me another hint, that this behaviour has nothing to do with the wrong regex.
And as a plus look at this metric in both tries:
probe_failed_due_to_regex 0
doesn't aim to a wrong regex.
What should happen:
1) blackbox exporter connect to smtp server
2) get a message from it
3) match the message with expect
regexp
4) if matched, send send
5) Since it is "QUIT\r" connection closed, no timeout, probe passed
I don't have SMTP, so here is google smtp
Logs for the probe:
ts=2023-01-05T18:31:21.424660262Z caller=main.go:352 module=smtp target=smtp.gmail.com:587 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2023-01-05T18:31:21.424924137Z caller=tcp.go:40 module=smtp target=smtp.gmail.com:587 level=info msg="Resolving target address" target=smtp.gmail.com ip_protocol=ip4
ts=2023-01-05T18:31:21.426935512Z caller=tcp.go:40 module=smtp target=smtp.gmail.com:587 level=info msg="Resolved target address" target=smtp.gmail.com ip=142.250.27.108
ts=2023-01-05T18:31:21.42695497Z caller=tcp.go:121 module=smtp target=smtp.gmail.com:587 level=info msg="Dialing TCP without TLS"
ts=2023-01-05T18:31:21.473299887Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Successfully dialed"
ts=2023-01-05T18:31:21.473355053Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Processing query response entry" entry_number=0
ts=2023-01-05T18:31:21.489195012Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=debug msg="Read line" line="220 smtp.gmail.com ESMTP l3-20020aa7c303000000b0048a31c1743asm8308139edq.25 - gsmtp"
ts=2023-01-05T18:31:21.489331262Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Regexp matched" regexp="^220 .*" line="220 smtp.gmail.com ESMTP l3-20020aa7c303000000b0048a31c1743asm8308139edq.25 - gsmtp"
ts=2023-01-05T18:31:21.489350178Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Processing query response entry" entry_number=1
ts=2023-01-05T18:31:21.489362095Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=debug msg="Sending line" line="QUIT\r"
ts=2023-01-05T18:31:21.489491345Z caller=main.go:352 module=smtp target=smtp.gmail.com:587 level=info msg="Probe succeeded" duration_seconds=0.064642875
Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.002012208
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.064642875
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.893922486e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
Module configuration:
prober: tcp
timeout: 5s
http:
ip_protocol_fallback: true
follow_redirects: true
enable_http2: true
tcp:
preferred_ip_protocol: ip4
query_response:
- expect: ^220 .*
- send: "QUIT\r"
icmp:
ip_protocol_fallback: true
ttl: 64
dns:
ip_protocol_fallback: true
recursion_desired: true
Please notice the following lines
ts=2023-01-05T18:31:21.489331262Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Regexp matched" regexp="^220 .*" line="220 smtp.gmail.com ESMTP l3-20020aa7c303000000b0048a31c1743asm8308139edq.25 - gsmtp"
ts=2023-01-05T18:31:21.489350178Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Processing query response entry" entry_number=1
ts=2023-01-05T18:31:21.489362095Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=debug msg="Sending line" line="QUIT\r"
So the problem is that message read by blackbox-exporter is not matched by your regex.
@tr3mor Nope, it is NOT regex related, as blackbox exporter in my case doesn't even reach the point where any regex could be made. This can clearly be seen when comparing your debug output of googles smtp with mine:
Snippet of your output with google smtp:
ts=2023-01-05T18:31:21.42695497Z caller=tcp.go:121 module=smtp target=smtp.gmail.com:587 level=info msg="Dialing TCP without TLS"
ts=2023-01-05T18:31:21.473299887Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Successfully dialed"
ts=2023-01-05T18:31:21.473355053Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Processing query response entry" entry_number=0
ts=2023-01-05T18:31:21.489195012Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=debug msg="Read line" line="220 smtp.gmail.com ESMTP l3-20020aa7c303000000b0048a31c1743asm8308139edq.25 - gsmtp"
ts=2023-01-05T18:31:21.489331262Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Regexp matched" regexp="^220 .*" line="220 smtp.gmail.com ESMTP
After the line:
ts=2023-01-05T18:31:21.473355053Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Processing query response entry" entry_number=0
this line immediatly follows:
ts=2023-01-05T18:31:21.489195012Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=debug msg="Read line" line="220 smtp.gmail.com ESMTP l3-20020aa7c303000000b0048a31c1743asm8308139edq.25 - gsmtp"
This means: blackblox_exporter has got a string to apply the regex on, which is made in the next line:
ts=2023-01-05T18:31:21.489331262Z caller=main.go:144 module=smtp target=smtp.gmail.com:587 level=info msg="Regexp matched" regexp="^220 .*" line="220 smtp.gmail.com ESMTP
Compared to my output:
ts=2023-01-01T10:59:13.153096002Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Processing query response entry" entry_number=0
ts=2023-01-01T10:59:18.150072034Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=error msg="Error reading from connection" err="read tcp4 172.40.11.4:56042-><IP>:25: i/o timeout"
ts=2023-01-01T10:59:18.150326858Z caller=main.go:181 module=smtp target=mailserver.dnsname.com:25 level=error msg="Probe failed" duration_seconds=5.000459584
This line is identical to yours:
ts=2023-01-01T10:59:13.153096002Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Processing query response entry" entry_number=0
After that normally the line from your output should appear ("[...] msg="Read line" [..]") together with the string what was read. But this is NOT happening in my case, but this:
After 5 seconds trying to get some string, blackbox gives up, showing this next line:
ts=2023-01-01T10:59:18.150072034Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=error msg="Error reading from connection" err="read tcp4 172.40.11.4:56042-><IP>:25: i/o timeout"
So it never reaches the point of applying a regex which leads to my assumption that this is not regex concerned.
And - as a plus - as I wrote, I changed the regex as you suggested but with exactly the same result. The output in my last post IS from your suggested regex.
Conclusion: The Step Number 2 ("get a message from it") of your explanation is problematic, so it never reaches step 3 ("match the message with expect regexp") and following.
@t0mcat1337 That would seem to suggest that your SMTP server is simply not responding with the SMTP banner upon connect - at least, not within the configured timeout. How long does the banner take to appear when you manually telnet to port 25?
@dswarbrick Doing a telnet from the same server where the blackbox_exporter container is running gives an immediate response from the SMTP Server, no delay:
# telnet mailserver.dnsname.com 25
Connected to mailserver.dnsname.com.
Escape character is '^]'.
220-mailserver.dnsname.com dnsname.com SMTP Server
QUIT
221 2.0.0 Bye
Connection closed by foreign host.
And again as described in my very first post: Even when using telnet inside the blackbox_exporter container gives an immediate response without any delay, too:
# docker exec -it monitoring-blackbox-exporter-1 sh
/ # telnet mailserver.dnsname.com 25
Connected to mailserver.dnsname.com
220 mailserver.dnsname.com dnsname.com SMTP Server
QUIT
221 2.0.0 Bye
Connection closed by foreign host
@t0mcat1337 Like @tr3mor, I'm not able to reproduce your issue. I can get blackbox_exporter to successfully probe a Postfix server (after tweaking the regex slightly).
Looking at the tcp prober code:
for i, qr := range module.TCP.QueryResponse {
level.Info(logger).Log("msg", "Processing query response entry", "entry_number", i)
send := qr.Send
if qr.Expect.Regexp != nil {
var match []int
// Read lines until one of them matches the configured regexp.
for scanner.Scan() {
level.Debug(logger).Log("msg", "Read line", "line", scanner.Text())
match = qr.Expect.Regexp.FindSubmatchIndex(scanner.Bytes())
if match != nil {
level.Info(logger).Log("msg", "Regexp matched", "regexp", qr.Expect.Regexp, "line", scanner.Text())
break
}
}
if scanner.Err() != nil {
level.Error(logger).Log("msg", "Error reading from connection", "err", scanner.Err().Error())
return false
}
if match == nil {
probeFailedDueToRegex.Set(1)
level.Error(logger).Log("msg", "Regexp did not match", "regexp", qr.Expect.Regexp, "line", scanner.Text())
return false
}
...
You're getting the "Read line" message logged, but not getting a "Regexp matched" message logged. Then, you're getting "Error reading from connection".
This tells me that the for scanner.Scan() { ... }
loop has run to the end without getting a regexp match, and scanner.Err()
is non-nil upon exiting the loop. Your SMTP server is sitting there waiting for a command which it will never receive, since your prober config is still waiting for the SMTP banner regex to match before it sends QUIT\r
. Since the SMTP server will not produce any more output until you send it a command, you get the TCP read timeout.
Try setting your regex to something very simple, like just ^220
.
@dswarbrick After a bunch of weeks I tested the whole stuff the first time since January today and now it seems to work... I have no idea why, the infrastructure has not change a bit since the last test, it is completely the same. This is now the result of the identical test in the identical infrastructure from January:
Config:
smtp:
prober: tcp
timeout: 5s
tcp:
preferred_ip_protocol: "ip4" # defaults to "ip6"
ip_protocol_fallback: false # no fallback to "ip6"
query_response:
- expect: "^220 ([^ ]+) ([^ ]+) SMTP (.+)$"
- send: "QUIT\r"
Curl:
curl http://localhost:9116/probe?target=mailserver.dnsname.com:25\&module=smtp\&debug=true
(Port is 9116, because in the meantime a productive blackbox_exporter is running at 9115, used a second blackbox_exporter container for this test)
Result:
Logs for the probe:
ts=2023-03-01T12:50:37.438269884Z caller=main.go:181 module=smtp target=mailserver.dnsname.com:25 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2023-03-01T12:50:37.438458871Z caller=tcp.go:40 module=smtp target=mailserver.dnsname.com:25 level=info msg="Resolving target address" target=mailserver.dnsname.com ip_protocol=ip4
ts=2023-03-01T12:50:37.548735025Z caller=tcp.go:40 module=smtp target=mailserver.dnsname.com:25 level=info msg="Resolved target address" target=mailserver.dnsname.com ip=<IP>
ts=2023-03-01T12:50:37.548781366Z caller=tcp.go:112 module=smtp target=mailserver.dnsname.com:25 level=info msg="Dialing TCP without TLS"
ts=2023-03-01T12:50:37.550648939Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Successfully dialed"
ts=2023-03-01T12:50:37.550725835Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Processing query response entry" entry_number=0
ts=2023-03-01T12:50:37.622140992Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=debug msg="Read line" line="220 mailserver.dnsname.com dnsname.com SMTP Server"
ts=2023-03-01T12:50:37.622200632Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Regexp matched" regexp="^220 ([^ ]+) ([^ ]+) SMTP (.+)$" line="220 mailserver.dnsname.com dnsname.com SMTP Server"
ts=2023-03-01T12:50:37.622220868Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=info msg="Processing query response entry" entry_number=1
ts=2023-03-01T12:50:37.622267542Z caller=handler.go:117 module=smtp target=mailserver.dnsname.com:25 level=debug msg="Sending line" line="QUIT\r"
ts=2023-03-01T12:50:37.622390547Z caller=main.go:181 module=smtp target=mailserver.dnsname.com:25 level=info msg="Probe succeeded" duration_seconds=0.184059938
Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.110304214
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.184059938
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.175365531e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
Module configuration:
prober: tcp
timeout: 5s
http:
ip_protocol_fallback: true
follow_redirects: true
enable_http2: true
tcp:
preferred_ip_protocol: ip4
query_response:
- expect: ^220 ([^ ]+) ([^ ]+) SMTP (.+)$
- send: "QUIT\r"
icmp:
ip_protocol_fallback: true
ttl: 64
dns:
ip_protocol_fallback: true
recursion_desired: true
Host operating system: output of
uname -a
Ubuntu 22.04.1 LTS
blackbox_exporter version: output of
blackbox_exporter --version
Docker Image v.0.23.0 blackbox_exporter, version 0.23.0 (branch: HEAD, revision: 26fc98b9c6db21457653ed752f34d1b7fb5bba43) build user: root@f360719453e3 build date: 20221202-12:26:32 go version: go1.19.3 platform: linux/amd64
What is the blackbox.yml module config.
What logging output did you get from adding
&debug=true
to the probe URL?What did you do that produced an error?
Run blackbox-exporter docker container with above config try with curl as above
What did you expect to see?
Successfull Probe not running in timeout and not producing the line: "Error reading from connection"
What did you see instead?
Probe runs into timeout Line "Error reading from connection" in Debug out
Additional Infos:
When entering the running blackbox container and try with a simple telnet session to the same postfix no connection problems appear, everything works as expected:
So any hint what is going wrong in the TCP probe here? I simply want to check if the Server is listening on Port 25 and answers correctly