JWThorne opened this issue 4 months ago
Which version please?
https://github.com/google/mtail/blob/main/docs/Troubleshooting.md#reporting-a-problem
Does it look like `mtail` has also stopped processing lines when a GET is being processed?
This may indeed be related to the issue I was seeing and the change https://github.com/google/mtail/pull/908. I was testing with the `/json` handler and it does emit the headers and then stream the response. I had noticed that testing `/metrics` was returning an empty response (when I was in the browser).
For `/json` I was seeing a `E0805 16:26:05.490112 435505 json.go:27] write tcp [::1]:3903->[::1]:55250: i/o timeout` message.
From `curl` with verbose logging I was seeing:

```
* transfer closed with outstanding read data remaining
* Closing connection
curl: (18) transfer closed with outstanding read data remaining
```
When rebuilding mtail without #908 (I need #906 for my mtail program) and testing `/metrics` again, I do see that there's nothing written to the logs, and `curl` looks like:

```
* Empty reply from server
* Closing connection
curl: (52) Empty reply from server
```
@JWThorne you can probably look at one of the other exporters to confirm you're seeing partial output from them, and you can either build from source or wait until #908 is released.
> the `/metrics` endpoint will just fail if mtail has more than 70k metrics
Is this the number of output metrics or the number of log lines you're processing?
In my case we had a number of counters with a large number of labels, so mtail was generating a large JSON payload and hitting the timeout.
We find that with our current deployment, even though the scrape time is under 2.5 seconds, an HTTP GET on the `/metrics` endpoint will just fail if mtail has more than 70k metrics. There are no errors in the logs, no issues, just a failed response and a connection close after 2 seconds. Reducing the metric count appears to restore operation. However, we need more metrics.
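For what it's worth, both curl failures above are consistent with a server-side write deadline expiring while the response is still being produced. The sketch below is not mtail's implementation; it is a minimal Go illustration using `net/http`'s `WriteTimeout`, where the 2-second cut-off, the port 3903, and the handler paths are assumptions taken from the behaviour described in this thread. A handler that flushes headers and then keeps streaming reproduces curl's `(18) transfer closed with outstanding read data remaining`; a handler that produces nothing before the deadline reproduces `(52) Empty reply from server`.

```go
// Minimal sketch (not mtail's code): how a 2 s write deadline can produce
// the two curl failures reported above.
package main

import (
	"log"
	"net/http"
	"strings"
	"time"
)

func main() {
	mux := http.NewServeMux()

	// /json-like case: headers are flushed immediately, then the body keeps
	// streaming until the write deadline is hit mid-response.
	// curl: (18) transfer closed with outstanding read data remaining
	mux.HandleFunc("/streaming", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		chunk := []byte(strings.Repeat(`counter{label="value"} 1`+"\n", 1000))
		for i := 0; i < 1000; i++ {
			if _, err := w.Write(chunk); err != nil {
				// Comparable to the json.go log line: "write tcp ...: i/o timeout"
				log.Printf("write failed: %v", err)
				return
			}
			time.Sleep(10 * time.Millisecond)
		}
	})

	// /metrics-like case: the payload takes too long to build, nothing reaches
	// the client before the deadline, and the connection is closed.
	// curl: (52) Empty reply from server
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(3 * time.Second) // stand-in for rendering a very large exposition
		payload := strings.Repeat(`counter{label="value"} 1`+"\n", 100000)
		if _, err := w.Write([]byte(payload)); err != nil {
			log.Printf("write failed: %v", err)
		}
	})

	srv := &http.Server{
		Addr:         ":3903", // mtail's default port, used here only for familiarity
		Handler:      mux,
		WriteTimeout: 2 * time.Second, // assumed 2 s cut-off, matching the close-after-2-seconds behaviour above
	}
	log.Fatal(srv.ListenAndServe())
}
```

Hitting `/streaming` and `/slow` with `curl -v http://localhost:3903/...` shows errors 18 and 52 respectively; whether mtail's connection close actually comes from a write timeout is for the maintainers to confirm.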