Open tristanmorgan opened 1 year ago
Hi @tristanmorgan, thanks for the report.
I think I already found the issue with the goroutine leak. The UDP transport of the syslog target creates a new goroutine for each remote address, see https://github.com/grafana/loki/blob/74a4dca54e4077b65a39c18b8390e76870b01526/clients/pkg/promtail/targets/syslog/transport.go#L366C9-L372
stream, ok := streams[addr.String()]
if !ok {
	stream = NewConnPipe(addr)
	streams[addr.String()] = stream
	t.openConnections.Add(1)
	go t.handleRcv(stream)
}
These streams and their goroutines are only cleaned up when Promtail shuts down.
The fix would be to periodically clean up streams that have been idle for some time, i.e. that have not received any packets.
Thanks for responding so quickly. Would a timeout be possible (probably an upstream change)? Given this is UDP transport, there is no FIN-type message to close the connection. I wish I were more capable in Go so I could try to implement it.
> Thanks for responding so quickly. Would a timeout be possible (probably an upstream change)? Given this is UDP transport there is no FIN type message to close the connection.
Correct. Promtail would need to handle the idle timeout for each "connection" itself by tracking the timestamp of the last packet received.
> I wish I was more capable in Go to be able to try implement it.
No worries. I will take a look now.
Describe the bug
There seems to be a goroutine leak when running a syslog collector, possibly related to #8054.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Goroutines should be created and removed over time, with only a few being long-lived.
Environment:
Screenshots, Promtail config, or terminal output
Below is a partial stack trace showing one of the many goroutines sitting idle.