sipcapture / heplify-server

HEP Capture Server for HOMER
https://sipcapture.org
GNU Affero General Public License v3.0
184 stars 85 forks source link

Logs flooded with: WARN overflowing loki channel #442

Closed systemcrash closed 3 years ago

systemcrash commented 3 years ago

Since routing in syslog messages to Promtail -> Loki, heplify-server floods logs with e.g.:

heplify-server    | 2020/11/03 18:04:40.287179 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:40.311898 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:49.448642 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:49.489866 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:51.162722 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:51.162727 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:51.172257 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:51.172963 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:56.721830 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:56.723026 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:57.149871 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:57.452294 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:59.368799 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:59.369976 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:59.528971 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:04:59.540997 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:05:01.914727 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:05:01.915127 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:05:01.920560 server.go:279: WARN overflowing loki channel
heplify-server    | 2020/11/03 18:05:01.921008 server.go:279: WARN overflowing loki channel

Changing promtail job name here, from heplify-server to syslog seems to fix it - but means I have to edit every Loki query in Homer... :

- job_name: syslog
  syslog:
      listen_address: 0.0.0.0:1514
      idle_timeout: 60s
      label_structured_data: yes
      labels:
        job: "heplify-server"

But this depends on a fix here, I think.

Not ideal when you're watching for other messages. What defines overflow? What is the loki channel? Okay, forget it, how do I make this go away?

What should HEPLIFYSERVER_LOKIURL be? ( What path component should be in there? some docker ML files say only port, some say path also ) What is HEPLIFYSERVER_LOKITIMER ? Code comments...... non-existent.

negbie commented 3 years ago

Okay, forget it, how do I make this go away?

It can't handle the amout of messages you're sending in. More IO or general config tweaks on Loki Server. Loki has a lot of config options and if you tweak it right you can get 100-300% better performance than with default config.

On heplify-server side you can try to increase LokiBulk and LokiBuffer but it won't help you if the Loki Server can't keep up. LOKIURL is http url to Loki ;) Not hard to guess I think. LOKITIMER is a flush timer.

systemcrash commented 3 years ago

Any tips on how to boost performance, with the current default YAML CFG of loki? Did you switch to gPRC?

lmangani commented 3 years ago

The loki performance should be tweaked. Heplify-server flooding it means the issue is on the receiver as Negbie was stating too. I would suggest to open a discussion topic on the homer repo for this since its not an heplify-server issue:

https://github.com/sipcapture/homer/discussions

systemcrash commented 3 years ago

An additional data-point.

I think this is rooted in heplify-server. Recently (okay, the last few months that heplify-server has been up), my logs have been flooded with nothing but:

heplify-server    | 2021/03/23 22:43:25.426045 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:43:25.605528 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:43:28.487986 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:43:28.487982 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:43:35.732958 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:43:35.733529 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:05.845331 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:05.879459 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:11.920953 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:12.085123 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:16.734932 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:16.734982 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:30.007663 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:30.008251 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:33.611904 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:33.789584 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:57.734713 server.go:279: WARN overflowing loki channel
heplify-server    | 2021/03/23 22:44:57.734835 server.go:279: WARN overflowing loki channel

And I had no labels in Loki.

Just restarted heplify-server and suddenly I have labels in Loki (first time in the months hep has been up), and finally multi-line syslog entries corresponding to each SIP/HEP packet hep-serv ingests.

Still not sure where to start digging for this. It's like a connection is dropped, or a TCP connection closed that heplify-server forgets to check. Does heplify-server check this after e.g. remote end terminates a connection (caused by a docker restart?).

lmangani commented 3 years ago

@systemcrash thanks for the insight - did the loki ip change as this happened?

systemcrash commented 3 years ago

Nope - all set statically.