grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
23.4k stars 3.39k forks source link

Promtail: syslog job over UDP is not working #6772

Closed xtavras closed 5 months ago

xtavras commented 2 years ago

Describe the bug Promtail will not deliver "Syslog UDP" logs to Loki, but "Syslog TCP" logs works fine.

To Reproduce Steps to reproduce the behavior:

  1. Enable "udp" as "listen_protocol" option for syslog config, e.g.
    
    syslog:
      idle_timeout: 60s
      listen_protocol: udp
      label_structured_data: true
      labels:
        job: syslog
      listen_address: 0.0.0.0:514
Try to send some logs using UDP, I've used this example or real hardware switches which only support UDP syslog
https://gist.github.com/xtavras/4a01f7d1f94237a4abcdfb02074453c1

I can see in tcpdump that logs are coming to promtail machine, but nothing in Loki. I've tried to run promtail with debug option but don't see any additional errors.
2. Enable "**tcp**" as "**listen_protocol**" (it's default), e.g.
syslog:
  idle_timeout: 60s
  listen_protocol: tcp
  label_structured_data: true
  labels:
    job: syslog
  listen_address: 0.0.0.0:514
Send logs using tcp, I've used this example
https://gist.github.com/xtavras/be13760713e2a9ee1a8bdae2ed6d2565

Logs are visible in tcpdump and are coming and visible in Loki.

**Expected behavior**
Logs sent to UDP syslogs should be delivered to Loki same way as TCP syslogs

**Environment:**
 - Loki: 2.5.0
 - Promtail: 2.6.1

**Screenshots, Promtail config, or terminal output**
Promtail config

server: http_listen_port: 9080

positions: filename: /var/lib/promtail/positions.yml

clients:

scrape_configs:

jsalatiel commented 2 years ago

@xtavras How have you installed loki and promtail? One thing I noticed is that if I install using grafana/loki-stack chart the service will be created as tcp despite the listen_protocol: udp

xtavras commented 2 years ago

@jsalatiel I'm using this Helm chart for Loki https://github.com/grafana/helm-charts/tree/main/charts/loki-simple-scalable

Promtail is just official binary deployed with this ansible playbook https://github.com/patrickjahns/ansible-role-promtail which just use github releases, I've tried to build binary from "main" and have same problem.

I can see that Promtail is listeing on udp port using netstat

netmon:~# netstat -ulpen |grep 514
udp6       0      0 :::514                  :::*                                0          30732786   11061/promtail
jsalatiel commented 2 years ago

Can you check if the kubernetes service has been created with TCP or UDP?

xtavras commented 2 years ago

@jsalatiel what service do you mean? Isn't Promtail communicate with Loki just with HTTP calls, so they are always TCP?

jsalatiel commented 2 years ago

AFAIK when you enable syslog on promtail it creates a service so that external services can send syslog messages to promtail. Check kubectl get svc -o wide on your cluster in the namespace you have installed promtail.

xtavras commented 2 years ago

Sorry if I was not clear, promtail is running on VM just as binary, so no Docker or Kubernetes, only Loki itself is running on Kubernetes. And I see that port is udp locally on VM, if change it to tcp in config, it will be tcp in netstat output as well, so it looks good to me tbh.

jsalatiel commented 2 years ago

If there's nothing listening on UDP thats the reason you can not send udp messages to promtail =)

xtavras commented 2 years ago

But it is listening on UDP, I already posted my netstat ouput. Promtail send data to Loki always using HTTP/HTTPS so not sure why UDP is expected on Loki side, correct me if I'm wrong.

saibug commented 2 years ago

@xtavras did you check syslog config , if there is udp / or tcp destination ?

xtavras commented 2 years ago

@saibug yes, our switches (Huawei) only support UDP on port 514, like I've said I see it in tcpdump in cleartext on promtail machine.

xtavras commented 2 years ago

Here is tcpdump example, as you see protocol is UDP: first is test script, second is real log from switch

netmon:~# tcpdump -i any port 514 -vvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes

11:17:15.427290 IP (tos 0x0, ttl 63, id 22762, offset 0, flags [DF], proto UDP (17), length 112)
    10.1.176.55.37603 > 10.1.1.68.syslog: [udp sum ok] SYSLOG, length: 84
    Facility user (1), Severity warning (4)
    Msg: 1 2022-08-05T11:17:15.423205+02:00 t14 promtail 291547 1 - \0xef\0xbb\0xbfthis is a UDP test
    0x0000:  3c31 323e 3120 3230 3232 2d30 382d 3035
    0x0010:  5431 313a 3137 3a31 352e 3432 3332 3035
    0x0020:  2b30 323a 3030 2074 3134 2070 726f 6d74
    0x0030:  6169 6c20 3239 3135 3437 2031 202d 20ef
    0x0040:  bbbf 7468 6973 2069 7320 6120 5544 5020
    0x0050:  7465 7374
11:21:02.296315 IP (tos 0x0, ttl 254, id 51272, offset 0, flags [none], proto UDP (17), length 205)
    10.1.250.168.38514 > 10.1.1.68.syslog: [udp sum ok] SYSLOG, length: 177
    Facility local7 (23), Severity info (6)
    Msg: Aug  5 2022 09:27:44 ds28-293L.example.com %%01INFO/6/SUPPRESS_LOG(l)[43605]:Last message repeated 5 times.(InfoID=1092489232, ModuleName=MSTP, InfoAlias=RECEIVE_MSTITC)
    0x0000:  3c31 3930 3e41 7567 2020 3520 3230 3232
    0x0010:  2030 393a 3237 3a34 3420 6473 3238 2d32
    0x0020:  3933 4c2e 6265 726c 696e 2e6e 692e 7465
    0x0030:  616d 2025 2530 3149 4e46 4f2f 362f 5355
    0x0040:  5050 5245 5353 5f4c 4f47 286c 295b 3433
corpix commented 2 years ago

Same thing for me. I have Unifi EdgeSwitch which supports only UDP.

Promtail configuration:

scrape_configs:
  # ...
  - job_name: syslog
    relabel_configs:
      - source_labels:
          - __syslog_message_hostname
        target_label: host
    syslog:
      idle_timeout: 60s
      label_structured_data: true
      labels:
        cluster: lab
        job: syslog
      listen_address: 0.0.0.0:514
      listen_protocol: udp

Started via systemd (on NixOS) on 514/udp. Attaching traffic dump

514.cap.zip

corpix commented 2 years ago

In my case packets are handled by nontransparent parser. I've made a minimal server to debug this https://gist.github.com/corpix/d76a90678d24e412eb5f9347dc1b58c1

@xtavras example could be used to send UDP packets to this server https://gist.github.com/xtavras/4a01f7d1f94237a4abcdfb02074453c1

UDP is not connection oriented, so there is no EOF. Parser relies on EOF or delimiter (which is LF by default). In my case 1 packet contains exactly 1 syslog message, so there is no delimiter and ReadBytes never return.

Daedaluz commented 2 years ago

Hello! I just hit this issue with messages from my pfSense. Just would like to add that it appears it can send multiple messages in same udp packet, none with the delimiter. My dirty hack to add a \n in the end of every received packet failed.

I'll just put something in front for now.

Sorry. Something else was going on here, not sure what. Each message is one packet.

chaudum commented 2 years ago

I've been testing UDP using rsyslog and the following configuration:

module(load="omprog")
module(load="mmutf8fix")
action(type="mmutf8fix" replacementChar="?")
*.* action(type="omfwd" protocol="udp" target="<host>" port="514" Template="RSYSLOG_SyslogProtocol23Format")

With this configuration Promtail receives the logs and pushed them to Loki.

Unfortunately I don't have a hardware device where I could test it.

I've also tested your Python script, and by adding a newline character at the end of the log message, it worked:

#!/usr/bin/env python3

import socket
import logging
from random import random
from time import sleep
from rfc5424logging import Rfc5424SysLogHandler, transport

logger = logging.getLogger("promtail")
logger.setLevel(logging.DEBUG)

rfc5424Handler = Rfc5424SysLogHandler(address=("localhost", 514), hostname="localhost", appname="promtailudp.py")
rfc5424Handler.setLevel(logging.DEBUG)
logger.addHandler(rfc5424Handler)

for x in range(100):
    logger.warning("this is a UDP test\n", extra={"msgid":1})
    sleep(random())

Seems like this is required as delimiter.

xtavras commented 2 years ago

@chaudum thank you for looking into it, I can confirm that your script works indeed, removing newline breaks it. Unfortunately our switches do not add new line at the end (and looking at comments looks like some other vendors don't do it too). What do you think, would it be possible as workaround append newline to such log messages in promtail? I guess as another workaround we can use rsyslog as broker between promtail and switches, but I'm just wondering why TCP works fine even without newline and our past logging solutions like rsyslog or graylog worked with same hardware as well.

chaudum commented 2 years ago

@chaudum thank you for looking into it, I can confirm that your script works indeed, removing newline breaks it. Unfortunately our switches do not add new line at the end (and looking at comments looks like some other vendors don't do it too). What do you think, would it be possible as workaround append newline to such log messages in promtail?

@grafana/loki-team What do you think?

I guess as another workaround we can use rsyslog as broker between promtail and switches, but I'm just wondering why TCP works fine even without newline and our past logging solutions like rsyslog or graylog worked with same hardware as well.

Tbh, I have no idea why it works over TCP but not over UDP, since we use the same code path for parsing messages. I would need to investigate the syslog parser library we use. That would take a lot of time I guess.

jeschkies commented 1 year ago

@chaudum I took a quick look. Seems you've added UDP support in https://github.com/grafana/loki/pull/5790. I'm wondering. Are we selecting the correct parser or does the parse simply not support messages without newlines?

Pengozoid commented 1 year ago

I noted similar behaviour with promtail:2.8.2. UDP logs from Synology DS920+ NAS do not appear in queries to Loki after sending but flush to Loki on restart of Promtail container. TCP works as expected.

erikbaranowski commented 1 year ago

I dug a little deeper and found that the reader we send to the parser is coded to read until it sees the expected delimiter. github.com/leodido/ragel-machinery/parser. In promtail's case it is looking for a Line Feed so the reader will hang looking for it on any message not ending in a line feed.

Grandma-Betty commented 1 year ago

I noted similar behaviour with promtail:2.8.2. UDP logs from Synology DS920+ NAS do not appear in queries to Loki after sending but flush to Loki on restart of Promtail container. TCP works as expected.

Same here. On Synology NAS you can choose whether to send syslogs in TCP or UDP format. TCP works without any issues, UDP not. Funny thing is that syslog is implemented in a primitive way on most devices (e.g. switches), so it is definded as UDP by default and cannot be changed as on advanced devices (e.g. Synology NAS). That said, Promtail is mostly useless to me the way it is right now for syslog purposes, unfortunately. It would be a big step forward if UDP could be supported in the near future.

joshuapare commented 12 months ago

Put in a PR to fix this here and in grafana agent - very simple workaround of making a bytes reader from the datagram so the parser gets the EOF

lfmullen commented 11 months ago

Hi, (first interaction on Github so apologises for any missing information I may provide)

I think I may be experiencing this exact issue in our network. We have a few hundreds switches and routers which syslog back to two central archive logging servers using UDP 514, Debian bare-metal based running syslog-ng. After forwarding all syslog to Promtail via UDP, nothing is caught. Promtail does work via scraping the local logs sorted via syslog but this is not ideal for labelling.

We run a Grafana instance on one of these boxes and I would love to get Loki logs working with Promtail.

I will be keeping a close eye on this issue as I'm eager to find a solution, and reluctant to move our boxes to rsyslog.

cstyan commented 10 months ago

@joshuapare was your grafana agent PR merged? if so we can likely merge a similar fix here, we're just trying to avoid too much divergence since the agent team forked the promtail code and soon all logs collection work will be done within that repo.

joshuapare commented 10 months ago

@joshuapare was your grafana agent PR merged? if so we can likely merge a similar fix here, we're just trying to avoid too much divergence since the agent team forked the promtail code and soon all logs collection work will be done within that repo.

Yep it was, and the matching PR for this repo is here as well: https://github.com/grafana/loki/pull/10708

MarcBrendel commented 5 months ago

@cstyan : Referring to your comment of Promtail being in a "feature complete" status now (https://github.com/grafana/loki/pull/10256#issuecomment-2050758320), is there any chance for this be fixed as well? It seems that there have been multiple almost PRs for Promtail and for the Grafana Agent, but receiving syslog events via UDP is still not working. A lot of (hardware and software) appliances are still unable to use UDP as a transport protocol for syslog, fixing this would be greatly appreciated by a lot of people (including me and my colleagues)!