troglobit / sysklogd

BSD syslog daemon with syslog()/syslogp() API replacement for Linux, RFC3164 + RFC5424
https://troglobit.com/sysklogd.html
Other
89 stars 20 forks source link

Logging of non-ASCII characters #49

Closed z1atk0 closed 2 years ago

z1atk0 commented 2 years ago

Hi,

I'm not sure if this is actually a sysklogd issue, but I have to start somewhere, I guess. :man_shrugging: Short context: the recently released Slackware 15.0 comes with your sysklogd-2.3.0, while previous versions of Slackware (14.2 and lower) included the old infodrom.org sysklogd-1.5.1 that you forked off a while ago.

Now it looks like the logging of (the greater portion of) non-ASCII characters appears to be broken. I found a line in my logs which (when viewed with less, and $LESSCHARSET="utf-8") looks like this (please scroll right to the end of the line) ...

Feb 21 21:54:15 next gcr-prompter[2368]: GLib-GIO: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for <E2>M-^@M-^Xgio-vfs<E2>M-^@M-^Y

... while it is actually supposed to look like this (again, please scroll right to the end of the line):

Feb 21 21:54:15 next gcr-prompter[2368]: GLib-GIO: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’

After quite a bit of head scratching, some debugging, and building of a couple of sysklogd's released versions on Slackware 15.0, I came to the conclusion that 1.6 is the last version that logs non-ASCII characters properly. As of 2.0, which I understand was more or less almost a rewrite from scratch, non-ASCII characters turn up garbled in the on-disk logs.

This is with sysklogd-2.0 and later ... on one terminal:

[root@next:/usr/src/slackware/a]# /usr/sbin/syslogd -s -d
Starting.
nslookup 'none:syslog'
Created inet socket 4 for (null):syslog ...
Created inet socket 5 for (null):syslog ...
Created UNIX socket 6 ...
cfline(*.info;*.!warn;authpriv.none;cron.none;mail.none;news.none   -/var/log/messages)
symbolic name: info ==> 6
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: '/var/log/messages'
RFC3164 format enabled
cfline(*.warn;authpriv.none;cron.none;mail.none;news.none   -/var/log/syslog)
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: '/var/log/syslog'
RFC3164 format enabled
cfline(*.=debug                     -/var/log/debug)
symbolic name: debug ==> 7
leading char in action: /
filename: '/var/log/debug'
RFC3164 format enabled
cfline(authpriv.*                       -/var/log/secure)
symbolic name: * ==> 255
symbolic name: authpriv ==> 80
leading char in action: /
filename: '/var/log/secure'
RFC3164 format enabled
cfline(cron.*                           -/var/log/cron)
symbolic name: * ==> 255
symbolic name: cron ==> 72
leading char in action: /
filename: '/var/log/cron'
RFC3164 format enabled
cfline(mail.*                           -/var/log/maillog)
symbolic name: * ==> 255
symbolic name: mail ==> 16
leading char in action: /
filename: '/var/log/maillog'
RFC3164 format enabled
cfline(*.emerg                          *)
symbolic name: emerg ==> 0
leading char in action: *
write-all
RFC3164 format enabled
Searching for /etc/syslog.d/*.conf ...
Opening /etc/syslog.d/dev-tty12.conf ...
Parsing /etc/syslog.d/dev-tty12.conf ...cfline(*.*  /dev/tty12)
symbolic name: * ==> 255
leading char in action: /
filename: '/dev/tty12'
RFC3164 format enabled
60 60  X 60 60 60 60  X 60  X  X 60 60 60 60 60 60 60 60 60 60 60 60 60 60 FILE: /var/log/messages
1F 1F  X 1F 1F 1F 1F  X 1F  X  X 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F FILE: /var/log/syslog
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 FILE: /var/log/debug
 X  X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/secure
 X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/cron
 X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/maillog
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 WALL: 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF TTY: /dev/tty12
syslogd v2.0: restart.
logmsg: syslog.info<46>, flags 0, from next, app-name syslogd procid 23020 msgid (null) sd (null) msg syslogd v2.0: restart.
Called fprintlog_first(), fmt3164(46, 0x00, next, syslogd, 23020, (null), (null), syslogd v2.0: restart.)logging to FILE /var/log/messages
Called fprintlog_first(), fmt3164(46, 0x00, next, syslogd, 23020, (null), (null), syslogd v2.0: restart.)logging to TTY /dev/tty12
syslogd: restarted.
Debugging disabled, SIGUSR1 to turn on debugging.
Message from UNIX socket #6: <15>1 2022-02-22T12:40:33.145243+01:00 next test - - - iso/latin - öäüÖÄÜ߀¢§ 
logmsg: user.debug<15>, flags 0, from next, app-name test procid (null) msgid (null) sd (null) msg iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ 
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ )logging to FILE /var/log/debug
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ )logging to TTY /dev/tty12
Message from UNIX socket #6: <15>1 2022-02-22T12:40:33.149393+01:00 next test - - - unicode - –…‘’• 
logmsg: user.debug<15>, flags 0, from next, app-name test procid (null) msgid (null) sd (null) msg unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� 
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� )logging to FILE /var/log/debug
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� )logging to TTY /dev/tty12
[...]

And meanwhile, on a terminal not far away:

[root@next:~]# pkill -USR1 syslogd
[root@next:~]# date; logger.sysklogd -p debug -s -t test "iso/latin - öäüÖÄÜ߀¢§"; logger.sysklogd -p debug -s -t test "unicode - –…‘’•"; tail -2 /var/log/debug
Di 22 Feb 2022 12:40:33 CET
Feb 22 2022 12:40:33 test: iso/latin - öäüÖÄÜ߀¢§ 
Feb 22 2022 12:40:33 test: unicode - –…‘’• 
Feb 22 12:40:33 next test: iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ 
Feb 22 12:40:33 next test: unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� 

Note the Message from UNIX socket lines in the debug output where the characters are still intact! This seems to suggest that syslogd is receiving all characters just fine, and they only get munched later on, somewhere on their way to disk.

And here's the same exercise with sysklogd-1.6, where everything works as expected. On one terminal:

[root@next:/usr/src/slackware/a]# /usr/sbin/syslogd -d
Allocated parts table for 1024 file descriptors.
Starting.
Called init.
Called allocate_log, nlogs = -1.
cfline(*.info;*.!warn;authpriv.none;cron.none;mail.none;news.none   -/var/log/messages)
symbolic name: info ==> 6
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: /var/log/messages
Called allocate_log, nlogs = 0.
cfline(*.warn;authpriv.none;cron.none;mail.none;news.none   -/var/log/syslog)
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: /var/log/syslog
Called allocate_log, nlogs = 1.
cfline(*.=debug                     -/var/log/debug)
symbolic name: debug ==> 7
leading char in action: /
filename: /var/log/debug
Called allocate_log, nlogs = 2.
cfline(authpriv.*                       -/var/log/secure)
symbolic name: * ==> 255
symbolic name: authpriv ==> 80
leading char in action: /
filename: /var/log/secure
Called allocate_log, nlogs = 3.
cfline(cron.*                           -/var/log/cron)
symbolic name: * ==> 255
symbolic name: cron ==> 72
leading char in action: /
filename: /var/log/cron
Called allocate_log, nlogs = 4.
cfline(mail.*                           -/var/log/maillog)
symbolic name: * ==> 255
symbolic name: mail ==> 16
leading char in action: /
filename: /var/log/maillog
Called allocate_log, nlogs = 5.
cfline(*.emerg                          *)
symbolic name: emerg ==> 0
leading char in action: *
write-all
Called allocate_log, nlogs = 6.
cfline(include /etc/syslog.d/*.conf)
symbolic name: d/*.confCalled logerr, msg: unknown priority name "d/*.conf"
logmsg: syslog.err<43>, flags 4, from next, msg syslogd: unknown priority name "d/*.conf"
Called fprintlog, logging to CONSOLE /dev/console
Opened UNIX socket `/dev/log'.
 0: 60 60  X 60 60 60 60  X 60  X  X 60 60 60 60 60 60 60 60 60 60 60 60 60 60 FILE: /var/log/messages
 1: 1F 1F  X 1F 1F 1F 1F  X 1F  X  X 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F FILE: /var/log/syslog
 2: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 FILE: /var/log/debug
 3:  X  X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/secure
 4:  X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/cron
 5:  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/maillog
 6:  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 WALL: 
logmsg: syslog.info<46>, flags 4, from next, msg syslogd v1.6: restart.
Called fprintlog, logging to FILE /var/log/messages
syslogd: restarted.
Debugging disabled, SIGUSR1 to turn on debugging.
Select interrupted.
Calling select, active file descriptors (max 3): 3 

Successful select, descriptor count = 1, Activity on: 3 
Message from UNIX socket: #3
Message length: 61, File descriptor: 3.
logmsg: user.debug<15>, flags 2, from next, msg Feb 22 12:22:07 test: iso/latin - öäüÖÄÜ߀¢§
Called fprintlog, logging to FILE /var/log/debug
Calling select, active file descriptors (max 3): 3 

Successful select, descriptor count = 1, Activity on: 3 
Message from UNIX socket: #3
Message length: 53, File descriptor: 3.
logmsg: user.debug<15>, flags 2, from next, msg Feb 22 12:22:07 test: unicode - –…‘’•
Called fprintlog, logging to FILE /var/log/debug
Calling select, active file descriptors (max 3): 3 
^Csyslogd: exiting on signal 2
logmsg: syslog.info<46>, flags 4, from next, msg exiting on signal 2
Called fprintlog, logging to FILE /var/log/messages

And from the second terminal:

[root@next:~]# pkill -USR1 syslogd
[root@next:~]# date; logger.util-linux -p debug -s -t test "iso/latin - öäüÖÄÜ߀¢§"; logger.util-linux -p debug -s -t test "unicode - –…‘’•"; tail -2 /var/log/debug
Di 22 Feb 2022 12:22:07 CET
<15>Feb 22 12:22:07 test: iso/latin - öäüÖÄÜ߀¢§
<15>Feb 22 12:22:07 test: unicode - –…‘’•
Feb 22 12:22:07 next test: iso/latin - öäüÖÄÜ߀¢§
Feb 22 12:22:07 next test: unicode - –…‘’•

All characters end up in the logs ungarbled. In case you're wondering, logger.sysklogd is the logger binary from sysklogd itself, while logger.util-linux is the one from - yes, you guessed it! :wink: - util-linux, because sysklogd-1.6 does not ship a logger binary itself.

Soooooooo ... TBH, I really have no idea if this is actually a bug, or if it's actually known, to be expected, wanted, needed, desired, or even required or mandated behaviour, either by some RFC, or some other standards I don't know about. I'll leave that up to you to decide. Thanks for putting up with my lengthy bug report! :+1:

troglobit commented 2 years ago

Hi, sorry for the late reply, been busy changing $DAYJOB and that seems to have taken its toll on me.

Now, I don't know of any specifics re:RFC and unicode, but one could easily argue that this is at least a regression in behavior compared to v1.6 (you're right, it was a HUGE redesign!)

I must thank you for the extensive report, it was very thorough and I appreciate the time you've put into it a LOT! :heart:

I'll have a look to see if I can create a little test case for this to reproduce, and then try to isolate the culprit.

troglobit commented 2 years ago

Well, that was quick!

https://github.com/troglobit/sysklogd/blob/e433051/src/syslogd.c#L817-L846

/*
 * Removes characters from log messages that are unsafe to display.
 * TODO: Permit UTF-8 strings that include a BOM per RFC 5424?
 */
static void
parsemsg_remove_unsafe_characters(const char *in, char *out, size_t outlen)
{
...

The culprit is a safety filter which was added when syncing with the FreeBSD sources to get RFC5424 support, https://cgit.freebsd.org/src/tree/usr.sbin/syslogd/syslogd.c#n967 so it appears FreeBSD has the same limitation.

Not really sure what to do about this. I'm not super keen on adding native support for parsing unicode to sysklogd, nor do I want to add an external dependency for this, and removing the check seems (?) to be unsafe ... I'm guessing they had a good reason for putting it in there :confused:

I'll continue poking around a bit, looking at FreeBSD and a few other sources.

troglobit commented 2 years ago

RFC5424 seems to indicate that unicode support is optional;

"... a syslog application conforming to this specification may not be able to ascertain that the information given to it from an originator is encoded in UTF-8. If it cannot determine that with certainty, the syslog application may choose to not incorporate the BOM in the MSG."

From what I can tell though, the NetBSD syslogd does support unicode BOM deciphering (search for IS_BOM()), so I'll go ahead and see if I can graft that onto sysklogd.

troglobit commented 2 years ago

Ooh dear ... that was a hot mess! :scream: :fearful:

(I need to get over my fear of the NetBSD code base, it holds the key to future signing and TLS support, but for now I'm going back to the FreeBSD version ...)


Anyway, FreeBSD has the following option, which we can easily add. I've tested it and it fixes your reported issue.

-8 Tells syslogd not to interfere with 8-bit data.  Normally syslogd replaces C1 control
   characters (ISO 8859 and Unicode characters) with their “M-x” equivalent.  Note, this
   option does not change the way syslogd alters control characters (see iscntrl(3)).  They
   are always replaced with their “^x” equivalent.

What do you think, @z1atk0, would that be OK as a workaround for now?

z1atk0 commented 2 years ago

Wow, seems like I missed quite a flurry of activity over the weekend! :scream_cat: Sorry for not replying any earlier, $DAYJOB distractions over here as well. :wink:

As for the solution/workaround you found: yes, the -8 workaround would be perfectly fine for my purposes, TYVM! :+1: :sunglasses:

troglobit commented 2 years ago

Cool, no problem, I'll push it later tonight (CET) :sunglasses::v:

troglobit commented 2 years ago

There, hope it works better for you now. Thanks again for taking the time to report this and writing such an awesome bug report! :heart:

Exchie commented 2 years ago

Hello, there is an option in version 2.3.0 in the file src/syslogd.c:    static int      mask_C1 = 1;          /* mask characters from 0x80 - 0x9F */ If to toggle it to 0, some problems with non-ascii chars will vanish. However, i'm not sure if it's safe and if new hidden issues are not to come.

troglobit commented 2 years ago

@Exchie this is a closed issue with a fix in 6022d3c, which also was released as v2.4.0? The fix implements a command line option -8 that does exactly what you say, toggle mask_c1.

I'm not exactly sure what you want to say with your comment? So I'm going to assume you are new to GitHub and leave it at that.

Exchie commented 2 years ago

Sorry, I haven't looked at the new modified code for "-8" option @2.4.0 version. I had 2.3.0 version installed then, have found out that this flag mask_C1 works for me, but to be sure this doesn't violate security, asked the question.

troglobit commented 2 years ago

@Exchie OK, the lack of a question mark in your comment made it really difficult to tell if you meant it as a statement or a question. It looked like a statement, hence my confusing since the issue was already fixed.