Closed z1atk0 closed 2 years ago
Hi, sorry for the late reply, been busy changing $DAYJOB and that seems to have taken its toll on me.
Now, I don't know of any specifics re:RFC and unicode, but one could easily argue that this is at least a regression in behavior compared to v1.6 (you're right, it was a HUGE redesign!)
I must thank you for the extensive report, it was very thorough and I appreciate the time you've put into it a LOT! :heart:
I'll have a look to see if I can create a little test case for this to reproduce, and then try to isolate the culprit.
Well, that was quick!
https://github.com/troglobit/sysklogd/blob/e433051/src/syslogd.c#L817-L846
/*
* Removes characters from log messages that are unsafe to display.
* TODO: Permit UTF-8 strings that include a BOM per RFC 5424?
*/
static void
parsemsg_remove_unsafe_characters(const char *in, char *out, size_t outlen)
{
...
The culprit is a safety filter which was added when syncing with the FreeBSD sources to get RFC5424 support, https://cgit.freebsd.org/src/tree/usr.sbin/syslogd/syslogd.c#n967 so it appears FreeBSD has the same limitation.
Not really sure what to do about this. I'm not super keen on adding native support for parsing unicode to sysklogd, nor do I want to add an external dependency for this, and removing the check seems (?) to be unsafe ... I'm guessing they had a good reason for putting it in there :confused:
I'll continue poking around a bit, looking at FreeBSD and a few other sources.
RFC5424 seems to indicate that unicode support is optional;
"... a syslog application conforming to this specification may not be able to ascertain that the information given to it from an originator is encoded in UTF-8. If it cannot determine that with certainty, the syslog application may choose to not incorporate the BOM in the MSG."
From what I can tell though, the NetBSD syslogd does support unicode BOM deciphering (search for IS_BOM()), so I'll go ahead and see if I can graft that onto sysklogd.
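For reference, RFC 5424 defines the BOM that may prefix a UTF-8 MSG as the three bytes 0xEF 0xBB 0xBF. A minimal sketch of such a check could look like the following. The helper name `has_utf8_bom` is made up for illustration, and this is not NetBSD's IS_BOM() macro, whose definition is not quoted in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from NetBSD or sysklogd): returns 1 if
 * the first three bytes of msg are the UTF-8 BOM (EF BB BF), else 0.
 */
static int
has_utf8_bom(const char *msg, size_t len)
{
	static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

	/* Messages shorter than the BOM cannot carry one. */
	return len >= sizeof(bom) && memcmp(msg, bom, sizeof(bom)) == 0;
}
```

A daemon that finds the BOM could treat the rest of the message as UTF-8. What it should do when the BOM is absent is exactly the open question here.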
Ooh dear ... that was a hot mess! :scream: :fearful:
(I need to get over my fear of the NetBSD code base, it holds the key to future signing and TLS support, but for now I'm going back to the FreeBSD version ...)
Anyway, FreeBSD has the following option, which we can easily add. I've tested it and it fixes your reported issue.
-8 Tells syslogd not to interfere with 8-bit data. Normally syslogd replaces C1 control
characters (ISO 8859 and Unicode characters) with their “M-x” equivalent. Note, this
option does not change the way syslogd alters control characters (see iscntrl(3)). They
are always replaced with their “^x” equivalent.
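To illustrate why this garbles UTF-8, here is a rough sketch of the masking the manual text describes. It is an assumption-laden illustration, not the sysklogd or FreeBSD source: the function name `sanitize_sketch` and the `mask_c1` parameter are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch, NOT the actual sysklogd/FreeBSD code: copy `in`
 * to `out`, rewriting bytes the way the -8 manual text describes.
 *   mask_c1 != 0: bytes 0x80-0x9F get an "M-" prefix and lose bit 7
 *   always:       ASCII control characters become "^x"
 */
static void
sanitize_sketch(const char *in, char *out, size_t outlen, int mask_c1)
{
	char *q = out;
	int c;

	if (outlen == 0)
		return;

	/* One input byte expands to at most four ("M-^x"), plus the NUL. */
	while ((c = (unsigned char)*in++) != '\0' &&
	    (size_t)(q - out) + 5 <= outlen) {
		if (mask_c1 && c >= 0x80 && c < 0xA0) {
			c &= 0x7F;		/* strip the high bit */
			*q++ = 'M';
			*q++ = '-';
		}
		if (c < 0x80 && iscntrl(c)) {
			*q++ = '^';
			*q++ = (char)(c ^ 0x40);	/* 0x01 -> 'A', 0x7F -> '?' */
		} else {
			*q++ = (char)c;
		}
	}
	*q = '\0';
}
```

In this sketch, "Ä" (UTF-8 bytes 0xC3 0x84) comes out as the byte 0xC3 followed by "M-^D" when `mask_c1` is set, because the continuation byte 0x84 falls in the C1 range. With `mask_c1` cleared, both bytes pass through untouched.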
What do you think, @z1atk0, would that be OK as a workaround for now?
Wow, seems like I missed quite a flurry of activity over the weekend! :scream_cat: Sorry for not replying any earlier, $DAYJOB distractions over here as well. :wink:
As for the solution/workaround you found: yes, the -8 workaround would be perfectly fine for my purposes, TYVM! :+1: :sunglasses:
Cool, no problem, I'll push it later tonight (CET) :sunglasses::v:
There, hope it works better for you now. Thanks again for taking the time to report this and writing such an awesome bug report! :heart:
Hello,
there is an option in version 2.3.0 in the file src/syslogd.c:
static int mask_C1 = 1; /* mask characters from 0x80 - 0x9F */
If you toggle it to 0, some problems with non-ASCII characters vanish. However,
I'm not sure whether it's safe or whether new hidden issues might turn up.
@Exchie this is a closed issue with a fix in 6022d3c, which was also released as v2.4.0? The fix implements a command line option -8 that does exactly what you say, toggle mask_C1.
I'm not exactly sure what you want to say with your comment? So I'm going to assume you are new to GitHub and leave it at that.
Sorry, I hadn't looked at the newly modified code for the "-8" option in version 2.4.0. I had version 2.3.0 installed at the time and found that toggling the mask_C1 flag works for me, but I asked the question to be sure this doesn't compromise security.
@Exchie OK, the lack of a question mark in your comment made it really difficult to tell if you meant it as a statement or a question. It looked like a statement, hence my confusion, since the issue was already fixed.
Hi,
I'm not sure if this is actually a sysklogd issue, but I have to start somewhere, I guess. :man_shrugging: Short context: the recently released Slackware 15.0 comes with your sysklogd-2.3.0, while previous versions of Slackware (14.2 and lower) included the old infodrom.org sysklogd-1.5.1 that you forked off a while ago.
Now it looks like the logging of (the greater portion of) non-ASCII characters appears to be broken. I found a line in my logs which (when viewed with less, and $LESSCHARSET="utf-8") looks like this (please scroll right to the end of the line) ...
... while it is actually supposed to look like this (again, please scroll right to the end of the line):
After quite a bit of head scratching, some debugging, and building of a couple of sysklogd's released versions on Slackware 15.0, I came to the conclusion that 1.6 is the last version that logs non-ASCII characters properly. As of 2.0, which I understand was more or less almost a rewrite from scratch, non-ASCII characters turn up garbled in the on-disk logs.
This is with sysklogd-2.0 and later ... on one terminal:
And meanwhile, on a terminal not far away:
Note the "Message from UNIX socket" lines in the debug output where the characters are still intact! This seems to suggest that syslogd is receiving all characters just fine, and they only get munched later on, somewhere on their way to disk.
And here's the same exercise with sysklogd-1.6, where everything works as expected. On one terminal:
And from the second terminal:
All characters end up in the logs ungarbled. In case you're wondering, logger.sysklogd is the logger binary from sysklogd itself, while logger.util-linux is the one from - yes, you guessed it! :wink: - util-linux, because sysklogd-1.6 does not ship a logger binary itself.
Soooooooo ... TBH, I really have no idea if this is actually a bug, or if it's actually known, to be expected, wanted, needed, desired, or even required or mandated behaviour, either by some RFC, or some other standards I don't know about. I'll leave that up to you to decide. Thanks for putting up with my lengthy bug report! :+1: