Filter mechanism for logs in journald

PhilippWendler commented 7 years ago

Submission type

Request for enhancement (RFE)

systemd version the issue has been seen with

234

Used distribution

irrelevant

Description

A possibility to filter logs in journald, i.e., before they are written to the journal, would be helpful in cases where some daemon spams the log with countless messages.

Of course, it is better to fix the source instead of filtering away log messages, but for many users the former is not possible and the latter is a helpful workaround until the real problem is fixed. As an example, my concrete current problem is that I need to run snmpd for lack of alternatives and it spams the log with error messages (so just reducing the log level would mean I also do not see other errors), and this behavior exists for years but is not getting fixed (Debian bug from 2015, Red Hat bug from 2016). This shows that in practice, a log-filter possibility would indeed be helpful.

I know that journalctl can filter log messages, but for cases like this I do not want these messages to be written to the journal, because for example many such messages can lead to earlier-than-desired log truncation due to the SystemMaxUse/SystemKeepFree options.

rsyslog has an equivalent feature (which I have been using until now), so I could solve this by ignoring the journal and using traditional syslog, but I really like the additional features of the journal and do not want to fall back to syslog just because of this.

I do not have a full proposal how the syntax for log filters could look like. I suggest to start with an option in journald.conf that can take an expression similar to that of journalctl, but I assume that for such a feature to be most useful there would need to be the possibility to due regexp or at least substring matches on the message field, like rsyslog has. Maybe a syntax inspired by CSS attribute selectors could be used or something like Perl-style =~ for regexp matches?

liger1978 commented 3 years ago

@poettering

...fixing your software...

This is sadly often out our of hands as sysadmins, as it is not our software.

you only want to see some stuff but not other stuff? then filtering when looking at it makes more sense, i.e. what journalctl already implements.

In some cases, if it has gone to disk, it is too late.

Security and lockdown are a big one. massaging broken program output to make it less broken, or hiding bugs is not a priority. sorry.

This is fair. Nevertheless, sysadmins would be grateful for systemd's benevolence on this issue. Would you accept a PR, if it is about priorities (not from me...I mean in principle)?

gleventhal commented 3 years ago

Apologies, I should have said that there is optional qr-code support in journalctl (I'd misremembered, conflating systemd with journald). I just don't understand the priority of features of the project, and was asking in earnest the reason for the pushback on filtering.

I presume when you talk about the "QR reader" you mean the forward secure sealing support in journald, where optionally a QR code can be generated to extract the sealing key, and the dep is build time and runtime optional? I mean, you are confused about reading and generating, and you make it sound as if this was some generic QR code support, while it very specific to FSS.

One major priority for systemd is system security, i.e. locking down logs so that they cannot be modified without this being noticed on intrusion. doing messy pre-collection regex mangling/filtering/processing OTOH is a hack around broken software. Now, we should always have precaution against broken software, and we do (as we enforce log ratelimiting to misbehaving clients), but manually patching around in the log stream is just doctoring around bugs at the wrong place and makes this very unsympathetic to me. systemd is not the place for working around broken software. it's the place to monitor, to lockdown, to resource manage software, but not massage badly written software into working slightly less badly.

That said, I actually think filtering is necessary and important – but at the moment you look at the logs, so that you can apply filtering and see immediate effects. i.e. have a robust, static database of log entries and filter when you look at it. This is what we implement. journalctl has a multitude of filtering options built in, with regexes and whatnot — all when looking at things.

I mean, to say this differently: why do you want to filter at collection time? because one client generates to much rubbish? if so, ratelimiting is the answer or fixing your software or your configuration. or is it because you only want to see some stuff but not other stuff? then filtering when looking at it makes more sense, i.e. what journalctl already implements.

So yes, priority for systemd are many things. Security and lockdown are a big one. massaging broken program output to make it less broken, or hiding bugs is not a priority. sorry.

@poettering I like the rate limiting feature of systemd-journald, it works well and I use it in production. It doesn't solve the problem I am concerned with however, which is precisely a security problem:

For retention/compliance reasons, I need to log shell history to a centralized source, and have enabled shell history to syslog in our Bash version. We use persistent storage in journald to have a decent retention story while limiting memory usage.

We do not want to log the shell history to the journald database persistently however, because it would keep some sensitive information on-disk that we do not wish to keep (we have our syslog client forward the shell history to a central server over TLS).

By not being able to filter in journald, we are forced to either abandon persistent storage for journald, abandon journald, or tolerate writing sensitive data to local disk when we don't want to do that. Because journald consumes /dev/log, it is the first stop for syslog traffic, but for shell history (which is captured on a specific syslog facility and therefore easily filterable without regex), we would like it to pass-through to the syslog daemon without being persisted by journald.

aldem commented 3 years ago

@poettering

One major priority for systemd is system security, i.e. locking down logs so that they cannot be modified without this being noticed on intrusion. doing messy pre-collection regex mangling/filtering/processing OTOH is a hack around broken software.

You do realize that anyone who has root access to the system (most "standard" systems, at least) could mange anything and everything without anybody noticing anything, unless the system is properly configured (which is never the default) and locked down? The mere presence of systemd does not make system secure or locked down by default - thus this is a very weak argument, to be honest, and systemd alone is not able to provide this anyway.

Now, we should always have precaution against broken software, and we do (as we enforce log ratelimiting to misbehaving clients), but manually patching around in the log stream is just doctoring around bugs at the wrong place and makes this very unsympathetic to me.

No, it is not "doctoring around bugs", it is an instrument for sysadmins/devops/system owners to get in control of their own servers and usage of storage (at the very least), and systemd must serve, not dictate.

There are many, way too many "broken" (or not) apps which are either logging what is not supposed to be logged or at least not supposed to be logged always, and there is no easy way to fix this - as often their maintainers, exactly like you in case of systemd, are pointing to log collectors - "you could easily fix this during collection time", and some even say "systemd is broken if it does not allow collection time filtering".

Fine, so what now? You may continue to point finger to somebody else, or you may (at least principally) agree to allow filtering and let someone implement it - thus solving most logging issues (related to collection time filtering) once and forever.

That said, I actually think filtering is necessary and important – but at the moment you look at the logs, so that you can apply filtering and see immediate effects. i.e. have a robust, static database of log entries and filter when you look at it.

This is only partially true. If some app got issues and suddenly started to log millions of messages/second (due to external events or whatever is the reason, not necessary a bug) - you are doomed, as your log space, be it either memory or disk may be not sufficient to store everything, thus you lose important messages (due to rotation or other constraints - and no, reate limiting not necessarily helps).

Additional issue (this I mentioned before long time ago) is noise logging which is regular activity - exactly the case when it is better to log only failures (time/service start/stop) - but at least this is addressed somehow in other tickets (though still crippled).

I mean, to say this differently: why do you want to filter at collection time? because one client generates to much rubbish? if so, ratelimiting is the answer or fixing your software or your configuration.

Because it could be embedded (or otherwise limited) system which simply has not enough RAM/disk space/endurance to store all this stuff, and as you probably know it is not always possible to fix software (especially if you are not developer), and especially if developer/maintainer is insisting that "filtering must be done while logging".

Rate limiting is not a silver bullet and it does not work always like you expect it to, and it does not work at all if blocks of lines are repeated (not just single lines).

So yes, priority for systemd are many things. Security and lockdown are a big one. massaging broken program output to make it less broken, or hiding bugs is not a priority. sorry.

Filtering output is not "massaging broken program output" - this is just a tool. There are many similar tools exists and there is a reason for, and many logging tools (rsyslogd one of them) are providing filtering - for a reason, and only you are insisting that this is "not important" or even "irrelevant" - despite that many people (and not only here) expressed their views on this.

By the way - the fact that it is currently impossible to precisely configure what is logged or not in systemd (there are lots of noisy, absolutely irrelevant messages logged) - doesn't it make systemd "buggy" and "broken" according to your logic?

After all, are you doing this for us - or against us? Wasn't systemd supposed to improve things and make life easier instead of making it harder?

Security and lockdown are good priorities, but please give us a choice so we could make our own decisions how to handle our systems, their resources and storage - as I am pretty sure that no one wants those policies enforced.

poettering commented 3 years ago

By not being able to filter in journald, we are forced to either abandon persistent storage for journald, abandon journald, or tolerate writing sensitive data to local disk when we don't want to do that. Because journald consumes /dev/log, it is the first stop for syslog traffic, but for shell history (which is captured on a specific syslog facility and therefore easily filterable without regex), we would like it to pass-through to the syslog daemon without being persisted by journald.

So that makes it sound like having different maximum log levels and/or different ratelimiting and/or different log stores per service is what you are looking for? doing log flow control per unit is something i am a lot more sympathetic to. It's the regex mess I am not interested in. i.e. we already have:

LogLevelMax= as a per-service setting. You can turn off logging from a chatty service by setting that to emerg, and anything below emergency log level will be suppressed.
LogRateLimitIntervalSec= + LogRateLimitBurst= also is a per-service setting: you could lower the allowed rate so much that basically nothing goes through anymore.
LogNamespace= allows you to open independent logging namespaces which you can group services into. You could have one namespace with long retention, and another with short retention, and then assign your services to each. (each logging namespace gets its own journald instance with its own config file).

Neither of these approaches needs filtering by regex or something like that. Instead it provides log controls keyed by "trusted" properties of the log stream, i.e. unit membership of a process.

I mean the problem with mangling the log stream by regex is that it tries to reconstruct context state where it was dropped before. Our approach is generally that context should not be reconstructed, but passed along in a safe, secure way, and that's how the per-unit knobs work. to give a more specific example: if you say "filter out every log line starting with the word bash", then that's a terrible rule, since any program can do strcpy(argv[0], "bash") and then evade logging.

bbetter173 commented 3 years ago

@poettering - I think this is a case of not letting perfection be the enemy of better. Yes, in an ideal world we would update applications to have different log levels, and use that to filter out the messages we're interested in - but this isn't an ideal world and we have to deal with all kinds of edge cases. Honestly, as a user with very little interest in debating this - I just want to get useful logging information out of journald - like I can with practically any other logging system. It's incredibly frustrating.

poettering commented 3 years ago

Fine, so what now? You may continue to point finger to somebody else, or you may (at least principally) agree to allow filtering and let someone implement it - thus solving most logging issues (related to collection time filtering) once and forever.

BTW; there's no shame in using rsyslog if that suits your needs. journald happily passes the log stream it collects on to it. There are plenty of features that rsyslog provides you with that journald does not. We do not intend to replace rsyslog, but just provide a minimal logging framework with features that make sense. i.e. collecting metadata per log message, doing basic indexing stuff, associating trust to log entries, locking down the log stream to make it temper proof, that makes sense to us. But if you want higher-level processing, then ultimately journald is not going to cover you that in full. heck, it doesn't even speak BSD syslog, which probably most centralized log servers want most likely.

So, let's not shove any concept blindly into journald, just because rsyslog has it. We can certainly add new features to journald, but regex filtering is not one i think makes much sense for us. it fucks up trust, it tries to reconstruct structure from something that was mostly converted to human language already and is just a generally a hackish mess.

I mean, something we could look into instead is maybe add a mechanism to have a "secrecy" level on log messages, that is orthogonal to the severity log levels we already have. We could define a journal record field for that, maybe SECRECY_LEVEL=. We could then make that configurable per service, but also allow apps to supply it (the per-service setting would bump it up though). We could also add some support that the syslog input synthesizes the secrecy level from the traditional AUTHPRIV facility: by default all messages have a low secrecy level, but if they are in the AUTHPRIV facility it would be bumped to high.

But this is where I think things must go: figure out what is actually the goal of the filtering here, and then come with a scheme that can solve this in a reasonable way that one can actually trust. From @gleventhal it appears to me an explicit per-service + authpriv-triggered secrecy level thing would fit the bill much better than any regex matching. I mean, @gleventhal apparently wants to do this for security/trust/retention/secrecy reasons: binding things to trivially manipulatable regex parsing is hence a really bad idea in this context: if you want security controls enforced on your data, you should not make them dependent on the data itself (and thus exploitable), but on trusted metadata.

hence, for a minute, try to figure what actually the usecase is you want to cover, and then for a minute think about whether there aren't better ways to reach the goal for them, that doesn't compromise so terribly.

poettering commented 3 years ago

By the way - the fact that it is currently impossible to precisely configure what is logged or not in systemd (there are lots of noisy, absolutely irrelevant messages logged) - doesn't it make systemd "buggy" and "broken" according to your logic?

Dunno, I am pretty sure we should log about every unit event that happens. I don't think that is broken, no.

That said, we discussed some stuff in this area that might ultimately give you what you want on this: we were thinking of beefing up our internal logging framework so that objects that we generate logs from not only get their metadata picked up and attached to the log message, but can also (optionally) define per-object log level maximums. This is already implemented for udev now (1a0bd01529d42bc4fed71d94c914b590b8c6ff7e): you can now change the per-device log level via an udev rule. The idea is to sooner or later have a similar config option for all other types of objects systemd manages. And that includes having a per-unit log level that also applies to the logs systemd generates about it. Hence if you want a "ghost" unit that is never ever logged about you could set its max log level to "emerg" and it will all be eaten up, if you are into that kind of undebuggable stuff.

After all, are you doing this for us - or against us? Wasn't systemd supposed to improve things and make life easier instead of making it harder?

Security and lockdown are good priorities, but please give us a choice so we could make our own decisions how to handle our systems, their resources and storage - as I am pretty sure that no one wants those policies enforced.

You know, you make it sound as if systemd takes your freedom, you liberty away, to configure things the way you want. It doesn't though: you can always use rsyslog on top of it, it does support regex filtering after all.

But in systemd we do not accept every feature in the world, just because people want it, we think about it, and try to figure out what might be a better approach to the problem. And yes, I genuinely believe that schemes like the three per-unit options, or the idea of adding a SECRECY_LEVEL logging field are much better solutions to the problems people report. or to say this differently: please don't tell us that some specific implementation is the way to go. There's a very good chance that a different approach has much nicer properties, and hence I am more interested in usecases than in "ready made" solutions such as regex filters.

aog2000a commented 3 years ago

I just realize (thanks @poettering for the hint) that a related ticket i referenced above long ago was implemented in the mean time: "Per-Unit Journal Limits" https://github.com/systemd/systemd/issues/3045 That is a huge step forward!! I was actually never interested in "regex filtering", just some finer-grained control of the logging per unit.

I wanted to mention it here, possibly many (like me) are/were not aware that that is now possible :-)

bbetter173 commented 3 years ago

This use-case has been described many times at this point in the ticket:

An application you need to manage is not under your control. Why it isn't under your control doesn't really matter - but for the sake of argument lets say the source code has been lost and all you're working with are the binaries.
That application logs a lot of "useless" data - maybe as a matter of course, maybe due to some unique conditions that you can't alter for whatever reason. This drowns out the useful data.

All the approaches you've suggested so far either require modifying the application (which is not possible in all cases) or using RateLimiting (which loses the useful data as well).

Given these constraints - @poettering, what would you suggest a sysadmin does to prevent the log from spamming messages to disk/RAM while retaining useful information that might be output without having the resources to modify the underlying application?

example: if you say "filter out every log line starting with the word bash", then that's a terrible rule, since any program can do strcpy(argv[0], "bash") and then evade logging.

Sure, a bash.* filter would be dumb, but other filters - for example ^postfix/qmgr[\d+]: \w+: removed$ might not be (though this example probably is, I just pulled a random line and wrote a regex for it). The point is to give us tools that we can use (or not) as we see fit.

aldem commented 3 years ago

LogLevelMax is not always an option - not all apps use structured logging, in fact many switched to stdout/stderr logging.

All rate limiting options, as I have mentioned before, are completely unusable for multiline unstructured output (also pretty common)

if you say "filter out every log line starting with the word bash", then that's a terrible rule, since any program can do strcpy(argv[0], "bash") and then evade logging

Well, there is no reason to evade when you could simply avoid writing anything anywhere - and surely if program could do strcpy(argv[0], "bash") it could close(1); close(2) and do lots of stuff to avoid logging without any broken rules.

The idea, actually, was to allow per service filtering, i.e. something like LogExclude=pattern - thus anything that would normally go to journald from this specific service/unit and matching pattern would be simply ignored. Global filtering is something else but in some cases could be useful too.

We could define a journal record field for that, maybe SECRECY_LEVEL=

Same problem as with LogLevelMax - it simply does not work for unstructured logging... You may call the software that is not doing structured logging "broken" but it would not change anything - most software would not do it anyway, as there is no universal way to do it on every platform/system/device and journald is not present on all platforms anyway.

Dunno, I am pretty sure we should log about every unit event that happens. I don't think that is broken, no.

Why? Why do I have to log successful start/stop of those 100s timer units (and their services) every second, especially if I only look into logs when something bad happens - which is, usually, once per.. well, months maybe? Anyway those noisy messages will be filtered on log review - so what is the reason to store them? If something does not work as expected we have monitoring, thus logging just for the sake of logging is a bit strange.

There are of course cases when everything should/must be logged but for sure this is not the default case, not for dev/home servers nor workstations - and in those default cases 99% of all messages produced by system are complete garbage or noise. Yes, system/distro maintainers should adjust logging levels but nevertheless - logging everything is more often useless than useful.

And yes, I genuinely believe that schemes like the three per-unit options, or the idea of adding a SECRECY_LEVEL logging field are much better solutions to the problems people report.

Again, what to do with unstructured logging? Please, don't ask us to "fix broken software" - it most cases this is out of our hands, and even syslog() levels are not properly handled by many apps.

Actually anything like per-service (or per service-log-group if one is implemented) LogFilter= (pipe/socket/process that accepts anything that would go to journald as a stream) would already help a lot.

You might ask - "why to log using journald in such case instead of redirecting to rsyslog?" - well, the answer is quite simple - what I really love with journald (and I guess many people do) is the centralization - so I still could quickly search for a log output of everything (= all processes, not all output ever produced) - instead of trying to combine/correlate many logs or installing specialized log collection services (this is not always feasible).

PS: right now, my favorite way to filter is something like "ExecStart=/usr/local/bin/custom-log-filter ... /bin/app-to-filter" - and this is not only not nice but not really flexible, does not catch other "Exec=" output, and requires to set SyslogIdentifier to get proper unit name in logs, and I have to do this for every* unit individually.

gleventhal commented 3 years ago

By not being able to filter in journald, we are forced to either abandon persistent storage for journald, abandon journald, or tolerate writing sensitive data to local disk when we don't want to do that. Because journald consumes /dev/log, it is the first stop for syslog traffic, but for shell history (which is captured on a specific syslog facility and therefore easily filterable without regex), we would like it to pass-through to the syslog daemon without being persisted by journald.

So that makes it sound like having different maximum log levels and/or different ratelimiting and/or different log stores per service is what you are looking for? doing log flow control per unit is something i am a lot more sympathetic to. It's the regex mess I am not interested in. i.e. we already have:

LogLevelMax= as a per-service setting. You can turn off logging from a chatty service by setting that to emerg, and anything below emergency log level will be suppressed.

LogRateLimitIntervalSec= + LogRateLimitBurst= also is a per-service setting: you could lower the allowed rate so much that basically nothing goes through anymore.

LogNamespace= allows you to open independent logging namespaces which you can group services into. You could have one namespace with long retention, and another with short retention, and then assign your services to each. (each logging namespace gets its own journald instance with its own config file).

Neither of these approaches needs filtering by regex or something like that. Instead it provides log controls keyed by "trusted" properties of the log stream, i.e. unit membership of a process.

I mean the problem with mangling the log stream by regex is that it tries to reconstruct context state where it was dropped before. Our approach is generally that context should not be reconstructed, but passed along in a safe, secure way, and that's how the per-unit knobs work. to give a more specific example: if you say "filter out every log line starting with the word bash", then that's a terrible rule, since any program can do strcpy(argv[0], "bash") and then evade logging.

@poettering Log namespaces sound really great, though I am not sure they can solve this use case: it's not a specific service, in this case it's the bash executable making syslog(3) API calls in order to send bash history events to syslog. They all are sent to a unique syslog facility which could be used for filtering. I just want to skip the journal local storage and go right to syslog with these messages. If there is a way to configure journal namespaces based on syslog facility or the calling process name (when it isn't a daemon or service) then this would solve my problem.

h0tw1r3 commented 3 years ago

You know, you make it sound as if systemd takes your freedom, you liberty away, to configure things the way you want. It doesn't though: you can always use rsyslog on top of it, it does support regex filtering after all.

Doesn't it though?

I can't use rsyslog on top because the journal is unable to keep up with the system log volume. Even if the journal could keep up, it's a terrible waste of resources at scale.
I would prefer to keep journald because it's tightly coupled with the system, particularly boot-up, but I can't... see point 1.

Arguing my first point is madness. Every use case is different. In my case, I must log at an obscene rate for regulatory compliance purposes. I can't change hearts or minds or code, it's just the world we live in.

The second point, the only way I've found to work-around the journal is an LD_PRELOAD (please correct me if I'm wrong). Could I write a patch for some programs? possibly... but that's also assuming they are open-source (many are not), open to accepting a patch, or the user has the experience to write said patch.

This issue seems to have morphed from filtering to mangling. Perhaps a compromise could be to extend the per-unit syslog support with the flexibility to pipe through a filter process (not provided or supported by systemd) or ship off to another log collector and bypass the journal completely.

yogo1212 commented 3 years ago

@h0tw1r3 for processes logging from stdout/stderr, i found adding log massaging utilities like grep to ExecStart to be very useful. sadly, it doesn't work for logs from mounting/unmounting :-(

paddy-hack commented 3 years ago

As per the linked issue above, Docker healthchecks trigger short-lived temporary mounts. These are logged by systemd per systemd.mount(5):

Mount points created at runtime (independently of unit files or /etc/fstab) will be monitored by systemd and appear like any other mount unit in systemd.

On a system with a single healthcheck every minute, the journal shows something like

Mar 01 11:43:34 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.NaMXcA.mount: Succeeded.
Mar 01 11:46:35 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.SKTgzS.mount: Succeeded.
Mar 01 11:47:35 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.fubVhD.mount: Succeeded.
Mar 01 11:49:35 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.dKR3pj.mount: Succeeded.
Mar 01 11:50:36 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.VPNpv9.mount: Succeeded.
Mar 01 11:51:36 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.DPBHUW.mount: Succeeded.
Mar 01 11:53:36 easy systemd[1]: run-docker-runtime\x2drunc-moby-fcaa2e56ba41bce77e0accee1840bb517b68eae984f31a1e9ebd6335fe1cf379-runc.cxsDBy.mount: Succeeded.

# BTW, note that the journal is missing entries and hence cannot be relied upon to give a complete picture of what's going on :disappointed: Unless the healthchecks themselves are occasionally skipped ... :thinking:

I do not want these successful mount event entries to be kept in memory or saved to disk on systems that run many containers with healthchecks that trigger often to preserve system resources. I have scoured several of the many systemd-releated manual pages but have not been able to find a way to achieve this. This enhancement request appears to offer a solution.

Note that in this case, the "broken" application that needs fixing would be (a subcomponent of) systemd as that is what's monitoring these temporary mounts.

hlovdal commented 3 years ago

@poettering There is nothing in what you have written that explains why filtering at collection time cannot be done. The only reason you have given is that "you don't like it".

Take for instance

Our approach is generally that context should not be reconstructed, but passed along in a safe, secure way, and that's how the per-unit knobs work. to give a more specific example: if you say "filter out every log line starting with the word bash", then that's a terrible rule, since any program can do strcpy(argv[0], "bash") and then evade logging.

You litterally described the solution to that right before describing the problem! Pass along the /proc/.../exe value in the filtering context. Rsyslog has filtering syntax like if $msg contains "..." then { ... }. Nothing stops systemd from implementing if $exe equals "/bin/bash" and ... (assuming same syntax here for simplicity).

Can you please fill in the following table with suitable entries so that everyone can get a better understanding of the alleged obstacles?

What is the problem(s)	What is the cause(s)	What is the solution(s)
???	???	Not supporting filtering at at collection time.

adrian-the-git commented 3 years ago

@hlovdal we've seen that @poettering recoils at this idea, and I think markdown-formatted analysis matrices will fail to elicit a response. Let's begin to examine why the idea is being rejected from an objective standpoint, and see who among us will come forward with a robust solution.

dnicolodi commented 3 years ago

I am going to state the obvious: log filtering can be the selling point for an alternative implementation of systemd-journald. There isn't any technical constraint I am aware of that makes this solution nonviable. The interfaces that need to be implemented are all documented and the source code of systemd-journald is available for anyone to hack on it.

I don't think anyone has the right to compel any of the systemd contributors into implementing anything. The merit or desirability of the feature is secondary in this respect (well... unless a support contract of some sort is in place, but then I am quite sure this issue would not be the right place to request the feature as part of paid support work).

I recommend the proponents of this feature to scratch their own itch and implement it: this is how free software works. If the feature is important and desirable their implementation will rapidly become the de-facto standard one.

aldem commented 3 years ago

I recommend the proponents of this feature to scratch their own itch and implement it

@dnicolodi The problem is not in the implementation - unless this feature will be accepted (once implemented) upstream - and this is where we don't see any willingness to accept. Without being integrated upstream (and thus not coming into any distro) it wouldn't scale well.

... this is how free software works.

This could be true if systemd would be a completely optional piece of software that is absolutely not required for anything and installation is purely optional. Unfortunately, nowadays systemd is an integral part of several major distributions and could not be easily disabled or uninstalled - this makes it a bit more than just "free software".

If the feature is important and desirable their implementation will rapidly become the de-facto standard one.

Not necessarily. Most people don't care and those who do care usually don't have the power to push it. There are also people who are simply not aware that it could be done different (and better) way and they will never find this discussion so they will never cast their votes to support.

dnicolodi commented 3 years ago

@aldem You missed the point: if you need this, go and implement it in a different codebase. There isn't anything preventing you from swapping out the journald implementation shipped with systemd with something else implementing the same interfaces. While its interfaces are simpler, many implementations of syslog exist, I don't see why the same cannot be true for journald.

aldem commented 3 years ago

I don't see why the same cannot be true for journald

@dnicolodi It is tightly integrated with everything else, thus if something changes - it has to be tracked and followed. While I (and many others) want filtering options and it would be quite easy to implement them once in upstream, it could become a support nightmare if it will be alternate systemd-journald as you have to provide packages (for all involved distros), proper hooks on updates, make sure that nothing breaks etc.

gleventhal commented 3 years ago

If anyone wants to write a filtering proxy, or even a patch for the journal that is hopefully modular enough to apply cleanly/integrate with upstream changes without much churn, I am willing to help design and work on that with them.

I'm thinking it could be a PreExec in the systemd-journald.service unit and it will grab /dev/log and proxy in front of the journal, and filter out some data or send the data directly to the syslog daemon's listening socket based on a config.

Otherwise we could just write similar functionality into the journald daemon itself. Perhaps we can use https://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html

poettering commented 3 years ago

If anyone wants to work on the SECRECY_LEVEL thing or the per-unit log level cutoff stuff, happy to review/merge patches.

If you want a "filtering proxy" for stdout/stderr logs, may I suggest using "grep"?

poettering commented 3 years ago

I'm thinking it could be a PreExec in the systemd-journald.service unit and it will grab /dev/log and proxy in front of the journal, and filter out some data or send the data directly to the syslog daemon's listening socket based on a config.

If you want to redirect /dev/log for a specific service somewhere else you can just use BindPaths=/dev/log:/somewhere/else. This will run your service in a mount namespace with /dev/log overmounted.

But of course, logs come via stdout/stderr + syslog + journald native logging.

poettering commented 3 years ago

As per the linked issue above, Docker healthchecks trigger short-lived temporary mounts.

This looks like something to fix in Docker though. Container mounts should not appear on the host. That doesn't scale unfortunately, and just appears to be sloppiness on their side. Given that Docker is supposedly open source, I'd recommend talking to them, maybe they'll fix it.

aldem commented 3 years ago

If you want a "filtering proxy" for stdout/stderr logs, may I suggest using "grep"?

That would be wonderful - as soon as we have per unit/namespace (or even global) option LogProxy=/usr/bin/grep ...

poettering commented 3 years ago

@poettering There is nothing in what you have written that explains why filtering at collection time cannot be done. The only reason you have given is that "you don't like it".

As I have written before: I am fine with doing structured filtering based on trusted fields, from structured data that precisely reflects system state and cannot be faked by programs. Things like providing high-level knobs to filter by security level, to have per object log levels and so on. Your idea of regexes, regexes, regexes and more regexes on human language is just broken though, since human language sucks for doing secure filtering. You shouldn't throw away context and that try to reconstruct it via regexes. If you do that, you are hacking around stuff. But if you do that, then you might as well just do a shell script, i.e. there's no reason to solve this in systemd.

I mean I understand that the service payload you have is a black box for you and you cannot fix it. You are looking for ways to tape around this, and that's very understandable to me. But for that kind of stuff there's shell scripts which allow you to scrape, mangle, process output as much as you want, with tools like grep, with regexes, with "cut", with "awk", with "perl", you have the full freedom. But I am not sure why it's so important to you that systemd does those hacks for you...

poettering commented 3 years ago

That would be wonderful - as soon as we have per unit/namespace (or even global) option LogProxy=/usr/bin/grep ...

There's no chance this'll be merged upstream. Sorry.

I really don't get why these hacks are supposed to live in systemd. systemd is not supposed to be the home for hacks, but where we actually think a bit about things and try to find clean solutions, while still providing you with enough tools to hack your own solution if you like.

If you want scrape stdout/stderr, you have three options:

replace your ExecStart= with a shell script, that calls a pipeline with grep, awk, perl of your choice
use StandardOutput=file:… and StandardError=file:… with a path referring to an AF_UNIX stream listening socket. at service activation time systemd will connect to those sockets and use them for stdout/stderr
Use BindsPaths=/run/systemd/journal/stream:/some/where/else

If you want to redirect the native or syslog sockets you can do BindsPath=/run/systemd/journal/dev-log:… and BindsPath=/run/systemd/journal/socket:… to some other place.

All BindsPath= options mean your service will run in their own mount namespace.

Note that JournalNamespace= doesn't do much else that BindsPaths= either, to redirect all logging elsewhere

aldem commented 3 years ago

I really don't get why these hacks are supposed to live in systemd.

To avoid storing noise (sometimes unavoidable) when storage resources are limited or not durable enough (SD/NAND). This was explained many, many times above - we don't have control what to log on all existing software, even in systemd itself you could not turn off all logging but only for specific unit/timer/service/etc.

But even if you have tons of storage - there is a difference - to grep 1T of data (where 99.9% is useless) or grep 1M of data.

replace your ExecStart= with a shell script, that calls a pipeline with grep, awk, perl of your choice

This wouldn't work if you want to do it for all services (or group of them) - thus forcing you to rewrite each and every unit configuration, which is in turn requires you to be aware of service internals before doing so.

use StandardOutput=file:… and StandardError=file:… with a path referring to an AF_UNIX stream listening socket.

Better, but still not easy to apply to group of units simultaneously.

The problem is - in many cases people don't need to store most of what is logged (desktops & co and lots of other cases) - but by default everything is logged and is not so easy to turn off, as you have to do it in many places (if doable at all) - while one place would solve most of those issues.

aldem commented 3 years ago

As I have written before: I am fine with doing structured filtering based on trusted fields, from structured data that precisely reflects system state and cannot be faked by programs.

This sounds reasonable... as soon as all software in the world is using structured logging. But we are in the real world, and this is not going to happen soon enough. Even then, regex filtering on content (not metadata) would be still helpful.

But as long as there is no way to get rid of (useless) messages like:

Mar 11 04:39:05 ws systemd[1]: Starting Clean php session files...
Mar 11 04:39:05 ws systemd[1]: phpsessionclean.service: Succeeded.
Mar 11 04:39:05 ws systemd[1]: Finished Clean php session files.

unstructured regex filtering would solve the problem.

poettering commented 3 years ago

This sounds reasonable... as soon as all software in the world is using structured logging. But we are in the real world, and this is not going to happen soon enough. Even then, regex filtering on content (not metadata) would be still helpful.

Hmm? A major chunk of the metadata we store is acquired automatically, independently of what apps actually send us precisely. And syslog logging gives us a couple of fields on top, such as priority level.

poettering commented 3 years ago

But as long as there is no way to get rid of (useless) messages like:
Mar 11 04:39:05 ws systemd[1]: Starting Clean php session files...
Mar 11 04:39:05 ws systemd[1]: phpsessionclean.service: Succeeded.
Mar 11 04:39:05 ws systemd[1]: Finished Clean php session files.
unstructured regex filtering would solve the problem.

I already mentioned above that I'd be happy to merge a patch that makes log levels in systemd object-bound (in this case unit bound). i.e. that you can specify a log level cutoff in unit files that then applies to all logs from the unit as well as about the unit. That way you can effectively "ghost" a unit, i.e. run it in a way that there's no trace whatsoever of it. Not sure actually doing so is a good idea, but you appear to think that hiding logs about unit was a good idea. And this would be a mechanism I'd be OK with, just send a patch.

hlovdal commented 3 years ago

@poettering Why on earth are you talking about log levels? Log levels do not at all solve the issue that everyone that requests this feature is looking for.

Maybe my imagination is limited, but the only reason I can think of mentioning log levels is if you do not really understand what the problem is. Or am I wrong? Can you please describe exactly how log levels would solve the problem of ignoring just the following lines?

Mar 11 04:39:05 ws systemd[1]: Starting Clean php session files...
Mar 11 04:39:05 ws systemd[1]: phpsessionclean.service: Succeeded.
Mar 11 04:39:05 ws systemd[1]: Finished Clean php session files.

If you think regexps are evil, notice that here in this case a full string match is sufficient. Having "only" that would be a big step forward.

duritong commented 3 years ago

If you filter all log levels (cutoff highest level) of a unit, all your logs of that unit won't show up anymore. Much simpler and much more sustainable than full string matching.

aog2000a commented 3 years ago

If you filter all log levels (cutoff highest level) of a unit, all your logs of that unit won't show up anymore. Much simpler and much more sustainable than full string matching.

Well... that's a bit too extreme, don't you think? Usually one wants to suppress a single/few messages overflowing the logs, but not "fully bury" the unit/service into oblivion... Log level filtering could help if the offending msg is level info though, but suppressing warnings & errors?

hlovdal commented 3 years ago

If you filter all log levels (cutoff highest level) of a unit, all your logs of that unit won't show up anymore. Much simpler and much more sustainable than full string matching.

- I want to remove some messages. - You can remove everything, then those some messages will be gone.

Do you recognize how absurd that sounds? It is almost a non sequitur axe handle reply.

yogo1212 commented 3 years ago

@poettering i also recommend the ExecStart=.. | grep variant. the use of BindPaths is clever. but i also think the unaddressed mount logs are a real cause of pain. systemd[1]: run-docker-runtime\x2drunc-moby-5807e1830a641c0ac13cb64f70be6eb12f6839a54aebc15fd0f3cc3883340c67-runc.0F9spO.mount: Succeeded

i'd say they should be filtered before they clog up the hard drive and drain SSD write cycles. and so far, i haven't seen a solution for suppressing them. sure, they could be mounted in a different mount namespace but any mechanism for sharing the mount with the main namespace will produce the log line.

glensc commented 3 years ago

... and so far, I haven't seen a solution for suppressing them.

@yogo1212 solution to suppress the docker noise is to disable health checks:

https://github.com/docker/for-linux/issues/679#issuecomment-787676154

If you have nothing consuming the healthchecks data, it's the way to go.

yogo1212 commented 3 years ago

@glensc i appreciate the recommendation. i used docker only as an example and wasn't aware of the workaround. my personal beef is with ad-hoc user mounts of file systems containing sensitive data. the file systems are encrypted and the keys shouldn't stay in memory for too long. mounts should end with the session.

i want to recommend a switch to journald away from rsyslog because it will be a lot easier to work with. but with journald, we'd have to effectively double our log storage capacity to store logs for an identical time span (or immediately export the log from journald to some other place which is not what we want).

hlovdal commented 3 years ago

@glensc That is not a "solution" in any way. In the very best case it is a horrible hack, and at worst it fundamentally breaks reliability.

glensc commented 3 years ago

@hlovdal if you only bothered to read the whole message and understand that it was a response to a specific SSD write cycles problem. You can "Discuss" in the link given in the post, but hey, it's already discussed there exactly the same concerns.

hlovdal commented 3 years ago

You did include the disclaimer "If you have nothing consuming the healthchecks data", and I did read the linked issue which also basically says the same thing, so I do not dispute that it can work as a workaround.

But calling it a solution is misleading. In the best case it is a workaround. Solutions and workarounds are two very different things.

paddy-hack commented 3 years ago

If you have nothing consuming the healthchecks data, it's the way to go.

I do have stuff consuming healthchecks, in particular the ones I added. I've got no way to go, so now what? Get Docker fixed? What about the next program that does something that systemd monitors and "spams" the logs with? Switch to a non-systemd distro?

Shouldn't there be some way that allows the system administrator more granular control over what gets logged by systemd itself? And I'm not just concerned about logging sensitive data, I'm concerned about "spamming" the log. In my particular case of "spamming" triggered by Docker's temporary mounts for healthchecks, I'd like those to never even hit the logs but I do want to be able to check for mount events triggered by USB sticks getting plugged in or CDs/DVDs mounted. That probably means that log-levels don't cut it either.

christianTF commented 3 years ago

I agree that there must be some kind of filter on service level.

An example: I use influx with telegraf and import data (around 500.000 data per minute). Every record is sent as one line to stderr by telegraf. It takes about 5 minutes to have a daemon.log with around 500 MB. Logrotate is not fast enough to get it away, it fills up the disk after several minutes.

I would require some journal.d/my.conf file to disable this stderr logging at all for telegraf.

Currently, I completely shut down the rsyslog service.

mrc0mmand commented 3 years ago

I would require some journal.d/my.conf file to disable this stderr logging at all for telegraf.

In this specific case creating a dropin with StandardError=null for the telegraf service should do exactly what you need, see systemd.exec(5).

christianTF commented 3 years ago

drop in - thanks for the word!

hifron commented 3 years ago

Right thing to do may differ at various enterprises how to solve(and manage things) or delegate to. Also CURRENT operating systems differ a lot with this issue and may require additional system thinking.

Jaywann commented 3 years ago

Since the core topic of this problem is not the filterng of arbitrary messages but the spam of repeated log messages, may I suggest solving this as past logging daemons have. By implementing an option to journald that defines a timeframe in which repeated messages are collected and summarized only as:

Message: "run-docker-runtime\x2drunc-moby-340-runc.0F9spO.mount: Succeeded" has been logged 87 times in the last x minutes.

This would prevent hiding of messages as well as preserve the distinction of journald vs logging daemons with filtering like syslog-ng or rsyslog.

SwitchTV-BenBettridge commented 3 years ago

Since the core topic of this problem is not the filterng of arbitrary messages but the spam of repeated log messages, may I suggest solving this as past logging daemons have. By implementing an option to journald that defines a timeframe in which repeated messages are collected and summarized only as:

Message: "run-docker-runtime\x2drunc-moby-340-runc.0F9spO.mount: Succeeded" has been logged 87 times in the last x minutes.

This would prevent hiding of messages as well as preserve the distinction of journald vs logging daemons with filtering like syslog-ng or rsyslog.

This doesn't meet the case where the message is slightly different each time, due to timestamps, the inclusion of the child PID, etc.

Jaywann commented 3 years ago

This doesn't meet the case where the message is slightly different each time, due to timestamps, the inclusion of the child PID, etc.

Those informations have always been part of the meta information about a message. Even back with the logger command is was always best practice to use the approprirate flags to supply pid, etc and journald treats them as separate as well (see systemd.journal-fields). If a service logs those infomation inside the actual message the API needs to be used corrctly by the service, not by journald adapting to the misuse. This however is not the case with any of the messages discussed in this issue.

SwitchTV-BenBettridge commented 3 years ago

This however is not the case with any of the messages discussed in this issue.

That just simply isn't true and I find it difficult to understand why you'd even make the claim when a cursory scroll upwards shows this. Example #1, #2 - and those are just from the last 3 months, this ticket has been open for years. Heck, even the militant opposition has suggested using grep to filter the output - which while unwieldy and hard to manage at least partially achieves the desired effect.

systemd / systemd