Sanitizing logs - Githubissues

aantron commented 3 years ago

When running with log level Debug, Caqti writes entire queries and their parameters into the log. This can be a security issue, as some of the parameters can be secrets. See https://github.com/aantron/dream/issues/44, where Caqti is shown logging session ids.

Is there a good way to address this, besides not setting Caqti's log level to Debug? I suspect the answer is no, but I wanted to ping here for discussion/ideas.

paurkedal commented 3 years ago

I can see this can be a problem.

An option which is possible in the current version is that the application clamps the Caqti log level to at most info when configuring the logger, and provide an independent setting to override if needed. The log source is Caqti_common_priv.request_log_src. As the module name suggests, it is currently meant to be private, but I think it should be moved to a public module to allow the application to configure it.

I'm pondering on what the best option may be for selectively disabling printing of parameters (or the whole log lines). This could be per-connection, per-Caqti_request.t, or even tagging parts of the parameters as secret. The latter two seems most attractive if we wish to utilize the developer's knowledge of which parts of the DB contains potential secrets.

paurkedal commented 3 years ago

I have also been thinking that it could be useful to have different log sources for requests coming from different libraries or different part of an application. That could motivate passing the log source to Caqti_request.create etc. as an optional parameter, which if not passed disables request logging.

aantron commented 3 years ago

Concerning getting the log source, it's ultimately probably best done by name rather than by identifier. So, it's probably fine if the value stays private.

I think, in general, it would be difficult to control (or even expect) that Caqti will log secrets, because most applications aren't routinely tested at level Debug, and higher-level libraries will inevitably introduce new kinds of secrets.

Maybe the best way is to not print query parameters by default at all, at level Debug, and only print them if some kind of opt-in flag is set. I think any solution that makes this opt-out will eventually trigger security issues that are difficult for users to foresee.

paurkedal commented 3 years ago

I agree opt-in is the way to go if we want to be sure the log is safe to publish.

My second post about only logging request if a log source is explicitly passed to Caqti_request.create would qualify as an opt-in. Though I can imaging it being passed indiscriminately if the developer is not aware of the issue. On the other hand it will not be possible for the user to enable request logging if the developer didn't opt in.

I can see Caqti already uses CAQTI_POOL_MAX_SIZE and CAQTI_DEBUG_DYNLOAD, so I think an environment variable to enable logging of parameters globally may be the way to go. And would not require changes to the API.

aantron commented 3 years ago

I personally would vote against an environment variable, especially one that can affect security. I'd provide some val that can be called or set, and the app author can choose if they want to use that val based on an environment variable — or based on other condtions, according to their app.

The more hardcoded environment variables there are in libraries, the less "functional" and "pure" the whole app is on the app scale, the more stress there is on the user's part. Especially in this case, where using an environment variable would be part of a potential attack vector.

aantron commented 3 years ago

I'd even suggest, admittedly without knowing the needs of other Caqti users, to just implement logging without parameter values unconditionally for now, and only add an opt-in way to show them upon some future request.

paurkedal commented 3 years ago

I'd even suggest, admittedly without knowing the needs of other Caqti users, to just implement logging without parameter values unconditionally for now, and only add an opt-in way to show them upon some future request.

It was partly motivated by a request, though I already wished to have this functionality for debugging, as I found the query string to often be of limited use. E.g. I've also found it very useful to keep a tail of the debug log on the side of the browser while triggering an issue in a web application.

I personally would vote against an environment variable, especially one that can affect security.

On the contrary, the system and system administrator must ensure that the environment cannot be affected by untrusted sources (except in a controlled manner). As the most apparent examples, $LD_LIBRARY_PATH and other variables affect from where ld.so loads libraries, and if the application runs a shell, $PATH will affect where executables are loaded. The $HOME variable may be involved in determining application data files. Some system libraries are also already configured via environment variables. Both ssh and sudo take care to only let though selected environment variables when changing context.

The more hardcoded environment variables there are in libraries, the less "functional" and "pure" the whole app is on the app scale, the more stress there is on the user's part. Especially in this case, where using an environment variable would be part of a potential attack vector.

If it is an attack vector, then the application environment is already compromising the application in worse ways then leaking information into logs, which usually need to be protected anyway in a production environment, since it often logs IP numbers, user names, or other personally identifiable information.

I can agree that environment variables are not the most elegant solution. It would be nice to have a unified way of configuring OCaml applications, say you pass a file or directory upon start, then the application will call the configuration system which loads the configuration, splits it up and delivers the pieces to different libraries or components. Something like than would be nice, if thought through more thoroughly. But without a unified configuration system, it will be up to the application to configure the libraries it use, and all transitive dependencies. So that doesn't scale.

Environment variables solves that. It's not an elegant solution, but it works. And both systemd services and traditional init scripts supports passing them well (Ubuntu: /etc/default, EL: /etc/sysconfig). The variables are passed on by whatever system starts the application, then treated as read-only, like other configuration variables.

aantron commented 3 years ago

There is fundamentally no need for this environment variable over a global val. There is no need to implement a questionable configuration system that can't be readily removed or disabled later, only because a good one is missing — a good one may be developed. Why should an upstream library force configuration conventions on its far-downstream users, which might be running in very different environments (in general; I'm not talking about environment variable environments)? Also, there is no need to skimp on defense in depth, or slowdown of attacks, however fleeting. Leaking personally identifiable information compromises privacy and can lead to other attacks indirectly. That's not the problem in this issue anyway. This issue is about leaking secrets, which directly compromise security (rather than privacy), and immediately allow impersonation of users. Yes, they both need to be protected, but client IPs and session secrets are not in the same class and the severity of leaking them is not the same.

paurkedal commented 3 years ago

Using environment variables is not inventing something new, but using what is at hands. I don't consider implementing a configuration system, but to point out the absence of an alternative to environment variables if we don't count on the application to configure every piece of every library. An important point though, is that this is an option which is not observable to the application, apart from the logger which shouldn't care.

I completely agree secrets should not be logged. The bottom line is that who guarantees the security of the running application must ensure the environment only comes from trusted sources. There is no room for error on this point.

paurkedal commented 3 years ago

I did two commits which I think solves this issue:

There is a new redact annotation to be applied to parameter types which may contain sensitive information. The pretty-printer used for logging will omit values under this annotation.
Printing of parameters is disabled by default, but can be enabled by setting an environment variable. As stated above, I can't see how the application can be secured unless a trusted person or system is in control of the environment.

aantron commented 3 years ago

Thanks!

paurkedal / ocaml-caqti

Sanitizing logs #70