Closed bbockelm closed 4 months ago
Another source of these - XrdPfc
when the client debug is set to 4.
Could you tell us how you actually apply the filter? That is, how is it implemented.
Talked to @abh3 in-person over the coffee break; just wanted to record the highlights:
Xrootd_Protocol: 0000 open rt
.Ah, too much bravado on my part. I forgot about this ticket. Thank you for reminding me. This will be fixed. It would be helpful for the PFC problem if you post the the log line (minus token) that exposes the token so I can search for it.
I think this regexp is not enough because one can use other types of token to authenticate (EOS token starting with zteos64:
or macaroons for example). Is the aim of this issue to just hide SciTokens from the logs, or should we hide everything that is after Authorization: Bearer
and what is after authz=
?
Well, I am not in favor of a brute force solution. Adding a real-time regexp filter for every single log line is really an inefficient approach. Plus, while it adds a small back pressure to the logger, it can snowball into significant latency to every single code path that writes to the log when logging is relatively heavy. I think we need to fix this the right way with a surgical approach at just the right time. I don't see that as being difficult it just takes a bit more time.
Yes sure, I was just questioning the regexp itself, shall we hide every token where we see it is needed or should we only hide scitokens? I'm in favor of hiding every token from the logs, but wanted your opinion on that ;)
I think any authorization information should never spill into the log. Auth info here includes tokens as well as anything else that appears on the authorization header. That just means we isolate the auth info and prevent it from inadvertently being exposed. I think I nailed it for everything but Xcache and the proxy server where it can be a problem. We also have XrdCl (the client)that can dump a url with auth info on it as well. I think http only has a couple of places where a token may be printed, again, because it happens to be in the URL We just tackle these one by one.
Sure then, I think I know where these tokens could leak ! Thanks for the precision, was confused by the regexp ;)
I'm in favor of hiding every token from the logs, but wanted your opinion on that
The regex was carefully tuned to catch a JWT: I can't imagine any case where you'd want to print out the private portion of a JWT, regardless of its use case.
@abh3 - per your request about what comes out via the PFC, here's an example (I set the token to be the fictitious foo1234
):
eduroam-0197:build bbockelm$ XRD_PELICANCERTFILE=~/.config/pelican/certificates/tlsca.pem XRD_PLUGINCONFDIR=~/projects/xrdcl-curl/build/release_dir/etc/xrootd/client.plugins.d/ ~/projects/xrootd/build/release_dir/bin/xrootd -c ~/.config/xrootd/xrootd-bbockelm-cache.cfg 2>&1 | grep foo1234
240516 17:14:56 5379 http_Protocol: rc:24 got hdr line: Authorization: foo1234
240516 17:14:56 5379 unknown.1:31@localhost http_Req: Appended header fields to opaque info: 'authz=foo1234&pelican.timeout=3s'
[2024-05-16 17:14:56.807137 +0200][Debug ][ExDbgMsg ] [p0@eduroam-0197.desy.de:1095] MsgHandler created: 0x52e099c0 (message: kXR_stat (path: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, flags: none) ).
[2024-05-16 17:14:56.811735 +0200][Error ][XRootD ] [p0@eduroam-0197.desy.de:1095] Impossible to send message kXR_stat (path: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, flags: none). Trying to recover.
[2024-05-16 17:14:56.811746 +0200][Debug ][XRootD ] [p0@eduroam-0197.desy.de:1095] Handling error while processing kXR_stat (path: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, flags: none): [FATAL] Login failed.
[2024-05-16 17:14:56.811755 +0200][Debug ][ExDbgMsg ] [p0@eduroam-0197.desy.de:1095] Passing to the thread-pool MsgHandler: 0x52e099c0 (message: kXR_stat (path: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, flags: none) ).
[2024-05-16 17:14:56.811777 +0200][Debug ][ExDbgMsg ] [p0@eduroam-0197.desy.de:1095] Calling MsgHandler: 0x52e099c0 (message: kXR_stat (path: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, flags: none) ) with status: [FATAL] Login failed.
240516 17:14:56 5379 unknown.1:31@localhost Xrootd_Protocol: 0000 open rt /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s
240516 17:14:56 5379 XrdPfc_Cache: info Attach() root://u31@eduroam-0197.desy.de:1095//tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s
[2024-05-16 17:14:56.812103 +0200][Debug ][File ] [0x5400a1a8@root://u31@eduroam-0197.desy.de:1095//tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s&xrdcl.requuid=52BB5289-D4F8-420B-9EE9-5CB90FB38388] Sending an open command
[2024-05-16 17:14:56.812156 +0200][Debug ][ExDbgMsg ] [u31@eduroam-0197.desy.de:1095] MsgHandler created: 0x5400a320 (message: kXR_open (file: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ).
[2024-05-16 17:14:56.817051 +0200][Error ][XRootD ] [u31@eduroam-0197.desy.de:1095] Impossible to send message kXR_open (file: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ). Trying to recover.
[2024-05-16 17:14:56.817059 +0200][Debug ][XRootD ] [u31@eduroam-0197.desy.de:1095] Handling error while processing kXR_open (file: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [FATAL] Login failed.
[2024-05-16 17:14:56.817066 +0200][Debug ][ExDbgMsg ] [u31@eduroam-0197.desy.de:1095] Passing to the thread-pool MsgHandler: 0x5400a320 (message: kXR_open (file: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ).
[2024-05-16 17:14:56.817083 +0200][Debug ][ExDbgMsg ] [u31@eduroam-0197.desy.de:1095] Calling MsgHandler: 0x5400a320 (message: kXR_open (file: /tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ) with status: [FATAL] Login failed.
[2024-05-16 17:14:56.817150 +0200][Debug ][File ] [0x5400a1a8@root://u31@eduroam-0197.desy.de:1095//tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s&xrdcl.requuid=52BB5289-D4F8-420B-9EE9-5CB90FB38388] Open has returned with status [FATAL] Login failed: Got invalid login response.
[2024-05-16 17:14:56.817172 +0200][Debug ][File ] [0x5400a1a8@root://u31@eduroam-0197.desy.de:1095//tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s&xrdcl.requuid=52BB5289-D4F8-420B-9EE9-5CB90FB38388] Error while opening at u31@eduroam-0197.desy.de:1095: [FATAL] Login failed: Got invalid login response.
240516 17:14:56 5379 XrdPfc_IO: info initCachedStat successfully read size 0 and creation time 1715872496 from info file root://u31@eduroam-0197.desy.de:1095//tmp/xrootd/public/foo?authz=foo1234&pelican.timeout=3s
(the above is an example failure case -- but you can see the URL scrubbing is going to take a bit)
Closing, fixed by #2284.
@amadio - I was taking a look at #2284 and noticed that it solves two of the log lines I quoted. However, it doesn't solve the other 16. So, it might make sense to re-open.
@ccaffy - one potential improvement would be, for JWTs, only obfuscating the signature and not the public contents. The public contents are often quite useful for debugging. However, this isn't 100% necessary -- just a nicety.
@bbockelm I don't mind reopening, but since this is in the release notes for 5.7.0, it might be better to open a new ticket linking to this one as only partially solved.
I let you and @ccaffy to choose what's more appropriate. Cheers,
Oh, great point! Indeed, let's not reopen across ticket boundaries. Will create a new one.
@abh3 previously requested that I drop an issue if I see a token leaking to the logfiles. Doing some token work this morning and decided to poke around.
The token in this message is from demo.scitokens.org and hence not a "real" token.
Below are randomly-selected log snippets from two different test servers:
In Pelican, we filter all xrootd logging lines through the following regexp:
and replace matches with
"$prefix$header.$payload.REDACTED"
I suspect a similar filter could be done here (either passing through all log lines or target cases like those above).