Sanitise error messages and introduce ErrId markers

mstruk commented 3 years ago

Reduce the amount of details sent to the clients through error messages. Log full stacktraces on the broker when DEBUG logging is turned on for server callback handler. Accompany the client errors with an ErrId marker - a randomly generated 8-char random hex string to allow for easy pairing of broker-side error logs to the client errors.

Note that when using PLAIN mechanism the errors sent to the client are beyond control of the server callback handler, and thus can't include any custom error message (including an ErrId marker).

mstruk commented 3 years ago

I'd like a little more background on what actual problem(s) this solves:

Was the amount of info returned to the client too much?

The OAuthBearerSaslServer and the PlainSaslServer provided by Apache Kafka, which our server callback handlers plug into, behave slightly differently in terms of propagating error messages. When PLAIN is used we can't control the error message at all, we can't send the client any additional information. When OAUTHBEARER is used the client receives the result of e.getMessage() of the thrown exception. In order to pack enough information we drilled down the exception cause chain and concatenated the e.toString() of each cause into a message that was then returned by the 'top-of-the-chain' exception that the OAuthBearerSaslServer received. In practice that returned a lot of nested classnames with redundant repetitions, and internal details (keycloak library exception class names for example ...). Looking at it again I figured the informational value of most of that was low.

Or were we leaking security sensitive stuff which we shouldn't (if so, what do we consider security sensitive)?

Not to my knowledge. But it's possible that in some situation there might be some internal details released - not secrets, but implementation details that don't really concern the client, and might possibly facilitate further probing when looking for any way to gain unauthorized access (pure guess on my part - but unless you control a finite set of possible errors, and prevent propagation of error messages by third party libraries, that is conceivable).

Is it going to be problematic for people trying to authenticate if they don't have enough information to solve the problem themselves, but have to ask someone with access to the log (how will they even know who to ask?).

That's eternal conflict between security and usability. The most secure server gives you no feedback except just 'failed' in a way that you have zero feedback if you're making any progress. For example, response time and error message for when the user does not exist should be the same as if the user exists bad the password doesn't match, so that it's not possible for the attacker for determine if the user account exists or not. But that's for a pure username:password use-case. For OAuth that's a bit extreme to not give any feedback about technical reasons for rejected token. I think the sanitised error messages still give all the relevant information that can be helpful - but without redundant server-side internals.

What if the people operating the cluster want a quiet life and would prefer to just pass this stuff back to the client? Maybe there's justification for not allowing this (e.g. security sensitive stuff), but you ought to make that justification here.

Actually if client isn't able to self-service now, they weren't able to self-service before either. But now it's much easier for the people operating the cluster because if the client used OAUTHBEARER they can now just ask for ErrId, and look it up in the log. Something they couldn't do before.

mstruk commented 3 years ago

I made another change to cleanup some left-over complexity from refactoring. I think this is is now ready to merge.

scholzj commented 3 years ago

@mstruk You have the approvals, so you can merge it your self when it is ready ;-)

mstruk commented 3 years ago

Alright! :)

strimzi / strimzi-kafka-oauth

Sanitise error messages and introduce ErrId markers #108