Closed kevin-bates closed 6 years ago
The more I think about this, the more I believe we should introduce some kind of option - leaning towards something like --impersonation_enabled - here's why.
The enforcement of a particular KERNEL_USERNAME value by the EG server doesn't really change anything unless the underlying kernelspecs are performing steps to impersonate the user identified by KERNEL_USERNAME. That is, we have to assume that the kernelspec contents are secure in that not just anyone can walk up to the files and make changes, such as adding sudo, for example. Therefore, I believe we can assume that command line options and kernelspecs contents are "on the same page" - so if a given option exists that indicates impersonation is being performed by the kernelspecs, then the kernelspecs perform that impersonation. Likewise, we must also assume that when the command line option does NOT exist, the kernelspecs will NOT be performing impersonation[*]. I.e., both the Enterprise Gateway configuration settings and the kernelspecs contents can be trusted.
So, assuming the configuration and kernelspecs are trusted, then "enforcement" of KERNEL_USERNAME would consist of two checks when impersonation is enabled:
1. KERNEL_USERNAME must be provided. The server will not tolerate a missing KERNEL_USERNAME, nor will it set its value to the current (service) user.
2. KERNEL_USERNAME cannot be the same as the current (service) user, nor can it be root.
We could also enforce that it map to an existing system user, although I'm not sure that's required for the YARN cluster mode with its --proxy-user value. And, since that might vary with other resource managers, it's probably not advised to assume the impersonated user is actually a system user.
When --impersonation_enabled is False, none of the above would occur. As mentioned previously, it would probably be a good idea to log a warning that kernels will run as the service or resource manager user accounts.
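To make the two checks concrete, here is a minimal sketch of what the enforcement could look like. It is not Enterprise Gateway code: the function name `check_kernel_username`, its parameters, and the choice of `ValueError` are all hypothetical, and logging the warning per call rather than once at server startup is a simplification.

```python
import getpass
import logging

log = logging.getLogger("impersonation_sketch")  # hypothetical logger name


def check_kernel_username(env, impersonation_enabled, service_user=None):
    """Apply the two proposed checks to the kernel's environment.

    `env` is the dict of environment variables sent with the kernel
    start request. Returns the KERNEL_USERNAME value (possibly None
    when impersonation is disabled).
    """
    # Assumption: the service user is whoever runs this process.
    service_user = service_user or getpass.getuser()
    username = env.get("KERNEL_USERNAME")

    if not impersonation_enabled:
        # No enforcement at all; just warn about who the kernel runs as.
        log.warning("Impersonation is disabled; kernels will run as the "
                    "service or resource manager user.")
        return username

    # Check 1: the value must be provided; no defaulting to the service user.
    if not username:
        raise ValueError("KERNEL_USERNAME must be provided when "
                         "impersonation is enabled.")

    # Check 2: it cannot be the service user, nor root.
    if username in (service_user, "root"):
        raise ValueError(f"KERNEL_USERNAME '{username}' cannot be the "
                         "service user or root.")
    return username
```

A request carrying `{"KERNEL_USERNAME": "alice"}` would pass when the service user is, say, `eg`, while an empty environment or a value of `root` would be rejected before the kernel is launched.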
Another option is to unconditionally perform this enforcement. The disadvantage to that, however, is that it would mean that Enterprise Gateway could not be a direct replacement of Jupyter Kernel Gateway, and I believe we should preserve that as much as possible.
[*] Note: we may want to entertain the ability to move the --impersonation_enabled setting into the ProcessProxy config stanza (see Issue #161) where you could have one set of kernels performing impersonation and another set not. From a KERNEL_USERNAME enforcement standpoint, this would just work since enforcement must occur at kernel startup anyway.
The more I think about this, the more I agree about the way you are thinking.
Another option is to unconditionally perform this enforcement. The disadvantage to that, however, is that it would mean that Enterprise Gateway could not be a direct replacement of Jupyter Kernel Gateway, and I believe we should preserve that as much as possible.
+1
[*] Note: we may want to entertain the ability to move the --impersonation_enabled setting into the ProcessProxy config stanza (see Issue #161) where you could have one set of kernels performing impersonation and another set not. From a KERNEL_USERNAME enforcement standpoint, this would just work since enforcement must occur at kernel startup anyway.
In which scenario do you think this would apply? I see that the ability to turn security on/off is desired because your cluster might not have all the infrastructure requirements (e.g. Kerberos), and thus you might want to disable it. But if you have the ability to impersonate users, I don't see why you would want to disable it; I don't see a performance penalty, since it's only used during startup.
Yeah, I think it would be considered a security hole if the service user was enabled for impersonation, yet the kernel.json files did not perform the required impersonation. This would open the door for kernels running as high-privileged users. As a result, I should probably retract that aspect.
However, if it's the case that no special privileges are required to perform impersonation in YARN cluster mode, then you could have a case where regular remote kernels run as the (non-privileged) service user, while remote-cluster kernels perform impersonation. This would then argue for the need to (optionally) specify impersonation at the kernelspec level. OTOH, if YARN cluster mode requires special privileges for --proxy-user, then we could base enforcement of KERNEL_USERNAME on the presence of the (single) command-line option rather than on kernelspec configs.
For now, I'll treat enforcement based on the presence of the single command-line option and we can revisit when implementing Issue #161 - which would always override the command-line behaviors anyway.
In a side-bar discussion, the following was decided as to how to approach initial impersonation support.
- [...] KERNEL_USERNAME.
- [...] KERNEL_USERNAME to the current service user. Of course, this will prevent kernel startup in cases where that user is in the blacklist, but the nice aspect is that in cases like YARN cluster mode, the service user does not need to be privileged, so that defaulting is perfectly valid - especially when impersonation is only supported in cluster-mode kernels, for example.
Suggested option names: --EnterpriseGatewayApp.impersonation_enabled (boolean, default=False), and --EnterpriseGatewayApp.impersonation_blacklist (list of strings, default="").
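As a rough illustration of the defaulting-plus-blacklist idea, the sketch below assumes that a missing KERNEL_USERNAME falls back to the service user and that the blacklist is consulted afterwards. The function name `resolve_kernel_username`, its parameters, and the use of `PermissionError` are hypothetical and only serve the example.

```python
def resolve_kernel_username(env, service_user, impersonation_blacklist=()):
    """Default a missing KERNEL_USERNAME, then apply the blacklist.

    `env` is the dict of environment variables from the kernel start
    request; `impersonation_blacklist` mirrors the suggested
    impersonation_blacklist option (a list of user names).
    """
    # Assumption: a missing or empty value defaults to the service user.
    username = env.get("KERNEL_USERNAME") or service_user

    # A blacklisted user prevents kernel startup, including the case
    # where the defaulted service user is itself on the blacklist.
    if username in impersonation_blacklist:
        raise PermissionError(f"User '{username}' is not allowed to "
                              "launch kernels.")
    return username
```

With a non-privileged service user (the YARN cluster mode case), `resolve_kernel_username({}, "eg")` simply yields `eg`; putting `eg` or `root` on the blacklist turns the same request into a startup failure.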
Since this issue has evolved into essentially adding the initial support and documentation for impersonation, I'm changing the title accordingly. In addition, it should address Issue #50 upon its completion.
And the beat goes on... During testing it occurred to me that we should enforce the blacklist irrespective of impersonation_enabled = True. In addition, for those customers wishing to constrain kernel usage to just a few users, having a whitelist would be beneficial. As a result, I plan on implementing the following...
- --EnterpriseGatewayApp.impersonation_enabled as previously described.
- --EnterpriseGatewayApp.authorized_users (case-sensitive set of users authorized to launch the target kernel) and --EnterpriseGatewayApp.unauthorized_users (case-sensitive set of users not authorized to launch the target kernel).
The user authorization sets trigger the following behaviors:
- [...] impersonation_enabled.
- unauthorized_users always trumps the set of authorized_users.
- If the authorized_users set is empty, all users not in the unauthorized_users set are authorized to launch kernels.
- If the authorized_users set becomes non-empty, then it must be complete.
Note that another advantage to having both sets of users is that when we implement Issue #161 we can associate these parameters with specific kernels and thereby achieve a means of allowing specific access to higher degrees of resources, etc. Any settings of authorized_users at the kernel level would trump the command-line/config settings. (Thinking we'd take the union of kernel-level and command-line/config unauthorized_users sets.)
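The set semantics above can be sketched in a few lines. This is only an illustration under stated assumptions: `is_authorized` and `merge_kernel_overrides` are hypothetical names, and treating an empty kernel-level authorized_users as "not set" is a guess about how the Issue #161 override would behave.

```python
def is_authorized(username, authorized_users=frozenset(),
                  unauthorized_users=frozenset()):
    """Return True if `username` may launch a kernel.

    Comparisons are plain string matches, so they are case-sensitive.
    """
    # unauthorized_users always trumps authorized_users.
    if username in unauthorized_users:
        return False
    # A non-empty authorized set must be complete: anyone missing is denied.
    if authorized_users:
        return username in authorized_users
    # Empty authorized set: everyone not explicitly unauthorized is allowed.
    return True


def merge_kernel_overrides(global_authorized, global_unauthorized,
                           kernel_authorized=None, kernel_unauthorized=None):
    """Combine command-line/config sets with kernel-level sets."""
    # Assumption: a non-empty kernel-level authorized set replaces the
    # command-line/config one.
    authorized = set(kernel_authorized) if kernel_authorized else set(global_authorized)
    # The unauthorized sets are unioned, per the parenthetical above.
    unauthorized = set(global_unauthorized) | set(kernel_unauthorized or ())
    return authorized, unauthorized
```

For example, with authorized_users set to `{"alice"}`, a request from `bob` or from `Alice` (different case) is denied, and adding `alice` to unauthorized_users denies her as well.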
So, based on this, I'm changing the title slightly.
For most settings, it's okay to default a missing KERNEL_USERNAME value to the current user (which will typically be the service user running Enterprise Gateway). However, in some environments, the EG service user may need to run with higher-level privileges in order to su to the kernel user - which, if defaulted to the service user, will enable kernels running as a user with su privileges, etc.
One approach would be to require that KERNEL_USERNAME be provided AND not be the same as the EG service user. This way, there is no defaulting necessary, although "standard" (non-elevated) configurations would see failures.
Another may be to introduce an option indicating whether or not KERNEL_USERNAME should be defaulted (to the service user). This option could be something like secure-mode (default=True). A value of False would also trigger an appropriate warning message at startup (for example).
This issue is meant to track how Enterprise Gateway should go about enforcing or tolerating missing KERNEL_USERNAME values.