Power management server

jaywonchung commented 8 months ago

NVML requires the Linux SYS_ADMIN capability for applications to set the GPU's power limit or frequency. In production environments, you can't just give your application containers SYS_ADMIN because it allows far too many things (man page). These excess permissions can be exploited easily, for instance, if an open source dependency suffers a supply chain attack.

To reduce the attack surface, it makes sense to have one power management server per node that exposes a handful of endpoints allowing applications (without SYS_ADMIN) to set the GPU's power limit or frequency. This is basically IPC and should have very low latency. NVML function calls inside an application typically take 10-20 ms, so the round trip including IPC time should not be much higher than that.

Before doing this, we should first abstract away the GPU (#23) so that the GPU backend can differ depending on whether the user is going through the power management server or setting the power knobs directly.
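To make the abstraction idea concrete, here is a minimal sketch, not the actual design from #23. The class names (`GPUBackend`, `DirectBackend`, `ServerBackend`), the injected setter callables, and the JSON message shape are all hypothetical and chosen only for illustration. The direct backend would wrap the privileged NVML calls, while the server backend only forwards a request over whatever transport is supplied.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable


class GPUBackend(ABC):
    """Hypothetical interface that hides how the power knobs get set."""

    @abstractmethod
    def set_power_limit(self, gpu_index: int, milliwatts: int) -> None: ...

    @abstractmethod
    def set_frequency(self, gpu_index: int, mhz: int) -> None: ...


class DirectBackend(GPUBackend):
    """Sets knobs in-process; the injected setters need SYS_ADMIN to succeed."""

    def __init__(
        self,
        power_limit_setter: Callable[[int, int], None],
        frequency_setter: Callable[[int, int], None],
    ) -> None:
        self._set_pl = power_limit_setter
        self._set_freq = frequency_setter

    def set_power_limit(self, gpu_index: int, milliwatts: int) -> None:
        self._set_pl(gpu_index, milliwatts)

    def set_frequency(self, gpu_index: int, mhz: int) -> None:
        self._set_freq(gpu_index, mhz)


class ServerBackend(GPUBackend):
    """Forwards requests to a privileged server through a `send` callable."""

    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self._send = send  # assumed: sends one request, returns one reply

    def _call(self, op: str, gpu_index: int, value: int) -> None:
        request = {"op": op, "gpu": gpu_index, "value": value}
        reply = json.loads(self._send(json.dumps(request).encode()))
        if not reply.get("ok"):
            raise RuntimeError(reply.get("error", "request failed"))

    def set_power_limit(self, gpu_index: int, milliwatts: int) -> None:
        self._call("set_power_limit", gpu_index, milliwatts)

    def set_frequency(self, gpu_index: int, mhz: int) -> None:
        self._call("set_frequency", gpu_index, mhz)
```

With something like this, application code would only hold a `GPUBackend` and would not need to know whether it has SYS_ADMIN itself.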

saketjajoo commented 8 months ago

There are two potential approaches to address this issue, although additional options may also exist:

1. Making a change at the NVML library’s side to reduce the Linux privileges, or
2. Using something like a service account (that has the SYS_ADMIN privilege) that is attached to the process while it executes.

The rationale behind the pull request stems from the concern that the current excessive permissions could be easily exploited, particularly during a supply chain attack on an open-source dependency. If nvidia-ml-py (a PyPI library used to gather energy and power consumption in the Zeus project) is compromised via a supply chain attack, it could possibly lead to a security issue.

Currently, the documentation says that the nvmlDeviceSetPowerManagementLimit() API requires root/admin access. So, I don’t think much can be done for now about changing or using different permissions. However, if this changes in the future, it could solve the problem.

jaywonchung commented 8 months ago

> There are two potential approaches to address this issue, although additional options may also exist:
>
> Making a change at the NVML library’s side to reduce the Linux privileges, or

NVML is closed source, so this is not a viable option. Also, SYS_ADMIN is required because changing hardware management knobs is indeed something only a user or process with the system admin role should be allowed to do. So I wouldn't expect NVML to lift this constraint any time soon.

> Using something like a service account (that has the SYS_ADMIN privilege) that is attached to the process while it executes.

Could you elaborate a bit more? To my limited knowledge, service accounts are typically used in cloud environments to allow access to certain privileged API endpoints. Are there service account implementations in general Linux kernels that grant Linux security capabilities to processes? We don't want to tie anything to the cloud.

> The rationale behind the pull request stems from the concern that the current excessive permissions could be easily exploited, particularly during a supply chain attack on an open-source dependency. If nvidia-ml-py (a PyPI library used to gather energy and power consumption in the Zeus project) is compromised via a supply chain attack, it could possibly lead to a security issue.
>
> Currently, the documentation says that the nvmlDeviceSetPowerManagementLimit() API requires root/admin access. So, I don’t think much can be done for now about changing or using different permissions. However, if this changes in the future, it could solve the problem.

That's why we need a separate server process on the node that has SYS_ADMIN and exposes APIs like set_power_limit and set_frequency. Applications without SYS_ADMIN will ask the power management server over IPC to set the GPU's power limit or SM frequency on their behalf.
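As a rough illustration of the idea, here is a minimal sketch of such a server over a Unix domain socket. Everything here is an assumption made for the example: the newline-delimited JSON wire format, the function names (`handle_request`, `listen`, `serve`, `request`), and the `handlers` table, which in a real server would map to privileged NVML calls such as nvmlDeviceSetPowerManagementLimit() (whose limit argument is in milliwatts).

```python
import json
import socket
from typing import Callable, Dict, Optional

Handler = Callable[[int, int], None]  # (gpu_index, value) -> None


def handle_request(line: bytes, handlers: Dict[str, Handler]) -> bytes:
    """Decode one JSON request line, dispatch it, and encode the JSON reply."""
    try:
        req = json.loads(line)
        handlers[req["op"]](int(req["gpu"]), int(req["value"]))
        reply = {"ok": True}
    except Exception as exc:  # unknown op, malformed request, or setter failure
        reply = {"ok": False, "error": repr(exc)}
    return json.dumps(reply).encode() + b"\n"


def listen(sock_path: str) -> socket.socket:
    """Create a listening Unix domain socket at sock_path."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen()
    return server


def serve(
    server: socket.socket,
    handlers: Dict[str, Handler],
    max_connections: Optional[int] = None,
) -> None:
    """Serve connections one at a time; run this in the privileged process."""
    served = 0
    while max_connections is None or served < max_connections:
        conn, _ = server.accept()
        with conn, conn.makefile("rb") as reader:
            for line in reader:  # one request per line until the client closes
                conn.sendall(handle_request(line, handlers))
        served += 1


def request(sock_path: str, op: str, gpu: int, value: int) -> dict:
    """Client side: runs in the unprivileged application."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        message = {"op": op, "gpu": gpu, "value": value}
        client.sendall(json.dumps(message).encode() + b"\n")
        with client.makefile("rb") as reader:
            return json.loads(reader.readline())
```

An application would then call something like `request(path, "set_power_limit", 0, 250000)` instead of touching NVML directly. This sketch handles one connection at a time and does no authentication.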

saketjajoo commented 7 months ago

> That's why we need a separate server process on the node that has SYS_ADMIN and exposes APIs like set_power_limit and set_frequency.

However, I believe, theoretically, anyone could call this process to update the power limit and frequency.

> Are there service account implementations in general Linux kernels that grant Linux security capabilities to processes? We don't want to tie anything to the cloud.

Yes, Linux has the concept of system accounts as well. The command useradd --system ... creates a system account that can have custom privileges attached to it, and that account can be used to set the power limit and frequency. The new system account could perhaps be added to a newly created group that also contains the user running the process. That way, unauthorized users would not be able to use the system account to call the APIs.
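A related way to restrict who can call a privileged process, assuming it listens on a Unix domain socket, is to check the caller's credentials on each connection. The following is a minimal sketch, assuming Linux (SO_PEERCRED is Linux-specific); the helper names and the `allowed_uids` allow-list are hypothetical and not part of any existing implementation.

```python
import socket
import struct
from typing import Set, Tuple


def peer_credentials(conn: socket.socket) -> Tuple[int, int, int]:
    """Return (pid, uid, gid) of the process on the other end of a Unix socket."""
    # The kernel fills in a `struct ucred`, which is three C ints on Linux.
    size = struct.calcsize("3i")
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, uid, gid = struct.unpack("3i", raw)
    return pid, uid, gid


def is_authorized(conn: socket.socket, allowed_uids: Set[int]) -> bool:
    """Accept the connection only if the caller's UID is on the allow-list."""
    _pid, uid, _gid = peer_credentials(conn)
    return uid in allowed_uids
```

The credentials are reported by the kernel rather than by the client, so a caller cannot simply claim a different UID. A simpler alternative is to restrict the socket file itself with group ownership and file permissions.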

jaywonchung commented 1 month ago

Implemented in #81.

ml-energy / zeus

Power management server #29