rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

Kubernetes error with rancher while CyberArk EPM is active #6578

Open ricbar3 opened 7 months ago

ricbar3 commented 7 months ago

I am running into an issue on an M1 Mac running macOS 14.3 that has the newest EPM agent installed. I have also tried past versions of the EPM agent. When starting Rancher Desktop, whether or not I am an admin, I get the following error:

'0' ], stdout: '', stderr: 'time="2024-02-11T19:29:31Z" level=info msg="Using the existing instance \"0\""\n' + 'time="2024-02-11T19:29:31Z" level=info msg="Starting vde_switch daemon for \"rancher-desktop-shared\" network"\n' + 'time="2024-02-11T19:29:31Z" level=fatal msg="failed to run [sudo --user root --group wheel --non-interactive /bin/mkdir -m 775 -p /private/var/run]: stdout=\"\", stderr=\"sudo: a password is required\\n\": exit status 1"\n', code: 1, [Symbol(child-process.command)]: '/Applications/Rancher Desktop.app/Contents/Resources/resources/darwin/lima/bin/limactl.ventura start --tty=false 0' }

EPM seems to have an issue with the "--user root --group …" part of the command.

I have an active case with CyberArk and they suggested the following:

Change "sudo --user root --group wheel mkdir ..." to "sudo mkdir ...", so that no arguments are given to sudo.

I was hoping to get some help on how this can be solved so that EPM keeps running and Rancher Desktop works correctly.

mook-as commented 6 months ago

As a partial workaround, consider disabling administrative access to prevent the need to run sudo.

Otherwise we'll probably need more detailed information on what exactly CyberArk expects in this case (since being non-interactive is important, as we do not have a controlling terminal).

ricbar3 commented 6 months ago

We need admin access because of bridged networking, so that workaround is not viable. I believe the problem has to do with the command "sudo --user root --group wheel mkdir ..."

EPM can elevate if needed, and I have tried adding Rancher Desktop to an elevation policy, but since the command tries to use root I think that is where the issue comes in. If it just ran "sudo mkdir" instead, I think it would work.

jandubois commented 6 months ago

> I believe it has to do with the command "sudo --user root --group wheel mkdir ..."

There is no short-term way to get rid of --user root --group wheel; it will need to be changed upstream in the Lima project; the code is not part of Rancher Desktop itself. The user/group are specified because some daemon processes run with --user daemon instead of --user root.

This is really a bug in EPM though, so the argument to request a change in another project to work around a bug in a completely unrelated product is a bit hard to make; why can't EPM be fixed instead?

ricbar3 commented 6 months ago

I am going back and forth right now trying to figure that out as well. Basically, I am hoping one of the vendors can figure out how to get this fixed.

jandubois commented 6 months ago

I've been looking into this some more. The "Supported sudo arguments on macOS" docs for EPM state that only -E, -u, and -- are supported sudo options.

So getting rid of --user and --group in Lima would be possible; it is only configured because vde_vmnet uses the daemon user and not root, but the argument could be removed for socket_vmnet.

However, I think this is a red herring, and the arguments are supported, including the --non-interactive option that follows later. It requests that the command be executed without prompting for a password, which has been denied:

sudo: a password is required

Lima normally writes a file /etc/sudoers.d/zzzzz-rancher-desktop-lima with all commands that it expects to be able to execute without a password prompt:

$ sudo cat /etc/sudoers.d/zzzzz-rancher-desktop-lima
%everyone ALL=(root:wheel) NOPASSWD:NOSETENV: /bin/mkdir -m 775 -p /private/var/run
…

I don't know if that file even gets written when CyberARK is installed, but if it is, then it is ignored or superseded with different instructions later.

A similar issue has been filed against colima, which also runs Lima under the hood: https://github.com/abiosoft/colima/issues/854. The user states:

> At the corporate environment there is CyberARK installed that prevents using sudo without password.

I found a further hint in https://github.com/rancher-sandbox/rancher-desktop/issues/1224#issuecomment-1054885300:

> I'm facing the same issue. I have sudo privileges but my IT dept adds this to the bottom of the sudoers file
>
>     Defaults timestamp_timeout=0
>     %_cyberarkepm_sudoers ALL = (ALL) PASSWD: ALL ##
>
> so RD fails to handle the sudo prompt.

Assuming you have a similar line in your /etc/sudoers file, and it comes after the #includedir /private/etc/sudoers.d line, then it will override the Lima configuration (the last matching sudoers entry wins), and no sudo command can be executed without an explicit password prompt in a terminal (which does not exist for a GUI application like Rancher Desktop).
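To check the ordering described above, here is a minimal sketch. It assumes the CyberArk rule uses the %_cyberarkepm_sudoers group name from the quoted comment; the function name cyberark_rule_position is made up for illustration and is not part of Rancher Desktop, Lima, or EPM.

```shell
#!/bin/sh
# Sketch: report where a CyberArk EPM rule sits relative to the
# sudoers.d include directive in a sudoers file.
# Assumption: the rule starts with "%_cyberarkepm_sudoers", as in the
# comment quoted above. Your IT department's rule may look different.
#
# Prints one of:
#   absent  no CyberArk rule found
#   after   rule comes after the include line (it overrides sudoers.d)
#   before  rule found, but not after an include line
cyberark_rule_position() {
    awk '
        /^[#@]includedir[ \t]/        { include_line = NR }
        /^%_cyberarkepm_sudoers[ \t]/ { rule_line = NR }
        END {
            if (!rule_line)
                print "absent"
            else if (include_line && rule_line > include_line)
                print "after"
            else
                print "before"
        }
    ' "$1"
}

# Example (reading /etc/sudoers usually requires root):
#   sudo cat /etc/sudoers > /tmp/sudoers.copy
#   cyberark_rule_position /tmp/sudoers.copy
```

A result of "after" would match the situation described above, where the NOPASSWD entries written by Lima are superseded.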

Assuming the CyberARK configuration is non-negotiable with your IT department, the only workaround to still get a bridged interface with Rancher Desktop is to manage it yourself.

Note that this is not a supported setup, and it may break other configuration options, but I have verified that the basic idea works:

First you need to disable "Administrative Access" in Rancher Desktop and then stop the application. Do not enable it again in the future while using this workaround.

Next you have to manually run a socket_vmnet daemon for the bridged interface:

$ sudo /opt/rancher-desktop/bin/socket_vmnet --pidfile=/private/var/run/rancher-desktop-bridged_en0_socket_vmnet.pid --socket-group=everyone --vmnet-mode=bridged --vmnet-interface=en0 /private/var/run/socket_vmnet.rancher-desktop-bridged_en0
Initializing vmnet.framework (mode 1002)
Using network interface "en0"
* vmnet_mtu: 1500
* vmnet_interface_id: D0A598BC-B582-4A8D-A923-ACFB315BE6DC
* vmnet_max_packet_size: 1514
* vmnet_mac_address: 72:cd:ac:96:0a:7b
Accepted a connection (fd 6)
…

This assumes that you are bridging through the en0 interface on the host. You may need to change it to en1 or whatever is the active interface for your local network.
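If you are not sure which interface is active, one option is to look at the default route. The sketch below pulls the interface name out of the output of `route -n get default` on macOS; the helper name default_interface is hypothetical. Whether the default-route interface is the right one to bridge depends on your network setup.

```shell
#!/bin/sh
# Sketch: read `route -n get default` output on stdin and print the
# interface name from its "interface:" line.
default_interface() {
    awk '$1 == "interface:" { print $2; exit }'
}

# Example on macOS:
#   route -n get default | default_interface
```

Use the printed name in place of en0 in the socket_vmnet command line, both in --vmnet-interface and in the pidfile and socket paths.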

This command will need to continue to run, so you will not get back to a shell prompt. Ideally you would wrap this as a Launch Agent, but I suspect your environment will not allow you to do that either.

Afterwards you must create an override.yaml file like this:

$ cat "$HOME/Library/Application Support/rancher-desktop/lima/_config/override.yaml"
networks:
- socket: /private/var/run/socket_vmnet.rancher-desktop-bridged_en0
  interface: rd0

Make sure the socket path matches the one you specified above (change en0 to whatever interface you are using). Do not change the rd0 interface name.
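To keep the socket path and the interface name in sync, the file can be generated from the interface name. This is only a sketch: the write_override helper is made up for illustration, and it assumes the socket naming scheme used in the socket_vmnet command above.

```shell
#!/bin/sh
# Sketch: write an override.yaml that points rd0 at a manually managed
# socket_vmnet socket.
#   $1  host interface being bridged (e.g. en0)
#   $2  target directory (optional; defaults to the _config directory
#       used in this thread)
write_override() {
    iface=$1
    config_dir=${2:-"$HOME/Library/Application Support/rancher-desktop/lima/_config"}
    mkdir -p "$config_dir" || return 1
    # The socket path must match the last argument given to socket_vmnet.
    cat > "$config_dir/override.yaml" <<EOF
networks:
- socket: /private/var/run/socket_vmnet.rancher-desktop-bridged_${iface}
  interface: rd0
EOF
}

# Example:
#   write_override en0
```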

Now start Rancher Desktop. You should have a bridged network adapter through this manually managed socket_vmnet daemon.

If you are using moby, you will not have the Docker socket in the default location, but there should be a rancher-desktop context defined that most apps should use automatically to find the socket location.

It should be automatically selected, but you can make sure with

$ docker context use rancher-desktop
rancher-desktop
Current context is now "rancher-desktop"

After you quit Rancher Desktop you can terminate the daemon with ⌃C in the terminal window. You can leave it running if you expect to start Rancher Desktop again.

If you ever want to stop manually managing the network daemon, then you must delete the override.yaml file, as it might interfere with normal operation if the socket doesn't exist/isn't open.
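A quick way to spot a stale override.yaml is to check that every socket it names actually exists. The sketch below uses a hypothetical helper, missing_sockets, and assumes the simple "- socket: PATH" layout shown above with no spaces in the socket paths. It only tests that the path is a Unix socket, not that a daemon is still listening on it.

```shell
#!/bin/sh
# Sketch: print every socket path listed in an override.yaml that is not
# currently a Unix domain socket. Returns non-zero if any is missing.
#   $1  path to override.yaml
missing_sockets() {
    status=0
    # Assumes entries of the form "- socket: /path" without spaces in /path.
    for sock in $(awk '$1 == "-" && $2 == "socket:" { print $3 }' "$1"); do
        if [ ! -S "$sock" ]; then
            echo "$sock"
            status=1
        fi
    done
    return $status
}

# Example:
#   missing_sockets "$HOME/Library/Application Support/rancher-desktop/lima/_config/override.yaml"
```

If this prints a path, either restart the socket_vmnet daemon or delete the override.yaml file before starting Rancher Desktop.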

I understand that this is not an ideal user experience, but it is the best I can come up with right now.

Ideally the whole sudoers part should be replaced with a privileged helper process, but that will be a lot of effort and is not on the short or medium term roadmap.

jandubois commented 6 months ago

I did some more experiments, and while the workaround above gives you the bridged interface inside the VM, it won't configure the Kubernetes cluster to use it.

For that to work you must run Rancher Desktop with Administrative Access enabled. To avoid Lima throwing an error we have to also define rd1 as an unmanaged interface. Pointing it at the same socket_vmnet daemon worked for me; and I got a bridged IP address on both rd0 and rd1 with

$ cat "$HOME/Library/Application Support/rancher-desktop/lima/_config/override.yaml"
networks:
- socket: /private/var/run/socket_vmnet.rancher-desktop-bridged_en0
  interface: rd0
- socket: /private/var/run/socket_vmnet.rancher-desktop-bridged_en0
  interface: rd1

(and starting Rancher Desktop with Administrative Access enabled).

I then created a LaunchDaemon definition for the socket_vmnet daemon, so I don't have to manually start it:

$ cat /Library/LaunchDaemons/io.rancherdesktop.socket_vmnet.bridged.en0.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>io.rancherdesktop.socket_vmnet.bridged.en0</string>
        <key>Program</key>
        <string>/opt/rancher-desktop/bin/socket_vmnet</string>
        <key>ProgramArguments</key>
        <array>
            <string>/opt/rancher-desktop/bin/socket_vmnet</string>
            <string>--pidfile=/private/var/run/rancher-desktop-bridged_en0_socket_vmnet.pid</string>
            <string>--socket-group=everyone</string>
            <string>--vmnet-mode=bridged</string>
            <string>--vmnet-interface=en0</string>
            <string>/private/var/run/socket_vmnet.rancher-desktop-bridged_en0</string>
        </array>
        <key>StandardErrorPath</key>
        <string>/var/log/socket_vmnet.rancher-desktop-bridged_en0.stderr</string>
        <key>StandardOutPath</key>
        <string>/var/log/socket_vmnet.rancher-desktop-bridged_en0.stdout</string>
        <key>RunAtLoad</key>
        <true />
        <key>UserName</key>
        <string>root</string>
    </dict>
</plist>

This file must be owned by root:wheel, otherwise it won't work.

I don't know if CyberARK will allow you to create and configure this file; these are the commands I used:

sudo chown root:wheel /Library/LaunchDaemons/io.rancherdesktop.socket_vmnet.bridged.en0.plist
sudo launchctl bootstrap system /Library/LaunchDaemons/io.rancherdesktop.socket_vmnet.bridged.en0.plist
sudo launchctl enable system/io.rancherdesktop.socket_vmnet.bridged.en0
sudo launchctl kickstart -kp system/io.rancherdesktop.socket_vmnet.bridged.en0

I then launched Rancher Desktop and got an external IP address for the traefik ingress controller.

Afterwards I rebooted my machine and logged back in. I checked that the daemon was being launched automatically:

$ ps -ef | grep socket_vmnet
    0   113     1   0 10:27pm ??         0:00.42 /opt/rancher-desktop/bin/socket_vmnet --pidfile=/private/var/run/rancher-desktop-bridged_en0_socket_vmnet.pid --socket-group=everyone --vmnet-mode=bridged --vmnet-interface=en0 /private/var/run/socket_vmnet.rancher-desktop-bridged_en0
  501  8914  1665   0 11:00pm ttys000    0:00.00 grep socket_vmnet

Just starting Rancher Desktop again now used this daemon and configured a bridged IP address for the cluster ingress.

I can't be sure if this will work for you, as I don't have access to a system with CyberARK, but it should be worth a try.

If you want to disable the daemon, you can run this command:

sudo launchctl bootout system /Library/LaunchDaemons/io.rancherdesktop.socket_vmnet.bridged.en0.plist

before you delete the plist file and the override.yaml file.

Also, if you ever perform a Rancher Desktop factory reset, then you will have to recreate the override.yaml file because the whole "$HOME/Library/Application Support/rancher-desktop/lima" directory and everything below it will be deleted.

Please report back here if you try this approach and let us know if it works for you!

fethiarras commented 2 months ago

Hi,

I filed an enhancement request with CyberARK to support --non-interactive in EPM. Since agent 24.7, you can open a case with CyberARK to have support for --non-interactive activated, and it works :)