windmill-labs / windmill

Open-source developer platform to turn scripts into workflows and UIs. Fastest workflow engine (5x vs Airflow). Open-source alternative to Airplane and Retool.
https://windmill.dev

bug: Python packages get the wrong SELinux context when being downloaded by workers #2381

Open jdoss opened 11 months ago

jdoss commented 11 months ago

Describe the bug

For a detailed write up of what is going on please see https://github.com/containers/podman/issues/20237

I have a handful of workers that share a container volume mount via Podman, using :z on the mounts. This sets an SELinux context that allows any container to read files and directories in that mount. When one worker downloads the Python deps, pip uses /tmp to unpack the modules and then moves them to the target location. This causes the files to get that container's SELinux context: the worker that did the download can read all of the files, but all of the other workers can't read the cached Python modules and error out on script and flow runs.
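For illustration, the workers are launched along these lines; the host path and container name below are hypothetical, but the :z relabel flag on the shared mount is the relevant part:

# Hypothetical worker launch: the shared cache is bind-mounted with :z so
# that any container is allowed to read and write it
podman run -d --name windmill-worker-1 \
  -v /srv/windmill/cache:/tmp/windmill/cache:z \
  ghcr.io/windmill-labs/windmill:1.179.1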

To reproduce

Looking into how Windmill handles pip downloads, I found its script effectively runs the pip install command below. You can see it using /tmp, and the module files getting the container's SELinux context system_u:object_r:container_file_t:s0:c4,c254:

root@689f20478830:/tmp/windmill/wk-689f20478830-kyW2a# ls -dZ /tmp/windmill/wk-689f20478830-kyW2a
system_u:object_r:container_file_t:s0:c4,c254 /tmp/windmill/wk-689f20478830-kyW2a
root@689f20478830:/tmp/windmill/wk-689f20478830-kyW2a# /usr/local/bin/python3 -m pip install -v bupy -I -t ../cache --no-cache --no-color --no-deps --isolated --no-warn-conflicts --disable-pip-version-check
Using pip 23.1.2 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)
Collecting bupy
  Downloading bupy-0.1.2-py3-none-any.whl (15 kB)
Installing collected packages: bupy
  Creating /tmp/pip-target-203yrywv/bin
  changing mode of /tmp/pip-target-203yrywv/bin/bupy to 755
Successfully installed bupy-0.1.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

root@689f20478830:/tmp/windmill/wk-689f20478830-kyW2a# ls -lahZ ../cache/bupy/
total 52K
drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0:c4,c254  158 Oct  4 06:16 .
drwxr-xr-x. 12 root root system_u:object_r:container_file_t:s0          148 Oct  4 06:16 ..
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254   44 Oct  4 06:16 __init__.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254  125 Oct  4 06:16 __main__.py
drwxr-xr-x.  2 root root system_u:object_r:container_file_t:s0:c4,c254 4.0K Oct  4 06:16 __pycache__
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254 1.2K Oct  4 06:16 butane.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254 8.1K Oct  4 06:16 cli.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254 5.0K Oct  4 06:16 fcos.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254 3.3K Oct  4 06:16 qemu.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254 5.3K Oct  4 06:16 template.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0:c4,c254 1.1K Oct  4 06:16 util.py

Pip by default uses /tmp as a workdir to unpack Python packages and then moves those files into the target directory. The move brings the container context system_u:object_r:container_file_t:s0:c4,c254 along with the files and causes a bunch of SELinux drama.
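As a one-off cleanup (not a real fix, since the next pip install mislabels the files again), the cache can be relabeled back to the shared-mount context by hand:

# Relabel the cache back to the shared context so the other workers can read
# it again; this only repairs existing files, it does not prevent the problem
chcon -R system_u:object_r:container_file_t:s0 /tmp/windmill/cache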

If you set TMPDIR to a directory inside of the volume mount, the files get the right contexts.

root@689f20478830:/tmp/windmill/cache# TMPDIR=/tmp/windmill/cache/tmp /usr/local/bin/python3 -m pip install -v bupy -I -t ./pip --no-cache --no-color --no-deps --isolated --no-warn-conflicts --disable-pip-version-check
Using pip 23.1.2 from /usr/local/lib/python3.11/site-packages/pip (python 3.11)
Collecting bupy
  Downloading bupy-0.1.2-py3-none-any.whl (15 kB)
Installing collected packages: bupy
  Creating /tmp/windmill/cache/tmp/pip-target-p_2fyf4u/bin
  changing mode of /tmp/windmill/cache/tmp/pip-target-p_2fyf4u/bin/bupy to 755
Successfully installed bupy-0.1.2
WARNING: Target directory /tmp/windmill/cache/pip/bin already exists. Specify --upgrade to force replacement.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@689f20478830:/tmp/windmill/cache# ls -lahZ pip/bupy
total 56K
drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0  158 Oct  4 06:26 .
drwxr-xr-x. 12 root root system_u:object_r:container_file_t:s0 4.0K Oct  4 06:26 ..
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0   44 Oct  4 06:26 __init__.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0  125 Oct  4 06:26 __main__.py
drwxr-xr-x.  2 root root system_u:object_r:container_file_t:s0 4.0K Oct  4 06:26 __pycache__
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0 1.2K Oct  4 06:26 butane.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0 8.1K Oct  4 06:26 cli.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0 5.0K Oct  4 06:26 fcos.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0 3.3K Oct  4 06:26 qemu.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0 5.3K Oct  4 06:26 template.py
-rw-r--r--.  1 root root system_u:object_r:container_file_t:s0 1.1K Oct  4 06:26 util.py

I have a PR that fixes this pretty easily, but I am not sure if it is in line with how the project wants things done. Feel free to use it or modify it to fix this issue. Any other package manager that uses /tmp, or any other directory outside of the shared volume mount, will run into this issue. A better solution might be to have people mount /tmp/windmill as a shared volume for workers and set a global tmpdir for workers to use inside that volume mount, to prevent SELinux issues.
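Sketched out, that suggestion would look something like this (the host path is hypothetical; the important part is that TMPDIR lives inside the shared mount):

# Mount all of /tmp/windmill as the shared volume and point TMPDIR inside it,
# so pip unpacks and installs on the same, correctly labeled, filesystem
podman run -d \
  -v /srv/windmill:/tmp/windmill:z \
  -e TMPDIR=/tmp/windmill/tmp \
  ghcr.io/windmill-labs/windmill:1.179.1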

Expected behavior

Working Python scripts and flows that use Python Modules on systems that use Podman and SELinux.

Screenshots

No response

Browser information

No response

Application version

ghcr.io/windmill-labs/windmill:1.179.1

Additional Context

No response

rubenfiszel commented 11 months ago

Thanks a lot for the PR.

FYI, nsjail is meant to be an EE feature. We do not disable it in the code, but that is only meant to let our customers play around with it easily.

You seem to have the issue because you are leveraging a shared cache volume between your workers (which is the model that we put forward in the docker-compose). For production use-cases and on EE, there is a shared cache powered by S3 that is much faster and safer: it works using background sync between the worker cache and S3, plus a fast initialization by pulling a tar.
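As a rough sketch of that model (the bucket name and paths here are made up, and the real implementation lives in the worker itself, not a shell script):

# Cold start: pull the cache tar from S3 and unpack it in place
aws s3 cp s3://my-windmill-cache/cache.tar - | tar -x -C /tmp/windmill/cache
# Background sync: periodically push the local cache back to S3
tar -c -C /tmp/windmill/cache . | aws s3 cp - s3://my-windmill-cache/cache.tar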

I'm mentioning this because you seem to want to do the right thing and care about isolation (which is great), but you are trying to make a hybrid model work which is never going to scale to production use-cases. I think it would be great if we could instead spend our efforts on making the intended prod setup, the global cache backed by S3, work well with SELinux.

jdoss commented 11 months ago

Thanks a lot for the PR.

No problem. Thanks for a wonderful open source project.

FYI, nsjail is meant to be an EE feature. We do not disable it in the code, but that is only meant to let our customers play around with it easily.

That's great that you don't disable the nsjail features in the CE of Windmill. I hope this feature stays this way.

You seem to have the issue because you are leveraging a shared cache volume between your workers (which is the model that we put forward in the docker-compose). For production use-cases and on EE, there is a shared cache powered by S3 that is much faster and safer: it works using background sync between the worker cache and S3, plus a fast initialization by pulling a tar.

I am not sure of the logic in the code that does this, but even if you stream a tarball of the Python modules to the worker filesystem, you need to ensure that the files don't get copied from the container filesystem, which has the container-specific SELinux context, into any shared volume mount that has the looser system_u:object_r:container_file_t:s0 context.
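Concretely, unpacking the tarball directly into the shared mount is safe, because newly created files inherit the label of their parent directory, whereas files moved in from container-private /tmp keep their old label. Something like (the URL is just a placeholder):

# tar -x creates the files inside the mount, so they get the mount's
# system_u:object_r:container_file_t:s0 label instead of a container one
curl -s https://example.com/python-deps.tar | tar -x -C /tmp/windmill/cache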

I'm mentioning this because you seem to want to do the right thing and care about isolation (which is great), but you are trying to make a hybrid model work which is never going to scale to production use-cases. I think it would be great if we could instead spend our efforts on making the intended prod setup, the global cache backed by S3, work well with SELinux.

It's scaling super well in my "production use-case" in my basement with SELinux disabled on the worker containers. 😉

The S3 cache feature seems to be EE-only, so it doesn't really help CE users that want to cache dependencies across workers. I have two long-running Nomad compute nodes in my basement homelab. These can run many Windmill worker containers, and sharing a cache on disk is fast and requires no S3-compatible service to stream tarballs from.

Does the S3 cache feature stream the tarball from S3 and expand it directly into /tmp/windmill/cache, or does it use /tmp/windmill/tmpcache and copy the files from the container space into /tmp/windmill/cache? If it is the latter, the S3 cache feature will run into the same SELinux issues.

Leveraging a shared local cache on a volume mount (whether in docker-compose, Nomad, or k8s) will increase performance and scalability.

rubenfiszel commented 11 months ago

That's great that you don't disable the nsjail features in the CE of Windmill. I hope this feature stays this way.

There is currently a warning that mentions it's an enterprise feature and that it is not intended to be used in the CE. I can see that you may be using Windmill enterprise features for personal purposes and that the pricing is not adequate for that. When we remove nsjail from CE (which is likely to happen in the following weeks), we will very likely have an EE tier that is fairly priced for personal use. We do need to have reasons for enterprises to pay for Windmill, and proper sandboxing is one of them. I apologize if we gave the wrong impression by currently having it there.

Does the S3 cache feature stream the tarball from S3 and expand it directly into /tmp/windmill/cache or does it use /tmp/windmill/tmpcache and it copies them from the container space into /tmp/windmill/cache? If it is the latter the S3 cache feature will run into SELinux issues.

It does use /tmp/windmill/tmpcache indeed.
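In that case, a relabel step after the move out of tmpcache would be needed on SELinux hosts, something along these lines (the bupy path is just an example):

# Move the synced files into the cache, then force them to match the label
# of the shared cache directory itself
mv /tmp/windmill/tmpcache/bupy /tmp/windmill/cache/
chcon -R --reference=/tmp/windmill/cache /tmp/windmill/cache/bupy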

Leveraging a shared local cache on a volume mount (whether in docker-compose, Nomad, or k8s) will increase performance and scalability.

In a k8s environment, this would require persistent volumes, which are much more expensive than S3. It also has terrible cold starts when scaling up on new nodes without a mounted volume. But anyway, you're in a setting where S3 doesn't make sense.