Open jdoss opened 11 months ago
Thanks a lot for the PR.
FYI, nsjail is meant to be an EE feature; we do not disable it in the code, but that is so our customers can play around with it easily.
You seem to have the issue because you are leveraging a shared cache volume between your workers (which is the model that we put forward in the docker-compose). For production use cases and on EE, there is a shared cache powered by S3 that is much faster and safer; it works using background sync between the worker cache and S3, plus fast initialization by pulling a tar.
I'm mentioning this because you seem to want to do the right thing and care about isolation (which is great), but you are trying to make a hybrid model work that is never going to scale to production use cases. I think it would be great if we could instead spend effort on making the intended prod setup, the global cache backed by S3, work well with SELinux.
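(The EE background-sync mechanism itself isn't shown in this thread. Purely as an illustration of the tar-based "push cache, pull tar to initialize" idea, here is a self-contained sketch where a local directory stands in for the S3 bucket; in a real setup the tar stream would go through something like the `aws` CLI, and all paths are hypothetical.)

```shell
# Illustrative sketch only -- not Windmill's actual EE implementation.
# "BUCKET" is a local directory standing in for an S3 bucket.
set -eu

CACHE=$(mktemp -d)    # stand-in for an existing worker cache, e.g. /tmp/windmill/cache
BUCKET=$(mktemp -d)   # stand-in for the S3 bucket
FRESH=$(mktemp -d)    # stand-in for a new worker's empty cache

# Worker side: populate the cache, then push it as a single tarball.
mkdir -p "$CACHE/pip/requests"
echo "cached module" > "$CACHE/pip/requests/__init__.py"
tar -C "$CACHE" -cf "$BUCKET/cache.tar" .

# New-worker side: fast initialization by pulling and expanding the tarball
# directly into the (empty) local cache directory.
tar -C "$FRESH" -xf "$BUCKET/cache.tar"

cat "$FRESH/pip/requests/__init__.py"
```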
> Thanks a lot for the PR.
No problem. Thanks for a wonderful open source project.
> FYI, nsjail is meant to be an EE feature; we do not disable it in the code, but that is so our customers can play around with it easily.
That's great that you don't disable the nsjail features in the CE of Windmill. I hope it stays this way.
> You seem to have the issue because you are leveraging a shared cache volume between your workers (which is the model that we put forward in the docker-compose). For production use cases and on EE, there is a shared cache powered by S3 that is much faster and safer; it works using background sync between the worker cache and S3, plus fast initialization by pulling a tar.
I am not sure about the logic in the code that does this, but even if you stream a tarball of the Python modules to the worker filesystem, you need to ensure that the files don't get copied from the container filesystem (which has the container-specific SELinux context) into a shared volume mount that has the looser system_u:object_r:container_file_t:s0 context.
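One generic way to guard against stray container-specific labels (a standard SELinux technique, not something Windmill itself ships) is to relabel the cache after any copy or extraction: `restorecon` resets each file to the default label for its current path, so files brought in from the container filesystem lose their container-specific context. A minimal sketch, with a scratch directory standing in for the shared cache:

```shell
# Generic SELinux hygiene sketch -- not part of Windmill itself.
# restorecon resets files to the default SELinux label for their current
# path, so files copied in from a container filesystem lose their
# container-specific context (e.g. ...:container_file_t:s0:c4,c254)
# and pick up the volume's shared label instead.
relabel_cache() {
  dir="$1"
  if command -v restorecon >/dev/null 2>&1; then
    restorecon -R "$dir"
  else
    # Non-SELinux host (or tools not installed): nothing to do.
    echo "restorecon not available; skipping relabel of $dir" >&2
  fi
}

# Example: relabel a scratch directory standing in for the shared cache.
scratch=$(mktemp -d)
relabel_cache "$scratch"
```

Running something like this after every extraction into the shared cache would keep labels consistent regardless of where the files were unpacked.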
> I'm mentioning this because you seem to want to do the right thing and care about isolation (which is great), but you are trying to make a hybrid model work that is never going to scale to production use cases. I think it would be great if we could instead spend effort on making the intended prod setup, the global cache backed by S3, work well with SELinux.
It's scaling super well in my "production use-case" in my basement with SELinux disabled on the worker containers. 😉
The S3 cache feature seems to only be for EE so that doesn't really help CE users that want to cache dependencies across workers. I have two long running Nomad compute nodes in my basement homelab. These can run many Windmill worker containers and sharing a cache on the disk is fast and requires no S3 compatible service to stream tarballs from.
Does the S3 cache feature stream the tarball from S3 and expand it directly into /tmp/windmill/cache, or does it use /tmp/windmill/tmpcache and then copy the files from the container space into /tmp/windmill/cache? If it is the latter, the S3 cache feature will run into SELinux issues.
Leveraging a shared local cache on a volume mount (either in docker-compose, Nomad, or k8s) will increase performance and scalability.
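As an illustration of that shared-local-cache model, a compose file can mount one named volume into every worker with the `:z` option so all containers get a shared, readable label (service and volume names here are made up; the image tag is the one from this issue):

```yaml
# Illustrative fragment only -- service and volume names are hypothetical.
# The :z option tells Podman/Docker to relabel the volume with a shared
# SELinux context readable by all containers that mount it.
services:
  worker_1:
    image: ghcr.io/windmill-labs/windmill:1.179.1
    volumes:
      - worker_cache:/tmp/windmill/cache:z
  worker_2:
    image: ghcr.io/windmill-labs/windmill:1.179.1
    volumes:
      - worker_cache:/tmp/windmill/cache:z

volumes:
  worker_cache: {}
```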
> That's great that you don't disable the nsjail features in the CE of Windmill. I hope it stays this way.
There is currently a warning that mentions it's an enterprise feature and is not intended to be used in CE. I can see that you may be using Windmill enterprise features for personal purposes and that the pricing is not adequate for that. When we remove nsjail from CE (which is likely to happen in the coming weeks), we will very likely have an EE tier that is fairly priced for personal use. We do need reasons for enterprises to pay for Windmill, and proper sandboxing is one of them. I apologize if we gave the wrong impression by currently having it there.
> Does the S3 cache feature stream the tarball from S3 and expand it directly into /tmp/windmill/cache, or does it use /tmp/windmill/tmpcache and then copy the files from the container space into /tmp/windmill/cache? If it is the latter, the S3 cache feature will run into SELinux issues.
It does use /tmp/windmill/tmpcache indeed.
> Leveraging a shared local cache on a volume mount (either in docker-compose, Nomad, or k8s) will increase performance and scalability.
In a k8s environment, this would require persistent volumes, which are much more expensive than S3. It also has terrible cold-start behavior when scaling up on new nodes without a mounted volume. But anyway, you're in a setting where S3 doesn't make sense.
Describe the bug
For a detailed write up of what is going on please see https://github.com/containers/podman/issues/20237
I have a handful of workers that share a container volume mount via Podman using :z on the mounts. This sets an SELinux context that allows any container to read files and directories in that mount. When one worker downloads the Python deps, due to how pip works, it uses /tmp to unpack the modules and then moves them to the target location. This causes the files to keep the container's SELinux context: the worker that downloaded them can read all of the files, but all the other workers can't read the cached Python modules and error out on script and flow runs.

To reproduce
Looking into how Windmill handles pip downloads, I found its script, which effectively runs the pip install command below. You can see it using /tmp, and the module files getting the container's SELinux context system_u:object_r:container_file_t:s0:c4,c254.

Pip by default uses /tmp as a workdir to unpack Python packages and then moves those files into the target directory. This brings the container context system_u:object_r:container_file_t:s0:c4,c254 along with those files and causes a bunch of SELinux drama. If you set TMPDIR to a directory inside of the volume mount, the files get the right contexts.

I have a PR that fixes this pretty easily, but I am not sure if that is in line with how the project wants things done. Feel free to use it or modify it to fix this issue. Any other package manager that uses /tmp, or any other directory outside of the shared volume mount, will run into this issue. A better solution might be to have people mount /tmp/windmill as a shared volume for workers and set a global tmpdir for workers to use inside that volume mount to prevent SELinux issues.

Expected behavior
Working Python scripts and flows that use Python Modules on systems that use Podman and SELinux.
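The TMPDIR workaround described in the report can be demonstrated without Podman: tools that use the system temp directory (pip included, via Python's tempfile) honor TMPDIR, so pointing it inside the shared mount keeps the unpack step, and therefore the file labels, on the volume. A minimal self-contained sketch, with a scratch directory standing in for the mount:

```shell
# Sketch of the TMPDIR workaround; a scratch directory stands in for the
# shared volume mount (/tmp/windmill/cache in the issue).
set -eu

MOUNT=$(mktemp -d)            # stand-in for the shared volume mount
mkdir -p "$MOUNT/tmp"

# With TMPDIR pointing inside the mount, temp files (pip's unpack dirs,
# mktemp, etc.) are created on the volume and inherit its SELinux label,
# instead of landing in /tmp with a container-specific label.
scratch=$(TMPDIR="$MOUNT/tmp" mktemp -d)

case "$scratch" in
  "$MOUNT/tmp"/*) echo "temp dir is inside the mount: $scratch" ;;
  *)              echo "unexpected location: $scratch"; exit 1 ;;
esac
```

A real pip invocation would then be run with the same environment, along the lines of `TMPDIR=/tmp/windmill/cache/tmp pip install --target /tmp/windmill/cache/pip <pkg>` (illustrative paths, not Windmill's exact command).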
Application version
ghcr.io/windmill-labs/windmill:1.179.1