utam0k opened 3 years ago
flamegraph
Great. That's a nice example, and building the cargo binary is easy enough. I initially thought the cargo subcommand needed to be distributed independently, since many examples show the subcommand being under $HOME/.cargo. For future reference, here is the flamegraph example: https://github.com/flamegraph-rs/flamegraph/blob/master/src/bin/cargo-flamegraph.rs
We can have:
cargo-devtool build
cargo-devtool integration_tests
and so on.
On a different note, I am curious whether we should replace mio with a simple pipe2. We use mio to communicate between the main process and the container process, mainly to sync and to send the pid back on the same host. After working with the code, I am under the impression that mio is overkill for what we are trying to achieve. We already send customized byte messages, so mio doesn't add any useful abstraction on top. I also don't think mio can be any faster than pipe2. Is there a historical reason mio was picked?
Edit: Actually, we just have to get rid of the event and epoll usage and use a mio pipe. Is there any historical reason why mio events and epoll are used, instead of just blocking and reading the message (bytes) from the pipe?
The reason for this is very simple: mio seemed lightweight, since it only does simple communication processing. I'm in favor of using mio as long as there is no negative impact on performance.
Can we bridge discord and telegram?
Can you elaborate on the reason/use case? Can you also propose/research a solution for how we would do this? Without more context, we can't evaluate this question :)
Can you elaborate on the reason/usecase?
https://github.com/containers/youki/issues/444#issuecomment-959000243
Can you also propose/research a solution on how would we do this?
@unknowndevQwQ Thanks for your interest. I'm sorry, but I don't plan to support TediCross, because few people need it. We aren't a company, so we believe our operational resources should be minimized. I am sorry that I could not meet your expectations :bow: However, questions in issues are also welcome ;)
I can provide resources to run tedicross
I'm somewhat curious about the performance comparison between crun and youki; here I tried with a memory limitation:
coder@pearl:~/youki$ podman --runtime /usr/bin/crun run --rm --memory 1M fedora echo it works
it works
coder@pearl:~/youki$ sudo podman --runtime /home/coder/youki/youki run --rm --memory 1M fedora echo it works
This will hang.
More than 2M for youki will be OK.
Another issue I met: when we run podman with youki, why do we need sudo permission? This command fails with the message:
coder@pearl:~/youki$ podman --runtime /home/coder/youki/youki run --rm --memory 4M fedora echo it works
Error: 5414a497512b4f25dc421921f4ab3a8dfe38afed7001aee1d306f8a82dacf48b does not exist.
ERRO[0000] Error removing container 5414a497512b4f25dc421921f4ab3a8dfe38afed7001aee1d306f8a82dacf48b from runtime after creation failed
Error: Permission denied (os error 13): OCI permission denied
Hi @chenyukang, I tried your two scenarios.
I'm somewhat curious about the performance comparison between crun and youki; here I tried with a memory limitation:
coder@pearl:~/youki$ podman --runtime /usr/bin/crun run --rm --memory 1M fedora echo it works
it works
coder@pearl:~/youki$ sudo podman --runtime /home/coder/youki/youki run --rm --memory 1M fedora echo it works
This will hang. More than 2M for youki will be OK.
Below are my results:
sudo podman --runtime /home/tommady/youki run --rm --memory 1M fedora echo it works
[sudo] password for tommady:
Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
it works
podman --runtime /usr/bin/crun run --rm --memory 1M fedora echo it works
Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
it works
What I did was build youki's current main branch with cargo build --release; hopefully this can resolve your problem.
sudo podman --runtime /home/coder/youki/youki run --rm --memory 1M fedora echo it works
Yes, I also used the current commit, built with --release.
Hey @chenyukang, I traced both logs with --log-level debug.
I found that youki hangs a bit longer than crun at this point:
INFO[0000] Running conmon under slice machine.slice and unitName libpod-conmon-b706596e493d71dfd5144529e0461ca752ee0155dac2d7dca0eaed7f41e1d0e3.scope
but the logs suggest they went through exactly the same steps with exactly the same args. After all, on my local computer, youki just hangs a bit longer than crun; it does not hang entirely.
I am very interested in this case, could you please open an issue then assign it to me? thank you 🙇🏻
On Sat, Nov 06, 2021 at 02:02:48AM -0700, tommady wrote:
could you please open an issue
and use this issue for ideas ...
Hi folks, great work so far! Have you considered zbus as an alternative to the dbus bindings? I am not sure how feature-complete it is, but I found it recently when browsing random stuff.
Keep up the amazing work, be safe and take care!
Recently they released zbus 2.0; to be fair, I'm not sure if adding this would pay off in the end.
@darleybarreto Thanks for telling me! I'll check it later ;)
We need a Rust shim for memory friendliness!
Hi, is it possible to run youki by calling gRPC directly?
I am not sure whether this is supposed to be provided by youki or by containers.
As far as I know, containers provide an OCI command to run the container, and youki is a container runtime implementing OCI.
However, I notice that youki also provides a crate to run containers, which is here.
I want the most native way to run a container directly instead of calling docker run, as fast as possible.
Rust docker client libraries like shiplift and bollard are good, but I am looking for a more advanced one.
Could you give me some advice?
Hi, is it possible to run the youki by calling gRPC directly?
Can you elaborate on what you mean? Are you looking for something similar to the docker backend? There is a possibility that you can use the CRI interface with cri-o or containerd. The CRI APIs are meant for kubelet consumption, but it is a well-defined gRPC interface.
If you are looking to just launch containers using the OCI interface, you can build something on top of youki. Youki supports the OCI spec and doesn't provide gRPC out of the box. It is a low-level component compared to kubelet and docker.
I want to have the most native way to run the container directly instead of calling docker run, as fast as possible.
Again, can you elaborate here? What do you mean by more native and faster? Can you explain your use case?
I am looking for a more advanced one.
Again, OCI compared to docker is not necessarily more advanced; it is a lower level of abstraction.
@yihuaf We have yet to do a full cri-o or containerd evaluation. I'm supposed to be doing the containerd evaluation, but a lot of things have been changing to support CI testing for different high-level runtimes. Last I heard, CRI-O is mostly working.
Thank you. I did not expect to receive this feedback so quickly. Obviously, I was confused about the meaning of OCI versus CRI here.
Let me rephrase my question. I want a Rust library to run a container launched by an OCI-compatible runtime, instead of calling the docker or nerdctl command directly.
My use case is that I want to rewrite the backend of the Rust Playground myself, and I want to minimize the burden of running containers that execute potentially evil code. Container runtimes including youki (fast start-up) and gVisor (safe sandbox) are on my candidate list.
Any comments are helpful
The first question would be to understand what isolation level is good enough for the "evil" code. Do we need VM-level isolation, or is hardened container-level isolation good enough? If you want VM-level isolation, I would suggest you take a look at firecracker. You would need to decide on the trade-off between startup time and security.
Assuming we want to use containers, the first question is whether we want to use OCI, such as runc or youki. OCI does not take care of downloading images, unpacking images into an OCI runtime bundle, life cycle, and a few other things. There is currently a gap between what youki offers (OCI) and what docker or CRI offers, and I believe there is currently no Rust solution in this space. Currently, I have a closed-source project using skopeo + umoci + runc (youki) and some Rust glue code to close the gap, and maybe that is good enough for your use case.
Now, if you decide that moving to the OCI level is right for you, then using youki as a library or as a CLI is good. Note that you will have to take care of creating the OCI bundle in your own code. You can have a Rust container image ready on a host, use it as a base, send the Rust code to the host, use a union fs to create a new rootfs with the code, create an OCI bundle config, and call youki with the config.
With that being said, since you mentioned gRPC, I suspect your use case is at a higher level of abstraction than OCI. Without knowing more of your requirements, I would start with docker or CRI (containerd, cri-o).
Side note: There is a lack of Rust alternatives at the higher levels of the container ecosystem. A lot of people working on this project would love to contribute. For example, there is currently no good library to manipulate container images the way skopeo and umoci do. Personally, I wish I had more time to work on some of these ideas.
Again, thanks for the thorough explanation and for sharing ideas.
I would like to give firecracker a try (a VM for safety, fast start-up, and a REST API); for reference, there is a great post by jvns.
Still new to Rust, containers, and VMs, but I am willing to take part in the Rust community and container ecosystem. Looking forward to contributing when I am ready.
Thank you all, really
@timchenxiaoyu @yihuaf @tsturzl Thanks for the great conversation! I've read it.
Here I am posting an idea for optimizing Youki with Profile-Guided Optimization (PGO). Recently I started evaluating PGO across multiple software domains - all my current results are available here: https://github.com/zamazan4ik/awesome-pgo . For Youki I did some quick benchmarks on my local Linux machine and want to share the actual performance numbers.
The test setup:
- rustc 1.72.0 (5680fa18f 2023-08-23)
- youki at commit 646c1034f78454904cc3e1ccec2cd8dc270ab3fd in the main branch

As a benchmark, I use the workload suggested in the README file: sudo ./youki create -b tutorial a && sudo ./youki start a && sudo ./youki delete -f a
youki_release is built with just youki-release. The PGO-optimized build is done with cargo-pgo (cargo pgo build, then running the benchmark with the instrumented youki, then cargo pgo optimize build). As the training workload, I use the benchmark itself.
The results are presented in hyperfine format. All benchmarks were run multiple times, in different orders, etc.; the results are reproducible.
sudo hyperfine --prepare 'sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --warmup 100 --min-runs 500 'sudo ./youki_release create -b tutorial a && sudo ./youki_release start a && sudo ./youki_release delete -f a' 'sudo ./youki_optimized create -b tutorial a && sudo ./youki_optimized start a && sudo ./youki_optimized delete -f a'
Benchmark 1: sudo ./youki_release create -b tutorial a && sudo ./youki_release start a && sudo ./youki_release delete -f a
Time (mean ± σ): 78.6 ms ± 3.7 ms [User: 11.2 ms, System: 43.9 ms]
Range (min … max): 70.9 ms … 97.8 ms 500 runs
Benchmark 2: sudo ./youki_optimized create -b tutorial a && sudo ./youki_optimized start a && sudo ./youki_optimized delete -f a
Time (mean ± σ): 77.4 ms ± 3.6 ms [User: 10.9 ms, System: 44.1 ms]
Range (min … max): 70.6 ms … 90.0 ms 500 runs
Summary
sudo ./youki_optimized create -b tutorial a && sudo ./youki_optimized start a && sudo ./youki_optimized delete -f a ran
1.02 ± 0.07 times faster than sudo ./youki_release create -b tutorial a && sudo ./youki_release start a && sudo ./youki_release delete -f a
Just for reference, I also share the results for Instrumentation mode:
LLVM_PROFILE_FILE=/home/zamazan4ik/open_source/youki/target/pgo-profiles/youki_%m_%p.profraw sudo hyperfine --prepare 'sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' --warmup 10 --min-runs 100 'sudo ./youki_instrumented create -b tutorial a && sudo ./youki_instrumented start a && sudo ./youki_instrumented delete -f a'
Benchmark 1: sudo ./youki_instrumented create -b tutorial a && sudo ./youki_instrumented start a && sudo ./youki_instrumented delete -f a
Time (mean ± σ): 161.1 ms ± 3.3 ms [User: 20.3 ms, System: 116.8 ms]
Range (min … max): 154.8 ms … 170.7 ms 100 runs
According to the tests, PGO helps achieve somewhat better performance (1-2%). Not a great win, but not bad "just" for a compiler option. At scale, even 1% is a good thing to achieve.
If you think it's worth it, we can perform more robust PGO benchmarks for youki and then document the results in the project, so other people will be able to optimize youki for their own workloads.
@zamazan4ik I am interested in PGO. First of all, may I ask you to create an issue about using PGO?
If you think it's worth it, we can perform more robust PGO benchmarks for youki and then document the results in the project, so other people will be able to optimize youki for their own workloads.
It sounds great to me 💯 Personally, I want to learn PGO. Let's give it a try!
Sure! Here it is: https://github.com/containers/youki/issues/2386
More of a question: does running youki with terminal: false + detach have the same properties as runc's detached pass-through mode, in that it passes file descriptors 0-2 directly to the child and no shim process remains between the parent and the containerized child?
More of a question: does running youki with terminal: false + detach have the same properties as runc's detached pass-through mode, in that it passes file descriptors 0-2 directly to the child and no shim process remains between the parent and the containerized child?
Thanks for your question. I couldn't understand what you meant by shim process. Do you mean double-fork or containerd-shim?
If process A uses youki to spawn containerized process B, then anything sitting between A and B in the process tree would be a shim process, be it conmon, containerd-shim, or anything else. Reparenting to a daemon would also be undesirable.
@the8472 As far as I know, youki doesn't have this option. May I ask you to create an issue and implement it?
Not much of an idea, but I wanted to run my GPU workloads using youki. It's so practical, and it seems it is just missing a way to share or access GPUs. Do you know if anyone has managed to do it? I can help with coding if there are some directions!
Hey, while I investigate this further, can I ask you to check something? If my understanding is correct, nvidia GPU support (specifically nvidia) is done via container pre-start hooks and the like, and does not require any special functionality from the runtime at all. Can you check whether you can run a GPU workload (simply listing GPUs or getting GPU stats would suffice) on your machine using docker+runc/crun, and then try the same with youki? If no special runtime facility is required, then both should work similarly.
Also, am I misunderstanding your question? Do you mean running a GPU workload directly with youki, without having something like docker?
I mean running the GPU workload and ditching docker for good, for instance running llama.cpp in a simple container. I will try the tests you proposed. I see that nvidia has a framework, and that https://github.com/Arc-Compute/LibVF.IO/ abstracts a good part of that for other GPUs. I think you are right: you attribute the capability to the namespace/container at setup time, and then through MMIO or some other magic channel, libraries can "see" the GPU. Does that make sense?
you attribute the capability to the namespace/container at setup time, and then through MMIO or some other magic channel, libraries can "see" the GPU. Does that make sense?
Basically, yes. IIUC, the main issue with using an nvidia GPU like any other device is that, because nvidia drivers are non-GPL/proprietary, they are not registered like other devices in the kernel. The driver then does some "stuff" to make the GPU appear as a device. When it comes to the runtime, however, this improper registration causes issues in mounting that device into the container. I saw a couple of implementation problems around directly mounting GPU /dev/... files into the container in runc's issues and PRs. All in all, it seems complicated.
Unless there is a strong request for youki-native GPU support, I don't think we will be doing anything soon. Another major hurdle is that there are no good, supported emulators for validating our code, which means both the developer and the reviewer must have GPUs to develop and test it. It also is not testable in CI. There would also need to be a concrete consumer of such a feature, similar to how we currently support wasm: runwasi uses some of youki's libraries for its purposes, so our wasm support gets exercised by them. Unless someone actually wants such native GPU support, this feature risks stagnation and unknown breakage.
On a more personal note, I do think the suggestion is quite attractive. Having such support directly at the runtime level could solve some issues I can think of with container/GPU interaction, and would also create a much more seamless experience. That said, the concerns above still stand, and it certainly does not seem a simple issue to tackle.
Understood. I am still figuring out how to execute the tests you proposed. I'm thinking that with local models such as whisper/llama.cpp and many others, having a way to package, coordinate, and share resources would be valuable. I did some of that with qemu, but as you said, depending on the GPU that's not feasible. I'm curious to see how https://modal.com/ and other providers are using Rust to do GPU containers. That could be huge.
@gleicon Probably nvidia-container-toolkit, provided by Nvidia, is what you want: https://github.com/NVIDIA/nvidia-container-toolkit
@gleicon Hey, so looking more into this, I think youki (and other runtimes) already have "full" support for GPUs. Even right now, you can manually add the GPU devices in /dev/nvidia* via either docker or the config.json file, and they will get mounted in the container, where they can be accessed as GPU devices. Looking at some issues on runc (https://github.com/opencontainers/runc/issues/3671 and especially https://github.com/opencontainers/runc/issues/3708), the core problem is that because the nvidia drivers are proprietary, they do not register with the kernel like "normal" drivers. The files in /dev are created by the driver and not by the kernel, so certain events can change them in ways that break the mounting of the GPU devices in the container. The nvidia toolkit fixes this by creating and monitoring symlinks to the actual driver files, auto-updating the symlinks if anything changes. That way we can mount those symlinks as devices in the container and things do not break.
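For concreteness, here is a sketch of the kind of config.json fragment this refers to, assuming the conventional nvidia character-device major number 195. Actual device numbers and paths vary per system (check ls -l /dev/nvidia*), so treat every value below as illustrative:

```json
{
  "linux": {
    "devices": [
      { "path": "/dev/nvidia0",   "type": "c", "major": 195, "minor": 0,   "fileMode": 438, "uid": 0, "gid": 0 },
      { "path": "/dev/nvidiactl", "type": "c", "major": 195, "minor": 255, "fileMode": 438, "uid": 0, "gid": 0 }
    ],
    "resources": {
      "devices": [
        { "allow": true, "type": "c", "major": 195, "access": "rwm" }
      ]
    }
  }
}
```

The linux.devices entries create the device nodes inside the container, while the linux.resources.devices rule permits access through the cgroup device controller; both parts are needed for the GPU to be usable.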
Also, a quick way to validate that GPUs are correctly accessible in a container is the nvidia-smi program, which lists the GPUs on the system. On a side note, the container image itself will need the nvidia GPU drivers, apart from mounting the devices.
There is a debate about GPU passthrough using virtio and the MMU that I'm no expert in, but in my testing (mainly docker and firecracker) it has a series of limitations. Firecracker is an outlier, as they state that their IO-MMU approach was built from the ground up with goals that conflict with enabling GPUs (as per their issues and discussions). I don't have an nvidia card at hand; my main machine is Apple silicon and ATI. Nvidia drivers and frameworks work like libvf.io, IIUC. I am trying to do a clean setup again and compare a container running whisper.cpp across them. Just seeing /dev/gpu doesn't seem to do the magic in my current setup.
@gleicon I think what you mentioned should be the responsibility of a high-level runtime, not an OCI runtime. What do you expect from youki?
I expect something like this on lxc (which is where I test the basics of namespaces for better understanding) for all GPUs: https://ubuntu.com/tutorials/gpu-data-processing-inside-lxd#6-add-your-gpu-to-the-container ; at some point you have to attach or present the GPU to the container. What I aim for by using youki, or a simpler and leaner runtime, besides using Rust, is to run local LLMs sharing a GPU in a simpler way. It may be because I don't have an NVIDIA GPU, but support is not uniform. Thanks, I have to do more of my homework and find an NVIDIA setup!
@gleicon I always welcome learning. Come back here when you find good ideas 😍
Would it be possible to add checksums to the GitHub release assets for each file? This helps with the supply-chain security of downstream integrations (in my case, Kubespray). (We record the checksums in our source, and while we can, and do when there is no alternative, download the assets and compute the checksums ourselves, it's much more efficient in terms of traffic to GET only a checksum :smile:)
Thank you :+1:
@VannTen It sounds good to me. Do you have the intention to contribute?
Why not; I'll put that on my TODO list. I assume https://github.com/containers/youki/blob/main/scripts/release_tag.sh would be the relevant automation?
It would be nice to add support for running PuzzleFS, a next-generation container filesystem that's also written in Rust. I've recently added support for using PuzzleFS images with the LXC container runtime. This uses LXC's OCI template and the pre-mount hook to mount the rootfs using puzzlefs mount ....
I didn't investigate how this would be possible in youki; I thought I'd open the discussion first and see whether people are for or against this idea.
@ariel-miculas Thanks for your comment. That is interesting, but it would be more relevant to a high-level container runtime like containerd.
Feel free to post ideas.