rootless-containers / usernetes

Kubernetes without the root privileges
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless
Apache License 2.0

usernetes with efa on AWS #322

Open vsoch opened 3 months ago

vsoch commented 3 months ago

hey @AkihiroSuda - happy weekend! I have a full setup working with flux and usernetes on AWS, and I added in the Elastic Fabric Adapter (EFA), but it's absolutely not running. The link there has some background - it needs to leverage drivers on the host. Is there an extra bit of information / bind I need to add to the docker-compose setup to allow for that to happen? For context, the operator installs OK, and lammps even starts up OK, but (on a very small problem size that usually finishes in 1-2 seconds even on a bad machine) it's basically hanging:

$ kubectl logs -f flux-sample-0-tjbkg
Defaulted container "flux-sample" out of: flux-sample, flux-view (init)
LAMMPS (7 Feb 2024 - Development - 8819275)

And the above hangs there. I'm thinking perhaps for usernetes I need to bind the driver location on the host to somewhere in usernetes? Or something else? Let me know if you have insights. As always, thank you for your help!

vsoch commented 3 months ago

oh wow it ran... but it was SO SLOW!


oh my! I do wonder if I'm missing a bind... actually I think it's there? EFA on the host I think is here:

$ ls /dev/infiniband/
uverbs0

and in the container I see it too:

# ls /dev/infiniband/uverbs0 
/dev/infiniband/uverbs0

But (even though efa is installed in the container) it could still be that it's not working... going to quickly run the tests.
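A visible device node is necessary for EFA but does not prove libfabric can use it. As a minimal sketch, the `ls` check above could be wrapped in a small helper and run both on the host and inside the container. The function name `has_uverbs` is hypothetical, and the default path is simply the one from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of usernetes or EFA tooling):
# succeed if at least one uverbs* entry exists in the given directory.
# The default directory is the one shown in the listing above.
has_uverbs() {
    dir="${1:-/dev/infiniband}"
    for dev in "$dir"/uverbs*; do
        # If the glob matched nothing, $dev is the literal pattern and -e fails.
        if [ -e "$dev" ]; then
            echo "found $dev"
            return 0
        fi
    done
    echo "no uverbs device under $dir" >&2
    return 1
}
```

Calling `has_uverbs` with no argument checks `/dev/infiniband`. A success here only says the device node is present, which matches what was observed above: the node was there, but it was still unclear whether EFA was actually in use.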

vsoch commented 3 months ago

Ah interesting - so the bind is to the docker-compose base images, but the containers don't see efa!

So there is some issue there. I wonder if installing the daemonset might work...

vsoch commented 3 months ago

Ah, got it working! I will post the full update tomorrow - basically I needed to hack the daemonset a bit, and then add the correct annotations for it to bind to the pod (of the job). Then I could run a sleep job, shell in, install fi_info for libfabric, and see efa and run the tests.
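The "shell in and run fi_info" step can be sketched as a one-shot check from outside the pod. This is only an illustration under assumptions: the function name `pod_sees_efa` is hypothetical, `kubectl` is assumed to be configured for the usernetes cluster, and `fi_info` is assumed to be already installed in the container. It does not cover the daemonset changes or the annotations, which are not spelled out here.

```shell
#!/bin/sh
# Hypothetical helper: run fi_info inside a pod and report whether
# libfabric lists the efa provider.
# Assumes kubectl points at the cluster and fi_info exists in the container.
pod_sees_efa() {
    pod="$1"
    # "fi_info -p efa" restricts the listing to the efa provider; when the
    # provider is usable, the output contains a "provider: efa" line.
    if kubectl exec "$pod" -- fi_info -p efa 2>/dev/null | grep -q "provider: efa"; then
        echo "efa provider visible in $pod"
        return 0
    fi
    echo "efa provider NOT visible in $pod" >&2
    return 1
}
```

For example, `pod_sees_efa flux-sample-0-tjbkg` would check the pod from the earlier log output. A failure means either `fi_info` is missing in the container or libfabric cannot find the EFA device from inside the pod.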

It's super late here but I'm going to be doing experiments soon and can post the full details of the setup.

vsoch commented 1 month ago

Update here - I think we've solved all the performance issues, but I need to do a few more scaled tests! @AkihiroSuda can I ask you a quick question? For your usernetes paper here:

https://arxiv.org/pdf/2402.00365v1

Does that component in Figure 1 (the "Intermediate NetNS" that the incoming / outgoing traffic gets routed through) get bypassed when using EFA? https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html

More specifically, in that diagram for EFA, would you say the usernetes "Intermediate NetNS" is part of the TCP/IP stack? Or the ENA device? https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-basics. I ask because I did some tweaks to our setup and was able to get the same performance on bare metal as in usernetes, and I think I can explain it based on those diagrams (if this is the case). Thanks for your help!

AkihiroSuda commented 1 month ago

The "intermediate netns" just refers to RootlessKit's namespace, i.e., the namespace of dockerd.

vsoch commented 1 month ago

So if we use the libfabric API that connects directly to the device (hardware), I'm guessing we are bypassing all of that.