tkestack / vcuda-controller


Usage is not clear #6

Closed raz-bn closed 4 years ago

raz-bn commented 4 years ago

Hi! I want to try out your project; however, it is not clear to me how to use the vcuda-controller. Should the image I get from the ./build-img.sh script be used as a base image for my GPU application? Should it be deployed on my k8s cluster?

I tried to use the vcuda-controller as a base image for a simple GPU CUDA stress test using this Docker file:

FROM nvidia/cuda:8.0-devel as build

RUN apt-get update && apt-get install -y --no-install-recommends \
        wget && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root
RUN wget http://wili.cc/blog/entries/gpu-burn/gpu_burn-0.7.tar.gz && tar xzf gpu_burn-0.7.tar.gz && make

FROM tkestack.io/gaia/vcuda:latest

COPY --from=build /root/gpu_burn /root/gpu_burn
ENTRYPOINT [ "/root/gpu_burn" ]
# burn for 10 secs
CMD [ "10" ]

and I get this error:

/root/gpu_burn: error while loading shared libraries: libcublas.so.8.0: cannot open shared object file: No such file or directory

I also tried running the vcuda-controller image as-is on my k8s cluster (GPU-manager and GPU-admission are also present), using the example YAML from the GPU-manager repo:


apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    app: test
spec:
  containers:
    - name: test
      image: razbne/vcuda
      command: ['/usr/local/nvidia/bin/nvidia-smi']
      resources:
        requests:
          tencent.com/vcuda-core: 10
          tencent.com/vcuda-memory: 10
        limits:
          tencent.com/vcuda-core: 10
          tencent.com/vcuda-memory: 10

Verifying the GPU is attached:

[root@test/]# lspci
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0001:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

I cannot manage to make nvidia-smi work; it just hangs without any output. I would be happy if you could share more information about how to use the vcuda-controller.

mYmNeo commented 4 years ago

If you want to use this feature, please take a look at https://github.com/tkestack/gpu-manager

raz-bn commented 4 years ago

If you want to use this feature, please take a look at https://github.com/tkestack/gpu-manager

@mYmNeo, what feature are you talking about? I already have the GPU-manager and GPU-admission running in my cluster, but I don't know how to set up the vcuda-controller; the gpu-manager repo doesn't provide this info.

raz-bn commented 4 years ago

I managed to solve the nvidia-smi issue, following this issue in the gpu-manager repo, by adding this:

      securityContext:
        privileged: true

to my YAML file. I'm still looking for a better way to grant a non-root user permission on /etc/vcuda.
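For reference, that snippet sits at the container level of the example pod spec from above (same names as in the YAML earlier in this thread):

```yaml
spec:
  containers:
    - name: test
      image: razbne/vcuda
      command: ['/usr/local/nvidia/bin/nvidia-smi']
      securityContext:
        privileged: true
```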

The problem with the GPU stress test is still present; I hope you can help me with that, @mYmNeo.

joyme123 commented 4 years ago

@raz-bn I don't think you need to build your docker image with vcuda-control. Just use gpu-manager and gpu-admission. It will work fine

raz-bn commented 4 years ago

@raz-bn I don't think you need to build your docker image with vcuda-control. Just use gpu-manager and gpu-admission. It will work fine

@joyme123 Now I am a bit confused, so what do I do with the vcuda-controller?

joyme123 commented 4 years ago

@raz-bn vcuda-controller is useful if you want to build the gpu-manager image with vcuda support yourself. Otherwise, you don't need to do anything.

joyme123 commented 4 years ago

@raz-bn vcuda-controller is for this feature

GPU manager also supports the payload with fraction resource of GPU device such as 0.1 card or 100MiB gpu device memory. If you want this kind feature, please refer to vcuda-controller project.

the default gpu-manager image is built with vcuda-controller, so you don't need to do anything if you use the default gpu-manager image.

raz-bn commented 4 years ago

@raz-bn vcuda-controller is for this feature

GPU manager also supports the payload with fraction resource of GPU device such as 0.1 card or 100MiB gpu device memory. If you want this kind feature, please refer to vcuda-controller project.

the default gpu-manager image is built with vcuda-controller, so you don't need to do anything if you use the default gpu-manager image.

@joyme123, I will verify it and post my results, hopefully to help others in the future. But if what you say is true, I think the gpu-manager README needs to be changed; it says:

GPU manager also supports the payload with fraction resource of GPU device such as 0.1 card or 100MiB gpu device memory. If you want this kind feature, please refer to vcuda-controller project.

which sounds like you need to install the vcuda-controller by yourself.

Thank you very very much for your reply!! I will post results soon

zewenli98 commented 4 years ago

the default gpu-manager image is built with vcuda-controller, so you don't need to do anything if you use the default gpu-manager image.

@raz-bn Hi, bro. Did you figure it out? I'm also confused by the README file and cannot use fractional GPUs. The first question is how on earth to use this project. The second is whether the vcuda-controller project should be used if I would like to use 0.5 GPU.

Could you give more details or share your valuable experience with GPU Manager, GPU admission, and vcuda-controller? Any help would be appreciated!!!

raz-bn commented 4 years ago

@Servon-Lee hey! First of all, this project (GaiaGPU) and all of its components (GPU Manager, GPU admission, and vcuda-controller) are great! Really innovative, but also poorly documented, in my opinion (I wasted tons of time trying to fit it to my use case; more docs would have saved me plenty of time).

They do provide a paper, which is useful to understand the underlying concepts of this project, but it is not nearly enough to understand the implementation.

vcuda-controller This component is a wrapper around the CUDA libraries. The wrapper "catches" memory allocation calls and limits them according to the pod's limits (the how and why of it working is the fun part). Still, you don't need to do anything with the vcuda-controller, since it has already been compiled for you when deploying the GPU-manager project.

GPU Manager This component has a few parts in it (not going to explain them all), but it is the one that makes sure your pods have the vcuda-controller in them.

GPU admission This component is a scheduler extender, and its primary role is to make sure there are no GPU fragmentation issues.

If you want to deploy this project on your cluster, you only need to deploy the GPU-manager and the GPU-admission, unless you have only one GPU card (testbed env), in which case you only need the GPU Manager, since GPU fragmentation is not an issue.

zewenli98 commented 4 years ago

@raz-bn Thanks for your detailed explanation; it's very useful to me. But when I configure GPU admission, I have to change the scheduler's policy in step 2.2, so I use this command: kube-scheduler --policy-config-file=scheduler-policy-config.json --use-legacy-policy-config=true. However, this error popped up and confused me:

[screenshot of the error]

By the way, port 10251 is already listened on by the default kube-scheduler.

My intuition tells me that killing the process and then running the command is not a good idea (I actually tried this, but it didn't help). I have no idea how to correctly run kube-scheduler --policy-config-file=scheduler-policy-config.json --use-legacy-policy-config=true. Did you encounter a similar situation? Looking forward to your reply. Thanks a ton!

raz-bn commented 4 years ago

@Servon-Lee I'm not sure about the command (I did it manually on my master node), but I don't believe it is the problem. In order to deploy it correctly, you first need to deploy the GPU admission as a Deployment and set up a service account and a service with a cluster IP. Then, in the scheduler config file, you need to put the service's cluster IP (since the scheduler will communicate with the extender via this address). localhost will only work if you deploy the gpu-admission on the same node as the kube-scheduler (and if you do, make sure you replicate it on all the masters).

zewenli98 commented 4 years ago

i did it manually on my master node

I wonder how you did that manually. I always get the error failed to create listener: failed to listen on 0.0.0.0:10251: listen tcp 0.0.0.0:10251: bind: address already in use

raz-bn commented 4 years ago

@Servon-Lee I guess you are getting this error since the port you are trying to bind is already in use.

zewenli98 commented 4 years ago

@Servon-Lee I guess you are getting this error since the port you are trying to bind is already in use.

Yes, it is used by the default kube scheduler, but I don't know what to do next.

raz-bn commented 4 years ago

@Servon-Lee

  1. clone the GPU-admission repo
  2. build and create a docker image & push it to your docker registry
  3. create a GPU-admission pod using the YAML file in the repo (I recommend making it a Deployment instead)
  4. create a service with a cluster IP (the target port should be the same as the GPU-admission listening port; the default is 3456)
  5. modify scheduler-policy-config.json so the urlPrefix field matches the service's ClusterIP and port
  6. find out how to load this scheduler policy (I am using Red Hat OpenShift, so the process is probably different for you)

*note: If you modify the scheduler policy without setting up the extender first, you will not be able to schedule any pods in your cluster. This is the case with any scheduler extender.
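For step 5, a minimal scheduler-policy-config.json might look like the sketch below. (Hedged: the /scheduler path suffix and the verb name are what I'd expect from the legacy kube-scheduler Policy format — double-check them against the config file shipped in the gpu-admission repo. The IP is a placeholder for the service ClusterIP from step 4.)

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://10.96.0.42:3456/scheduler",
      "filterVerb": "predicates",
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ]
}
```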

zewenli98 commented 4 years ago

4. create a service with a cluster IP (the target port should be the same as the GPU-admission listening port; the default is 3456)

@raz-bn Thank you soooooo much. But I'm still not sure how to create a service. Do you mean systemctl start gpu-admission.service and systemctl enable gpu-admission.service?

Besides, when running kubectl apply -f gpu-admission.yaml, I got CrashLoopBackOff, as in the figures below:

[screenshots of the pod status and logs]

Is there anything wrong?

raz-bn commented 4 years ago

@Servon-Lee It seems like you are not really familiar with k8s. I suggest you read (or watch some videos) about it and its core concepts; it will really help you deploy this project. I was talking about a k8s Service.

Now, about the error you get: I'm pretty sure that in your YAML file you have this env:

    - name: EXTRA_FLAGS
      value: "--incluster-mode=false"

and it is supposed to be --incluster-mode=true, since you deployed it as a pod in your cluster 😃

zewenli98 commented 4 years ago

@raz-bn Thank you bro.🤝 I'm new around here. I think it's time to learn k8s systematically.

zewenli98 commented 4 years ago

@raz-bn Hi bro, sorry to bother you, but I'm really eager to use this feature. Could you please provide all the prerequisites needed to deploy gpu-manager and gpu-admission, such as the service file of gpu-admission? Thanks a lot!

xs233 commented 4 years ago

+1

raz-bn commented 4 years ago

@Servon-Lee @xs233

apiVersion: v1
kind: ServiceAccount
metadata:
  name:  gpu-admission
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-admission
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-admission
  template:
    metadata:
      labels:
        app: gpu-admission
    spec:
      containers:
      - name: gpu-admission
        image: <Image>
        env:
        - name: LOG_LEVEL
          value: '5'
        - name: EXTRA_FLAGS
          value: '--incluster-mode=true'        
        securityContext:
          privileged: true
        ports:
          - hostPort: 3456
            containerPort: 3456
            protocol: TCP
        volumeMounts:
          - name: kubernetes
            readOnly: true
            mountPath: /etc/kubernetes/
          - name: log
            mountPath: /var/log/gpu-admission
      serviceAccount: gpu-admission
      volumes:
        - name: kubernetes
          configMap:
            name: gpu-admission.config
            defaultMode: 420
        - name: log
          hostPath:
            path: /var/log/gpu-admission
            type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  name: gpu-admission
spec:
  selector:
    app: gpu-admission
  ports:
    - name: 3456-tcp
      protocol: TCP
      port: 3456
      targetPort: 3456
  type: ClusterIP

This should work, but it is not production-ready by any means; make sure to use your image instead of the placeholder. After creating it, you will need to get the cluster IP assigned to the service and put it in the scheduler policy's urlPrefix field.

zewenli98 commented 4 years ago

@raz-bn Reaaaaally appreciate!

After creating you will need to get the cluster IP assigned to the service and put it in the scheduler policy in the urlPrefix field.

This is very crucial!

ZinuoCai commented 3 years ago

@Servon-Lee @raz-bn Thanks for your discussion! I want to deploy it with the above YAML file. I wonder where the gpu-admission.config is. Can I delete the following configuration?

      volumes:
        - name: kubernetes
          configMap:
            name: gpu-admission.config
            defaultMode: 420