selkies-project / docker-nvidia-egl-desktop

KDE Plasma Desktop container designed for Kubernetes, supporting OpenGL EGL and GLX, Vulkan, and Wine/Proton for NVIDIA GPUs through WebRTC and HTML5, providing an open-source remote cloud/HPC graphics or game streaming platform.
https://github.com/selkies-project/docker-nvidia-egl-desktop/pkgs/container/nvidia-egl-desktop
Mozilla Public License 2.0

selkies-gstream hit 100% CPU #33

Open mathico2 opened 8 months ago

mathico2 commented 8 months ago

Wishing you a Happy New Year! I've set up a node using Standard_NC16as_T4_v3 in AKS. However, we're encountering an issue where the pod runs for a few minutes with CPU usage under 9% for the selkies-gstream process, but then it abruptly jumps to over 100%, causing the pod to freeze again and become inaccessible. Could you please advise whether anything was missed while deploying this app in the Azure environment?
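One way to put numbers on the jump described above is to sample the process's CPU time from /proc at a fixed interval. The following is a minimal sketch, not part of Selkies-GStreamer: the function names are hypothetical, and it assumes a Linux host (or container) with /proc mounted and a known PID for the streaming process.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ")" before tokenizing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """Convert a tick delta over an interval into percent of one CPU core."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def sample_cpu_percent(pid, interval_s=1.0):
    """Sample a live process twice and report its CPU usage in between."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, clk_tck)
```

Calling `sample_cpu_percent(pid)` in a loop and logging the result with a timestamp would show when the usage leaves the single-digit range. A value above 100 means the process is consuming more than one core's worth of CPU time.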

ehfd commented 8 months ago

I'm not sure what the issue might be (I simply feel this might be something with the GStreamer backend), but try the new container that will be built in the next 10 minutes, because there was a new Selkies-GStreamer release with a new GStreamer version patch.

mathico2 commented 8 months ago

Hello ehfd, could you please provide the URL for the new release so I can update the image in my deployment? I guess you are referring to the link below:

https://github.com/selkies-project/selkies-gstreamer/pkgs/container/selkies-gstreamer%2Fgstreamer

mathico2 commented 8 months ago

@ehfd I used the newly created selkies-gstreamer Docker image, but unfortunately it doesn't even open the app. Could you please advise which Docker image I need to push to Azure Container Registry and then use for the deployment?

ehfd commented 8 months ago

No, you can use ghcr.io/selkies-project/nvidia-egl-desktop:22.04 or 20.04. I built a new image with the new release.

mathico2 commented 8 months ago

OK, thank you. I will let you know the outcome.

mathico2 commented 8 months ago

I just tried the new Docker image but am still encountering the same issue. This is the deployment YAML file in use for this deployment within AKS:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: egl
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: egl
      template:
        metadata:
          labels:
            app: egl
        spec:
          hostname: egl
          # Uncomment the below line to disable network isolation for WebRTC connectivity, may show an error if disallowed by the cluster
          # hostNetwork: true
          containers:
          - name: egl
            image: cazaw1232conregistry.azurecr.us/nvidia-egl-desktop:22.04
            env:
            - name: TZ
              value: "UTC"
            - name: SIZEW
              value: "1920"
            - name: SIZEH
              value: "1080"
            - name: REFRESH
              value: "60"
            - name: DPI
              value: "96"
            - name: CDEPTH
              value: "24"
            # Keep to default unless you know what you are doing with VirtualGL, `VGL_DISPLAY` should be set to either `egl[n]`, or `/dev/dri/card[n]` only when the device was passed to the container
            #- name: VGL_DISPLAY
            #  value: "egl"
            # Choose either `value:` or `secretKeyRef:` but not both at the same time
            - name: PASSWD
              value: "mypasswd"
            # valueFrom:
            #   secretKeyRef:
            #     name: my-pass
            #     key: my-pass
            # Uncomment this to enable noVNC, disabling selkies-gstreamer and ignoring all its parameters except `BASIC_AUTH_PASSWORD`, which will be used for authentication with noVNC, `BASIC_AUTH_PASSWORD` defaults to `PASSWD` if not provided
            # - name: NOVNC_ENABLE
            #   value: "true"
            # Additional view-only password only applicable to the noVNC interface, choose either `value:` or `secretKeyRef:` but not both at the same time
            # - name: NOVNC_VIEWPASS
            #   value: "mypasswd"
            # valueFrom:
            #   secretKeyRef:
            #     name: my-pass
            #     key: my-pass
            ###
            # selkies-gstreamer parameters, for additional configurations see lines that start with "parser.add_argument" in https://github.com/selkies-project/selkies-gstreamer/blob/master/src/selkies_gstreamer/__main__.py
            ###
            # Change `WEBRTC_ENCODER` to `x264enc`, `vp8enc`, or `vp9enc` if you are using software fallback without allocated GPUs or your GPU doesn't support `H.264 (AVCHD)` under the `NVENC - Encoding` section in https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new
            - name: WEBRTC_ENCODER
              value: "nvh264enc"
            - name: WEBRTC_ENABLE_RESIZE
              value: "false"
            - name: ENABLE_BASIC_AUTH
              value: "true"
            - name: ENABLE_HTTPS_WEB
              value: "false"
            # Volume mount trusted HTTPS certificate to new path for no web browser warnings
            # - name: HTTPS_WEB_CERT
            #   value: /etc/ssl/certs/ssl-cert-snakeoil.pem
            # - name: HTTPS_WEB_KEY
            #   value: /etc/ssl/private/ssl-cert-snakeoil.key
            # Defaults to `PASSWD` if unspecified, choose either `value:` or `secretKeyRef:` but not both at the same time
            # - name: BASIC_AUTH_PASSWORD
            #   value: "mypasswd"
            # valueFrom:
            #   secretKeyRef:
            #     name: my-pass
            #     key: my-pass
            ###
            # Uncomment below to use a TURN server for improved network compatibility
            ###
            # - name: TURN_HOST
            #   value: "turn.example.com"
            # - name: TURN_PORT
            #   value: "3478"
            # Provide only `TURN_SHARED_SECRET` for time-limited shared secret authentication or both `TURN_USERNAME` and `TURN_PASSWORD` for legacy long-term authentication, but do not provide both authentication methods at the same time
            # - name: TURN_SHARED_SECRET
            #   valueFrom:
            #     secretKeyRef:
            #       name: turn-shared-secret
            #       key: turn-shared-secret
            # - name: TURN_USERNAME
            #   value: "username"
            # Choose either `value:` or `secretKeyRef:` but not both at the same time
            # - name: TURN_PASSWORD
            #   value: "mypasswd"
            # valueFrom:
            #   secretKeyRef:
            #     name: turn-password
            #     key: turn-password
            # Change to `tcp` if the UDP protocol is throttled or blocked in your client network, or when the TURN server does not support UDP
            # - name: TURN_PROTOCOL
            #   value: "udp"
            # You need a valid hostname and a certificate from authorities such as ZeroSSL (Let's Encrypt may have issues) to enable this
            # - name: TURN_TLS
            #   value: "false"
            stdin: true
            tty: true
            ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            resources:
              limits:
                memory: 64Gi
                cpu: "16"
                nvidia.com/gpu: 1
              requests:
                memory: 100Mi
                cpu: 100m
            volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - mountPath: /cache
              name: egl-cache-vol
            - mountPath: /home/user
              name: egl-root-vol
            - mountPath: /dev/dri
              name: drm
          tolerations:
          - key: "sku"
            operator: "Equal"
            value: "gpu"
            effect: "NoSchedule"
          volumes:
          - name: dshm
            emptyDir:
              medium: Memory
          - name: egl-cache-vol
            emptyDir: {}
            # persistentVolumeClaim:
            #   claimName: egl-cache-vol
          - name: egl-root-vol
            emptyDir: {}
            # persistentVolumeClaim:
            #   claimName: egl-root-vol
          - name: drm
            emptyDir: {}

ehfd commented 8 months ago

Can you follow the procedures in: https://github.com/selkies-project/selkies-gstreamer#install-the-packaged-version-on-a-standalone-machine-or-cloud-instance outside Kubernetes or any containers in the same VM instance? I want to check if it's a hardware issue or a container issue.

ehfd commented 7 months ago

Similar condition (both on Azure): https://github.com/selkies-project/docker-nvidia-glx-desktop/issues/50

ehfd commented 7 months ago

@justinbowes

Selkies-GStreamer goes directly to NVENC, so I doubt this is VirtualGL. Do you have any leads? I'm asking only because you were also using Azure's GPUs.

justinbowes commented 7 months ago

@mathico2 On Azure, for virtual workstation applications you might try the GRID driver (which is supported on the T4 instances -- see the exception here https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup ).

Beyond that, I'd be trying to isolate the layer in which the issue occurs. What is the output of nvidia-smi encodersessions and nvidia-smi dmon, outside of containers, while this is happening?

Also worth checking the kernel messages to see if the driver is complaining.
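The checks suggested above can be gathered in one pass. The following is a minimal sketch to run on the VM outside any container, not an official tool: the helper names are hypothetical, reading kernel messages through `dmesg` is an assumption about the host, and the `-c 5` sample count for `nvidia-smi dmon` and the 15-second timeout are additions so that every command terminates.

```python
import shutil
import subprocess

# The two nvidia-smi subcommands come from the comment above; "dmesg" and
# the "-c 5" sample count on dmon are assumptions made for this sketch.
DIAGNOSTIC_COMMANDS = [
    ["nvidia-smi", "encodersessions"],
    ["nvidia-smi", "dmon", "-c", "5"],
    ["dmesg"],
]


def run_diagnostic(cmd, timeout=15):
    """Run one command and return (status, output) without raising."""
    if shutil.which(cmd[0]) is None:
        return "missing", ""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A command that keeps sampling is cut off instead of hanging forever.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr


def collect_diagnostics(commands=DIAGNOSTIC_COMMANDS):
    """Return a mapping from each command line to its (status, output)."""
    return {" ".join(cmd): run_diagnostic(cmd) for cmd in commands}
```

Running `collect_diagnostics()` once while CPU usage is normal and again after it spikes gives two snapshots to compare. A "failed" status for `dmesg` usually just means the kernel log needs elevated privileges on that host.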

mathico2 commented 7 months ago

Thanks, I'll check it out.

ehfd commented 7 months ago

I agree that there should be a comparison inside and outside the container to progress further.

ehfd commented 6 months ago

@mathico2 Any follow ups?

mathico2 commented 6 months ago

Hello, I'm still not able to get that to work.

ehfd commented 5 months ago

https://github.com/selkies-project/selkies-gstreamer#install-the-packaged-version-on-a-standalone-machine-or-cloud-instance

I need you to follow the procedures here outside of a container within the same instance to continue.

mathico2 commented 5 months ago

Ok, I will do it.

ehfd commented 4 months ago

Leads to: https://github.com/python-xlib/python-xlib/pull/242

ehfd commented 4 months ago

Just to confirm: did you happen to use an international keyboard layout? @mathico2

mathico2 commented 4 months ago

No

ehfd commented 2 months ago

@mathico2 We have a new series of containers.

ehfd commented 2 months ago

@mathico2 We have a new release.

mathico2 commented 2 months ago

Cool, I'll check it out.

mathico2 commented 2 months ago

Could I get a link to the GitHub repos?

ehfd commented 2 months ago

Containers:
https://github.com/selkies-project/docker-nvidia-egl-desktop/pkgs/container/nvidia-egl-desktop
https://github.com/selkies-project/docker-nvidia-glx-desktop/pkgs/container/nvidia-glx-desktop

Standalone: https://github.com/selkies-project/selkies-gstreamer/releases/tag/v1.6.0

mathico2 commented 2 months ago

Thanks, I'll check them out.

mathico2 commented 2 months ago

Could I please get the AKS YAML file that was used for this deployment?

mathico2 commented 2 months ago

I'm referring to the egl.yml file.

mathico2 commented 2 months ago

Could you please push those Docker images to Docker Hub so I can push them into my container registry in Azure?

ehfd commented 2 months ago

I also updated egl.yml and xgl.yml at the same time. Some variables changed for the new release, but you can compare the files and update yours accordingly.

Docker Hub: Why? Doesn't ghcr.io work the exact same way?

ehfd commented 2 months ago

https://github.com/selkies-project/docker-nvidia-egl-desktop/blob/main/egl.yml
https://github.com/selkies-project/docker-nvidia-glx-desktop/blob/main/xgl.yml

ehfd commented 2 months ago

There's a new docker-compose.yml as well.

ehfd commented 2 months ago

I've fixed everything I know of that might cause this.