pabloromeo / clusterplex

ClusterPlex is an extended version of Plex, which supports distributed Workers across a cluster to handle transcoding requests.
MIT License
452 stars 35 forks

Drivers Missing and HW Transcoding device missing #321

Closed: DizzieNight closed this issue 3 months ago

DizzieNight commented 3 months ago

Describe the bug: When transcoding, the required drivers are missing, and my Intel iGPU is missing from the Plex HW transcode dropdown, although it was there when I set up ClusterPlex a few weeks ago.

"/config/Library/Application Support/Plex Media Server/Cache/va-dri-linux-x86_64" is empty.

To Reproduce: Not really sure; delete your va-dri-linux-x86_64 folder, I guess?

Expected behavior: The worker should find the drivers and execute transcoding.

Screenshots: [screenshot attached]

Additional context: Running on a Kubernetes cluster with 4 nodes (2 masters and 2 workers). I set up ClusterPlex to do all transcoding remotely on the workers.

pabloromeo commented 3 months ago

Hi! Not sure if you had it working before on your workers, but for HW transcoding you're probably going to need network shares between PMS and the workers for both the Cache and Drivers paths.

A working config for the Helm install can be seen here:

https://github.com/pabloromeo/clusterplex/issues/305#issue-2267787081

Is that how you have it set up too?
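
For reference, a minimal sketch of what those shared mounts can look like under the chart's `additionalMediaVolumes` section (the PVC names here are placeholders; the claims need to support ReadWriteMany so PMS and the workers can mount them simultaneously):

```yaml
# Sketch only: shared RWX volumes for the Plex Cache and Drivers paths,
# mounted into both PMS and the workers by the chart.
global:
  sharedStorage:
    additionalMediaVolumes:
      drivers:
        enabled: true
        existingClaim: my-plex-drivers   # placeholder ReadWriteMany PVC
        mountPath: /config/Library/Application Support/Plex Media Server/Drivers
      cache:
        enabled: true
        existingClaim: my-plex-cache     # placeholder ReadWriteMany PVC
        mountPath: /config/Library/Application Support/Plex Media Server/Cache
```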

DizzieNight commented 3 months ago

Yeah, I have the same additionalMediaVolumes setup. They are 2 separate Longhorn volumes, which I can see have some data in them and are being shared. I noticed there aren't any drivers on the pms container either, so it's not as if the worker is the only one missing them: both containers are. I have updated to the latest 1.1.7 Helm chart.

Below are my Helm chart values (not sure why they weren't formatting properly, but ah well):

```yaml
global:
  # -- Configure the plex image that will be used for the PMS and Worker components
  # @default -- See below
  plexImage:
    # -- The image that will be used
    repository: linuxserver/plex

    # -- The image tag to use
    tag: latest

    # -- Defines when the image should be pulled. Options are Always (default), IfNotPresent, and Never
    imagePullPolicy: Always

  # -- The ClusterPlex version of docker mod images to pull
  # @default -- The appVersion for this chart
  clusterplexVersion: 

  # -- The timezone configured for each pod
  timezone: Australia/Melbourne

  # -- The process group ID that the LinuxServer Plex container will run Plex/Worker as.
  PGID: 1000

  # -- The process user ID that the LinuxServer Plex container will run Plex/Worker as.
  PUID: 1000

  sharedStorage:
    # -- Configure the volume that will be mounted to the PMS and worker pods for a shared location for transcoding files.
    # @default -- See below
    transcode:
      # -- Enable or disable the transcode PVC. This should only be disabled if you are not using the workers.
      enabled: true

      # -- Storage class for the transcode volume.
      # If set to `-`, dynamic provisioning is disabled.
      # If set to something else, the given storageClass is used.
      # If undefined (the default) or set to null, no storageClassName spec is set, choosing the default provisioner.
      # NOTE: This class must support ReadWriteMany otherwise you will encounter errors.
      storageClass:  # "-"

      # -- If you want to reuse an existing claim, the name of the existing PVC can be passed here.
      existingClaim: pvc-smb-transcodes # your-claim

      # -- Used in conjunction with `existingClaim`. Specifies a sub-path inside the referenced volume instead of its root
      subPath: # some-subpath

      # -- The size of the transcode volume.
      size: 10Gi

      # -- Set to true to retain the PVC upon `helm uninstall`
      retain: true

    # -- Configure the media volume that will contain all of your media. If you need more volumes you need to add them under
    # the pms and worker sections manually. Those volumes must already be present in the cluster.
    # @default -- See below
    media:
      # -- Enables or disables the volume
      enabled: true

      # -- Storage Class for the config volume.
      # If set to `-`, dynamic provisioning is disabled.
      # If set to something else, the given storageClass is used.
      # If undefined (the default) or set to null, no storageClassName spec is set, choosing the default provisioner.
      # NOTE: This class must support ReadWriteMany otherwise you will encounter errors.
      storageClass: # "-"

      # -- If you want to reuse an existing claim, the name of the existing PVC can be passed here.
      existingClaim: pvc-nfs-mediadata-plex # your-claim

      # -- Used in conjunction with `existingClaim`. Specifies a sub-path inside the referenced volume instead of its root
      subPath: # some-subpath

      # -- The amount of storage that is requested for the persistent volume.
      size: 200Ti

      # -- Set to true to retain the PVC upon `helm uninstall`
      retain: true

    # -- Use this section to add additional media mounts if necessary. You can copy the contents of the above media
    additionalMediaVolumes:
      drivers:
        enabled: true
        existingClaim: plex-drivers
        mountPath: /config/Library/Application Support/Plex Media Server/Drivers

      cache:
        enabled: true
        existingClaim: plex-cache
        mountPath: /config/Library/Application Support/Plex Media Server/Cache

# -- Configure the Plex Media Server component
# @default -- See below
pms:
  # -- Enable or disable the Plex Media Server component
  enabled: true

  # -- Additional environment variables. Template enabled.
  # Syntax options:
  # A) TZ: UTC
  # B) PASSWD: '{{ .Release.Name }}'
  # C) PASSWD:
  #      configMapKeyRef:
  #        name: config-map-name
  #        key: key-name
  # D) PASSWD:
  #      valueFrom:
  #        secretKeyRef:
  #          name: secret-name
  #          key: key-name
  #      ...
  # E) - name: TZ
  #      value: UTC
  # F) - name: TZ
  #      value: '{{ .Release.Name }}'
  env:
    #FFMPEG_HWACCEL: vaapi
    TRANSCODE_OPERATING_MODE: remote

  securityContext:
    privileged: true

  # -- Supply the configuration items used to configure the PMS component
  # @default -- See below
  config:
    # -- Set this to 1 if you want only info logging from the transcoder or 0 if you want debugging logs
    transcoderVerbose: 1

    # -- Set the transcode operating mode. Valid options are local (No workers), remote (only remote workers), both (default, remote first then local if remote fails).
    # If you disable the worker then this will be set to local automatically as that is the only valid option for that configuration.
    transcodeOperatingMode: remote

    # -- Set the Plex claim token obtained from https://plex.tv/claim
    plexClaimToken: "claim-aaarwUUyn8xkfbkTfFby"

    # -- Set the version of Plex to use. Valid options are docker, latest, public, or a specific version.
    # [[ref](https://github.com/linuxserver/docker-plex#application-setup)]
    version: docker

    # -- The port that Plex will listen on
    port: 32400

    # -- Enable or disable the local relay function. In most cases this should be left to the default (true).
    # If you disable this, you must add the pod IP address of each worker or the pod network CIDR to Plex under the
    # `List of IP addresses and networks that are allowed without auth` option in Plex's network configuration.
    localRelayEnabled: true

    # -- The port that the relay service will listen on
    relayPort: 32499

    # -- The IP address that plex is using. This is only utilized if you disable the localRelayEnabled option above.
    pmsIP: ""

  # -- Configure the kubernetes service associated with the PMS component
  # @default -- See below
  serviceConfig:
    # -- Configure the type of service
    type: LoadBalancer

    # -- Specify the externalTrafficPolicy for the service. Options: Cluster, Local
    # [[ref](https://kubernetes.io/docs/tutorials/services/source-ip/)]
    externalTrafficPolicy:

    # -- Provide additional annotations which may be required.
    annotations: {}

    # -- Provide additional labels which may be required.
    labels: {}

  # -- Configure the ingress for plex here.
  # @default -- See below
  ingressConfig:
    # -- Enables or disables the ingress
    enabled: false

    # -- Provide additional annotations which may be required.
    annotations:
      {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"

    # -- Provide additional labels which may be required.
    labels: {}

    # -- Set the ingressClass that is used for this ingress.
    ingressClassName: # "nginx"

    ## Configure the hosts for the ingress
    hosts:
      - # -- Host address. Helm template can be passed.
        host: chart-example.local
        ## Configure the paths for the host
        paths:
          - # -- Path.  Helm template can be passed.
            path: /
            pathType: Prefix
            service:
              # -- Overrides the service name reference for this path
              name:
              # -- Overrides the service port reference for this path
              port:

    # -- Configure TLS for the ingress. Both secretName and hosts can process a Helm template.
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  # -- Configure the volume that stores all the Plex configuration and metadata
  # @default -- See below
  configVolume:
    # -- Enables or disables the volume
    enabled: true

    # -- Storage Class for the config volume.
    # If set to `-`, dynamic provisioning is disabled.
    # If set to something else, the given storageClass is used.
    # If undefined (the default) or set to null, no storageClassName spec is set, choosing the default provisioner.
    storageClass: # "-"

    # -- If you want to reuse an existing claim, the name of the existing PVC can be passed here.
    existingClaim: plex # your-claim

    # -- Used in conjunction with `existingClaim`. Specifies a sub-path inside the referenced volume instead of its root
    subPath: # some-subpath

    # -- AccessMode for the persistent volume.
    # Make sure to select an access mode that is supported by your storage provider!
    # [[ref]](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)
    accessMode: ReadWriteOnce

    # -- The amount of storage that is requested for the persistent volume.
    size: 120Gi

    # -- Set to true to retain the PVC upon `helm uninstall`
    retain: true

  # -- Enable or disable the various health check probes for this component
  # @default -- See below
  healthProbes:
    # -- Enable or disable the startup probe
    startup: false

    # -- Enable or disable the readiness probe
    readiness: false

    # -- Enable or disable the liveness probe
    liveness: false

  # -- Configure the resource requests and limits for the PMS component
  # @default -- See below
  resources:
    requests:
      # -- CPU Request amount
      cpu: 2000m

      # -- Memory Request Amount
      memory: 2Gi

    #limits:
    #  # -- CPU Limit amount
    #  cpu: 4000m

    #
    #  # -- Memory Limit amount
    #  memory: 4Gi

# -- Configure the orchestrator component
# @default -- See below
orchestrator:
  # -- Enable or disable the Orchestrator component
  enabled: true

  image:
    # -- image repository
    repository: ghcr.io/pabloromeo/clusterplex_orchestrator

    # -- image pull policy
    pullPolicy: IfNotPresent

  # -- Additional environment variables. Template enabled.
  # Syntax options:
  # A) TZ: UTC
  # B) PASSWD: '{{ .Release.Name }}'
  # C) PASSWD:
  #      configMapKeyRef:
  #        name: config-map-name
  #        key: key-name
  # D) PASSWD:
  #      valueFrom:
  #        secretKeyRef:
  #          name: secret-name
  #          key: key-name
  #      ...
  # E) - name: TZ
  #      value: UTC
  # F) - name: TZ
  #      value: '{{ .Release.Name }}'
  env:
    - name: FFMPEG_HWACCEL
      value: "true"
    - name: FFMPEG_HWACCEL_DEVICE
      value: "/dev/dri/renderD128"

  # -- Supply the configuration items used to configure the Orchestrator component
  # @default -- See below
  config:
    # -- The port that the Orchestrator will listen on
    port: 3500

    # -- Configures how the worker is chosen when a transcoding job is initiated.
    # Options are LOAD_CPU, LOAD_TASKS, RR, and LOAD_RANK (default).
    # [[ref]](https://github.com/pabloromeo/clusterplex/tree/master/docs#orchestrator)
    workerSelectionStrategy: LOAD_RANK

  # -- Configure the kubernetes service associated with the Orchestrator component
  # @default -- See below
  serviceConfig:
    # -- Configure the type of service
    type: ClusterIP

    # -- Specify the externalTrafficPolicy for the service. Options: Cluster, Local
    # [[ref](https://kubernetes.io/docs/tutorials/services/source-ip/)]
    externalTrafficPolicy:

    # -- Provide additional annotations which may be required.
    annotations: {}

    # -- Provide additional labels which may be required.
    labels: {}

  # -- Configure a ServiceMonitor for use with Prometheus monitoring
  # @default -- See below
  prometheusServiceMonitor:
    # -- Enable the ServiceMonitor creation
    enabled: false

    # -- Provide additional annotations which may be required.
    annotations: {}

    # -- Provide additional labels which may be required.
    labels: {}

    # -- Provide a custom selector if desired. Note that this will take precedent over the default
    # method of using the orchestrators namespace. This usually should not be required.
    customSelector: {}

    # -- Configure how often Prometheus should scrape this metrics endpoint in seconds
    scrapeInterval: 30s

    # -- Configure how long Prometheus should wait for the endpoint to reply before
    # considering the request to have timed out.
    scrapeTimeout: 10s

  # -- Configures whether the Grafana dashboard for the orchestrator component is deployed to the cluster.
  # If enabled, this creates a ConfigMap containing the dashboard JSON so that your Grafana instance can detect it.
  # This requires your Grafana instance to have grafana.sidecar.dashboards.enabled set to true and the searchNamespace
  # set to ALL, otherwise this will not be discovered.
  enableGrafanaDashboard: false

  # -- Enable or disable the various health check probes for this component
  # @default -- See below
  healthProbes:
    # -- Enable or disable the startup probe
    startup: true

    # -- Enable or disable the readiness probe
    readiness: true

    # -- Enable or disable the liveness probe
    liveness: true

  # -- Configure the resource requests and limits for the orchestrator component
  # @default -- See below
  resources:
    requests:
      # -- CPU Request amount
      cpu: 200m

      # -- Memory Request Amount
      memory: 64Mi

    limits:
      # -- CPU Limit amount
      cpu: 500m

      # -- Memory Limit amount
      memory: 128Mi

# -- Configure the worker component
# @default -- See below
worker:
  # -- Enable or disable the Worker component
  enabled: true

  # -- Additional environment variables. Template enabled.
  # Syntax options:
  # A) TZ: UTC
  # B) PASSWD: '{{ .Release.Name }}'
  # C) PASSWD:
  #      configMapKeyRef:
  #        name: config-map-name
  #        key: key-name
  # D) PASSWD:
  #      valueFrom:
  #        secretKeyRef:
  #          name: secret-name
  #          key: key-name
  #      ...
  # E) - name: TZ
  #      value: UTC
  # F) - name: TZ
  #      value: '{{ .Release.Name }}'
  env:
    FFMPEG_HWACCEL: vaapi

  securityContext:
    privileged: true

  # -- Supply the configuration items used to configure the worker component
  # @default -- See below
  config:
    # -- The number of instances of the worker to run
    replicas: 1

    # -- The port the worker will expose its metrics on for the orchestrator to find
    port: 3501

    # -- The frequency at which workers send stats to the orchestrator in ms
    cpuStatInterval: 10000

    # -- Controls usage of the EasyAudioDecoder 1 = ON (default) and 0 = OFF
    eaeSupport: 1

  # -- Configure the kubernetes service associated with the worker component
  # @default -- See below
  serviceConfig:
    # -- Configure the type of service
    type: ClusterIP

    # -- Specify the externalTrafficPolicy for the service. Options: Cluster, Local
    # [[ref](https://kubernetes.io/docs/tutorials/services/source-ip/)]
    externalTrafficPolicy:

    # -- Provide additional annotations which may be required.
    annotations: {}

    # -- Provide additional labels which may be required.
    labels: {}

  # -- Enable or disable the per-pod volumes that cache the codecs. This saves a great deal of time when starting the workers.
  # @default -- See below
  codecVolumes:
    # -- Enable or disable the creation of the codec volumes
    enabled: true

    # -- Add any extra labels needed
    labels: {}

    # -- Add any extra annotations needed
    annotations: {}

    existingClaim: plex-cache

    # -- AccessMode for the persistent volume.
    # Make sure to select an access mode that is supported by your storage provider!
    # [[ref]](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)
    #accessMode: ReadWriteOnce

    # -- The size of the volume
    #size: 1Gi

    # -- Storage Class for the codec volumes
    # If set to `-`, dynamic provisioning is disabled.
    # If set to something else, the given storageClass is used.
    # If undefined (the default) or set to null, no storageClassName spec is set, choosing the default provisioner.
    #storageClass:

  # -- Enable or disable the various health check probes for this component
  # @default -- See below
  healthProbes:
    # -- Enable or disable the startup probe
    startup: false

    # -- Enable or disable the readiness probe
    readiness: false

    # -- Enable or disable the liveness probe
    liveness: false

  # -- Configure the resource requests and limits for the worker component
  # @default -- See below
  resources:
    requests:
      # -- CPU Request amount
      cpu: 500m

      # -- Request Intel QSV
      gpu.intel.com/i915: "1"

      # -- Memory Request Amount
      memory: 1Gi

    limits:
      # -- CPU Limit amount
      cpu: 3000m

      # -- Limit Intel QSV
      gpu.intel.com/i915: "1"

      # -- Memory Limit amount
      memory: 6Gi

  # -- Configure the affinity rules for the worker pods. This helps prevent multiple worker pods from
  # being scheduled on the same node as another worker pod or as the main plex media server.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: intel.feature.node.kubernetes.io/gpu
            operator: In
            values:
            - "true"
      # - podAffinityTerm:
      #     labelSelector:
      #       matchLabels:
      #         name: clusterplex-pms
      #     topologyKey: kubernetes.io/hostname
      #   weight: 50
```
DizzieNight commented 3 months ago

I kinda figured it out. PMS was not on a node with a GPU, so I added an affinity for PMS as well as the worker so they land on the same node, and my iGPU now appears in Plex.
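
Roughly, the PMS affinity mirrors the worker one from the values above (a sketch, assuming the chart accepts an `affinity` block under `pms` the same way it does under `worker`):

```yaml
# Sketch: pin PMS to a GPU-labeled node so the iGPU is visible to it.
# The label key assumes the same Intel GPU node labeling used in the
# worker affinity above (e.g. via node-feature-discovery).
pms:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: intel.feature.node.kubernetes.io/gpu
                operator: In
                values:
                  - "true"
```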

However I am now getting this error when trying to start a transcode on PMS:

[error screenshot attached]

DizzieNight commented 3 months ago

Redeployed with the version set to latest instead of docker, and it works now.
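
For anyone hitting the same error, the change amounts to this in the values above:

```yaml
pms:
  config:
    # Switching the Plex version from "docker" to "latest" is what
    # resolved the transcode error in the previous comment.
    version: latest   # previously: docker
```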