Closed: ressu closed this issue 3 years ago
How's this working? I'd love to try this out, but might just end up trying unicorn instead.
I'm using this as my daily driver, so I'm not aware of any issues.
@ressu I'm interested in testing this out too. Other than installing the Helm Chart from your fork, are there different docker images needed to fully utilize your changes? I've tried just installing the helm chart, but it's failing to start transcode pods, and I read that your changes included switching to transcode jobs instead, so I must be missing something. Thanks for this work!
You can use the image I built here https://github.com/ressu/kube-plex/pkgs/container/kube-plex
So the relevant parts of my helm values are:
```yaml
image:
  tag: latest
kubePlex:
  enabled: true
  image:
    repository: ghcr.io/ressu/kube-plex
```
I get the following error when the /shared/kube-plex binary is invoked to spin up transcode jobs, after using the provided image (which is progress from the error I was previously getting):
```
Protected process returned an error: exit status 1
```
I haven't been able to determine where that message is coming from yet. Any thoughts? I'm running K3s 1.21.1, so I'm not sure if something newer in Kubernetes might be causing this.
Here are the Plex pod logs, and you can see my two attempts to transcode something.
```
$ kubectl logs plex-kube-plex-5585bc9f59-m49vw
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 40-plex-first-run: executing...
Plex Media Server first run setup complete
[cont-init.d] 40-plex-first-run: exited 0.
[cont-init.d] 45-plex-hw-transcode-and-connected-tuner: executing...
[cont-init.d] 45-plex-hw-transcode-and-connected-tuner: exited 0.
[cont-init.d] 50-plex-update: executing...
[cont-init.d] 50-plex-update: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
Starting Plex Media Server.
Critical: libusb_init failed
Protected process returned an error: exit status 1
Protected process returned an error: exit status 1
```
Hmm, I'm not fully certain that's relevant to the issue. I have to admit that I've never checked the Plex console logs once I got everything working. What I would suggest is enabling logging in kube-plex by adding `loglevel: verbose` to the values (similar to this):
```yaml
image:
  tag: latest
kubePlex:
  enabled: true
  image:
    repository: ghcr.io/ressu/kube-plex
  loglevel: verbose
```
That will make kube-plex log more information. You can see the logs in the Plex web UI by going to Settings -> Manage -> Console and filtering for "transcode" (if I remember correctly). This will show the logs from kube-plex directly and should give you an idea of what is going wrong.
So I made it a little farther with the assist of the verbose logging (thank you!). What it showed was insufficient permissions in the role kube-plex uses, so I had to update the role with the `batch` API group for `jobs` resources.
```diff
diff --git a/charts/kube-plex/templates/rbac.yaml b/charts/kube-plex/templates/rbac.yaml
index a327770..e59ae87 100644
--- a/charts/kube-plex/templates/rbac.yaml
+++ b/charts/kube-plex/templates/rbac.yaml
@@ -27,6 +27,19 @@ rules:
   - patch
   - update
   - watch
+- apiGroups:
+  - batch
+  resources:
+  - jobs
+  verbs:
+  - create
+  - delete
+  - deletecollection
+  - get
+  - list
+  - patch
+  - update
+  - watch
```
Now I'm getting errors with the transcoder launching, and it looks like it might be related to an ffmpeg flag?
```
Jun 21, 2021 10:59:44.793 [0x7f5f47277b38] Debug - [Transcoder] [AVHWDeviceContext @ 0x7ff9e5845700] Cannot open a VA display from DRM device (null).
Jun 21, 2021 10:59:44.793 [0x7f5f47325b38] Error - [Transcoder] Device creation failed: -542398533.
Jun 21, 2021 10:59:44.793 [0x7f5f47277b38] Error - [Transcoder] Failed to set value 'vaapi=vaapi:' for option 'init_hw_device': Generic error in an external library
Jun 21, 2021 10:59:44.794 [0x7f5f47325b38] Error - [Transcoder] Error parsing global options: Generic error in an external library
Jun 21, 2021 10:59:44.795 [0x7f5f47277b38] Error - [KubePlexProxy] transcode failed [error:exit status 1]
Jun 21, 2021 10:59:45.545 [0x7f5f47325b38] Info - [KubePlex] Error waiting for pod to complete: job "pms-elastic-transcoder-2kvxm" failed
```
I did test that a pod can schedule the GPU (using an Ubuntu OpenCL test pod), and it was able to schedule and utilize the GPU successfully.
Oh, that makes sense. I don't think my cluster has RBAC enabled (I should turn it on, though), which explains the missing permissions. I'll add those to my branch a bit later in the week.
I haven't tried my codebase with GPU support. Could you send me an example GPU config, so I know what to look for and can pick it up in the transcoder?
I'm running the Intel GPU device plugin DaemonSet, configured like the following:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: intel-gpu-plugin
  name: intel-gpu-plugin
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: intel-gpu-plugin
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: intel-gpu-plugin
    spec:
      containers:
      - env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: intel/intel-gpu-plugin:0.21.0
        imagePullPolicy: IfNotPresent
        name: intel-gpu-plugin
        resources: {}
        securityContext:
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /dev/dri
          name: devfs
          readOnly: true
        - mountPath: /sys/class/drm
          name: sysfs
          readOnly: true
        - mountPath: /var/lib/kubelet/device-plugins
          name: kubeletsockets
      dnsPolicy: ClusterFirst
      nodeSelector:
        feature.node.kubernetes.io/pci-0300_8086.present: "true"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /dev/dri
          type: ""
        name: devfs
      - hostPath:
          path: /sys/class/drm
          type: ""
        name: sysfs
      - hostPath:
          path: /var/lib/kubelet/device-plugins
          type: ""
        name: kubeletsockets
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
```
Then the pod resources for requests and limits need to request the GPU resource like so:
```yaml
resources:
  requests:
    gpu.intel.com/i915: 1
    cpu: 1000m
    memory: 1500Mi
  limits:
    gpu.intel.com/i915: 1
```
As I was writing this about the resource requests, I started wondering whether the GPU resource request needs to be added to the job spec that creates the `pms-elastic-transcoder-*` job pods in `cmd/kube-plex/kubernetes.go`, to make the GPU available to the elastic transcoder pods.
Yeah, the requests need to be updated. I'm assuming this is an exclusive lock, as in there is just a single GPU instance available, which means the easy solution is to add a mechanism for defining additional resource requests/limits in the transcoder job. That should be easy enough to fix.
I'll add that later this week along with the RBAC
Alright, I'm working on the GPU support in this PR https://github.com/ressu/kube-plex/pull/1
The change itself turned out to be a bit more involved than I expected, but should work. I'll do some more testing and rebuild images on my side.
Sigh, renaming the master branch to main closed this PR. I'll open a new one shortly.
I'll admit, this is a "scratch my own itch" type of solution. But when trying to make use of kube-plex, I ran into a few issues, so I reworked quite a bit of it.
I'm dropping this PR as a heads up that this work has been done, and I'm more than happy to try to break it down into smaller chunks for merging. Some of the changes included (in no specific order):
I'm also dropping the `vendor` directory from the repository. Personally I prefer to have it, but it tends to mess up pull requests. I've also constructed all the Dockerfiles etc. in a way where it's completely fine either to run everything as is, or to pre-cache the modules with `go mod vendor`.