pabloromeo / clusterplex

ClusterPlex is an extended version of Plex, which supports distributed Workers across a cluster to handle transcoding requests.
MIT License

TypeError crash on worker when attempting to transcode #320

Open albertsj1 opened 1 week ago

albertsj1 commented 1 week ago

Firstly, thank you for your work on this project. I truly appreciate the time and effort you've put into this to offer it publicly for free.

I am trying to get this working on Docker Swarm. I have a 5-node Raspberry Pi cluster running the latest version of DietPi. Each node has 4 GB of memory.

Some backstory on an issue I fixed yesterday, in case it's part of the problem: Issue #317 was fixed yesterday, which resolved the missing CLUSTERPLEX_PLEX_CODECS_VERSION. However, I was still getting an error whenever it tried to download a codec. The wget command that downloads the codecs was failing, and it looked as though the variables were not being interpreted properly.

Codec libzerocodec_decoder.so does not exist. Downloading...
/usr/lib/plexmediaserver/Plex Media Server: line 111:   564 Bus error               wget https://downloads.plex.tv/codecs/${CLUSTERPLEX_PLEX_CODECS_VERSION}/${CLUSTERPLEX_PLEX_CODEC_ARCH}/${codec}.so
Codec libzlib_decoder.so does not exist. Downloading...
/usr/lib/plexmediaserver/Plex Media Server: line 111:   565 Segmentation fault      wget https://downloads.plex.tv/codecs/${CLUSTERPLEX_PLEX_CODECS_VERSION}/${CLUSTERPLEX_PLEX_CODEC_ARCH}/${codec}.so
Codec libzmbv_decoder.so does not exist. Downloading...
/usr/lib/plexmediaserver/Plex Media Server: line 111:   566 Segmentation fault      (core dumped) wget https://downloads.plex.tv/codecs/${CLUSTERPLEX_PLEX_CODECS_VERSION}/${CLUSTERPLEX_PLEX_CODEC_ARCH}/${codec}.so

I opened a shell in one of the worker containers with docker exec -it bash. I made a copy of start.sh named tmp_start.sh with the last line removed, so that it wouldn't start the app again, and then ran ./tmp_start.sh. It downloaded all of the codecs without error and exited successfully. After that, I redeployed the plex stack. The workers detected that the codecs already existed and appeared to be ready for jobs, with no errors.

Now... the current problem. Any time I try to watch a video, the transcode job is sent to the worker and the worker crashes with the following error:

Received task request
Setting hwaccel to mmal
EAE_ROOT => "/tmp/pms-3a9dbb6b-c249-4e68-bb49-206e1342974d/EasyAudioEncoder"
EAE Support - Spawning EasyAudioEncoder from "/codecs/ad47460-ffe81d9cd51bd27cb3fbbe09-linux-aarch64-standard/EasyAudioEncoder/EasyAudioEncoder/EasyAudioEncoder", cwd => /tmp/pms-3a9dbb6b-c249-4e68-bb49-206e1342974d/EasyAudioEncoder
/app/worker.js:156
                createEAE_PID(childEAE.pid.toString());
                                           ^

TypeError: Cannot read properties of undefined (reading 'toString')
    at Socket.<anonymous> (/app/worker.js:156:32)
    at Emitter.emit (/app/node_modules/@socket.io/component-emitter/index.js:143:20)
    at Socket.emitEvent (/app/node_modules/socket.io-client/build/cjs/socket.js:559:20)
    at Socket.onevent (/app/node_modules/socket.io-client/build/cjs/socket.js:546:18)
    at Socket.onpacket (/app/node_modules/socket.io-client/build/cjs/socket.js:514:22)
    at Emitter.emit (/app/node_modules/@socket.io/component-emitter/index.js:143:20)
    at /app/node_modules/socket.io-client/build/cjs/manager.js:237:18
    at process.processTicksAndRejections (node:internal/process/task_queues:81:21)

Node.js v20.15.0

NOTE: I have hwaccel set to mmal; however, I get the exact same error without it.
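
For context on the TypeError itself: Node.js documents that `subprocess.pid` is `undefined` when a child process fails to spawn, and that an `'error'` event is emitted in that case. So the trace above suggests the EasyAudioEncoder process never started. What follows is only a minimal sketch of a defensive check, not the actual ClusterPlex code: `spawnWithPidCheck` and its `onPid` callback are hypothetical names, and the real call site in worker.js may be structured differently.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper (not part of ClusterPlex): spawns a process and only
// hands the PID to onPid when the OS actually started the child.
function spawnWithPidCheck(command, args, options, onPid) {
    const child = spawn(command, args, options);

    // A failed spawn is reported through the 'error' event. Without a
    // listener, that event would be thrown as an uncaught exception.
    child.on('error', (err) => {
        console.error(`Failed to spawn ${command}: ${err.code || err.message}`);
    });

    // child.pid is undefined when the process could not be started,
    // so calling child.pid.toString() unguarded throws a TypeError.
    if (child.pid === undefined) {
        return null;
    }

    onPid(child.pid.toString());
    return child;
}

module.exports = { spawnWithPidCheck };
```

With a guard like this the worker would log the underlying spawn error code (for example ENOENT or EACCES) instead of crashing, which might help narrow down why the binary cannot be started on this setup.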

The permissions of the video the worker was trying to play as seen from inside the worker node:

-rwxr-xr-x 1 1000 65534 5.2G Sep 25  2023 '/data/media/tv_shows/<redacted>/Season 1/<redacted>.mkv'

My full docker-swarm.yaml file.

pabloromeo commented 1 week ago

Sounds like both issues may be related (the codecs download and the EAE pid). Any chance you could try one of those Pis with the official Raspberry Pi OS 64-bit?

albertsj1 commented 1 week ago

I'll give it a shot and respond back. Probably won't get a chance to do that for a couple of days.

pabloromeo commented 1 week ago

No worries. In a few days I'll try to run DietPi on a VM in Proxmox to see if I can reproduce the issue, too. It sounds like it might be a networking issue, a permissions issue, or both. Not sure which yet.