pabloromeo / clusterplex

ClusterPlex is an extended version of Plex, which supports distributed Workers across a cluster to handle transcoding requests.

Stream not starting; cwd => "/run/s6/services/plex" or "/config/Library/Application Support/..." #71

Closed: lkathke closed this issue 3 years ago

lkathke commented 3 years ago

Hello guys, I've been trying for about six hours now to get the simplest possible cluster working, but I can't.

Currently I am using this docker-compose.yml (with the Docker Hub image *-amd64-1.2.9): docker-compose.yml.txt

After starting a movie, Plex sends the command to the orchestrator:

plex_1               | Dolby, Dolby Digital, Dolby Digital Plus, Dolby TrueHD and the double D symbol are trademarks of Dolby Laboratories.
plex_1               | Calling external transcoder: /app/transcoder.js
plex_1               | ON_DEATH: debug mode enabled for pid [537]
plex_1               | Setting VERBOSE to ON
plex_1               | Sending request to orchestrator on: http://plex-orchestrator:3500
plex_1               | cwd => "/config/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-xf4ofurs5sj3hwfvbn81edps-d373fcf8-f964-4000-8949-b3d9a07586ba"
plex_1               | args => ["-codec:0","h264","-codec:1","eac3_eae","-eae_prefix:1","xf4ofurs5sj3hwfvbn81edps_","-ss","0","-noaccurate_seek","-analyzeduration","20000000","-probesize","20000000","-i","/data/movies/Murder Mystery 2019/Murder.Mystery.2019.German.DL.1080p.WebHD.x264-GSG9/Murder.Mystery.2019.German.DL.1080p.WebHD.x264-GSG9.mkv","-map","0:0","-metadata:s:0","language=eng","-codec:0","copy","-filter_complex","[0:1] aresample=async=1:ocl='stereo':rematrix_maxval=0.000000dB:osr=48000[0]","-map","[0]","-metadata:s:1","language=ger","-codec:1","aac","-b:1","256k","-f","dash","-seg_duration","5","-init_seg_name","init-stream$RepresentationID$.m4s","-media_seg_name","chunk-stream$RepresentationID$-$Number%05d$.m4s","-window_size","5","-delete_removed","false","-skip_to_segment","1","-time_delta","0.0625","-manifest_name","http://10.100.10.1:32400/video/:/transcode/session/xf4ofurs5sj3hwfvbn81edps/d373fcf8-f964-4000-8949-b3d9a07586ba/manifest?X-Plex-Http-Pipeline=infinite","-avoid_negative_ts","disabled","-map_metadata","-1","-map_chapters","-1","dash","-map","0:3","-metadata:s:0","language=ger","-codec:0","ass","-f","segment","-segment_format","ass","-segment_time","1","-segment_header_filename","sub-header","-segment_start_number","0","-segment_list","http://10.100.10.1:32400/video/:/transcode/session/xf4ofurs5sj3hwfvbn81edps/d373fcf8-f964-4000-8949-b3d9a07586ba/seglist?stream=subtitles&X-Plex-Http-Pipeline=infinite","-segment_list_type","csv","-segment_list_size","5","-segment_list_separate_stream_times","1","-segment_format_options","ignore_readorder=1","-segment_list_unfinished","1","-fflags","+flush_packets","sub-chunk-%05d","-start_at_zero","-copyts","-vsync","cfr","-y","-nostats","-loglevel","verbose","-loglevel_plex","verbose","-progressurl","http://10.100.10.1:32400/video/:/transcode/session/xf4ofurs5sj3hwfvbn81edps/d373fcf8-f964-4000-8949-b3d9a07586ba/progress"]
plex_1               | env => {"PUID":"1000","PLEX_ARCH":"amd64","HOSTNAME":"28fa04dde4eb","LANGUAGE":"en_US.UTF-8","TRANSCODE_OPERATING_MODE":"both","ORCHESTRATOR_URL":"http://plex-orchestrator:3500","PWD":"/config/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-xf4ofurs5sj3hwfvbn81edps-d373fcf8-f964-4000-8949-b3d9a07586ba","PLEX_DOWNLOAD":"https://downloads.plex.tv/plex-media-server-new","PLEX_MEDIA_SERVER_MAX_PLUGIN_PROCS":"6","NVIDIA_DRIVER_CAPABILITIES":"compute,video,utility","PMS_IP":"10.100.10.1","TZ":"Europe/London","PLEX_MEDIA_SERVER_USER":"abc","HOME":"/root","LANG":"en_US.UTF-8","PGID":"1000","TERM":"xterm","PLEX_MEDIA_SERVER_INFO_VENDOR":"Docker","PLEX_MEDIA_SERVER_HOME":"/usr/lib/plexmediaserver","X_PLEX_TOKEN":"efYQfzCaA2Rr_m1nfYab","PLEX_MEDIA_SERVER_INFO_MODEL":"x86_64","SHLVL":"0","LD_LIBRARY_PATH":"/usr/lib","PLEX_MEDIA_SERVER_INFO_PLATFORM_VERSION":"5.11.0-31-generic","LIBVA_DRIVERS_PATH":"/usr/lib/plexmediaserver/lib/dri","PLEX_MEDIA_SERVER_APPLICATION_SUPPORT_DIR":"/config/Library/Application Support","CWD":"/","PATH":"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","VERSION":"docker","EAE_ROOT":"/config/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/EasyAudioEncoder","DEBIAN_FRONTEND":"noninteractive","PLEX_MEDIA_SERVER_INFO_DEVICE":"Docker Container (LinuxServer.io)","FFMPEG_EXTERNAL_LIBS":"/config/Library/Application\\ Support/Plex\\ Media\\ Server/Codecs/73e06c8-3759-linux-x86_64/","TRANSCODER_VERBOSE":"1"}

So the cwd (current working directory) is /config/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-xf4ofurs5sj3hwfvbn81edps-d373fcf8-f964-4000-8949-b3d9a07586ba, which is also accessible from worker1, because I'm sharing the config directory between Plex and worker1 (/docker/plex/live/config:/config).
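
Roughly sketched, the relevant part of my compose file is just this shared mount on both services (only the volume sections are shown, service names shortened):

services:
  plex:
    volumes:
      - /docker/plex/live/config:/config   # Plex config, including Cache/Transcode/Sessions
  worker1:
    volumes:
      - /docker/plex/live/config:/config   # same host path, so the session cwd exists on the worker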

Worker1 is able to start transcoding and writes its output files to /config/Library/Application Support/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-xf4ofurs5sj3hwfvbn81edps-d373fcf8-f964-4000-8949-b3d9a07586ba (see the attached screenshot "plex_files"; that listing is just an example, not that exact plex-transcode-xf4of... directory).

But the files don't get published by Plex to the web browser (see the attached screenshot "plex not starting").

Now I read that we should change the transcoder output to /tmp/transcode or /transcode via the Plex UI (https://github.com/pabloromeo/clusterplex/issues/41#issuecomment-826893413). But this changes the cwd to "/run/s6/services/plex", so the worker cannot start transcoding: that directory does not exist on the worker, and even if it did, the output would be written there rather than to /tmp/transcode.

Does someone have an idea how to get this working? Or does someone have a working docker-compose for testing?

pabloromeo commented 3 years ago

Hi! Yeah, I run all three parts using Docker Compose (on a Swarm), with a file similar to yours. (Notice, however, that I'm using the LinuxServer Plex images as base and extending them with the ClusterPlex Dockermods.) You would probably want to switch the "experimental" tag for a stable release, as I tend to try crazy things out in the experimental tag.

Here's mine:

version: '3.8'

services:
  plex:
    image: ghcr.io/linuxserver/plex:latest
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: stop-first
      placement:
        constraints:
          - node.labels.type == v4
      resources:
        reservations:
          memory: 350M
        limits:
          cpus: '3'
    environment:
      DOCKER_MODS: "ghcr.io/pabloromeo/clusterplex_dockermod:experimental"
      VERSION: docker
      PUID: 1000
      PGID: 1000
      TZ: America/Argentina/Buenos_Aires
      ORCHESTRATOR_URL: http://plex-orchestrator:3500
      PMS_IP: 192.168.2.1 #plex
      TRANSCODE_OPERATING_MODE: both #(local|remote|both)
      TRANSCODER_VERBOSE: "1"
    healthcheck:
      test: curl -fsS http://localhost:32400/identity > /dev/null || exit 1
      interval: 30s
      timeout: 15s
      retries: 5
      start_period: 120s
    networks:
      - web
    volumes:
      - /mnt/volume1/cluster-data/plex:/config
      - /mnt/volume1/cluster-data/plex-tmp:/tmp
      - /mnt/nfs/cluster-shared/cluster-backups/plex:/backups
      - /media/MediaLibraries:/data
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 32469:32469
      - 32400:32400
      - 32401:32401
      - 3005:3005
      - 8324:8324
      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp
    logging:
      driver: loki:latest
      options:
        loki-url: "http://localhost:3100/loki/api/v1/push"

  plex-orchestrator:
    image: ghcr.io/pabloromeo/clusterplex_orchestrator:experimental
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: start-first
      resources:
        reservations:
          memory: 50M
    healthcheck:
      test: curl -fsS http://localhost:3500/health > /dev/null || exit 1
      interval: 15s
      timeout: 15s
      retries: 5
      start_period: 30s
    environment:
      TZ: America/Argentina/Buenos_Aires
      STREAM_SPLITTING: "OFF" # ON | OFF (default)
      LISTENING_PORT: 3500
      WORKER_SELECTION_STRATEGY: "LOAD_RANK" # RR | LOAD_CPU | LOAD_TASKS | LOAD_RANK
    networks:
      - web
    volumes:
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3500:3500
    logging:
      driver: loki:latest
      options:
        loki-url: "http://localhost:3100/loki/api/v1/push"

  plex-worker:
    image: ghcr.io/linuxserver/plex:latest
    hostname: "plex-worker-{{.Node.Hostname}}"
    deploy:
      mode: replicated
      replicas: 2
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.labels.transcoding == true
      update_config:
        order: stop-first
      resources:
        reservations:
          cpus: '0.25'
          memory: 100M
        limits:
          cpus: '3'
          memory: 2048M
    environment:
      DOCKER_MODS: "ghcr.io/pabloromeo/clusterplex_worker_dockermod:experimental"
      VERSION: docker
      PUID: 1000
      PGID: 1000
      TZ: America/Argentina/Buenos_Aires
      LISTENING_PORT: 3501
      STAT_CPU_INTERVAL: 2000
      ORCHESTRATOR_URL: http://plex-orchestrator:3500
    healthcheck:
      test: curl -fsS http://localhost:3501/health > /dev/null || exit 1
      interval: 30s
      timeout: 15s
      retries: 5
      start_period: 240s
    networks:
      - web
    volumes:
      - /mnt/volume1/cluster-data/plex-codecs:/codecs
      - /mnt/volume1/cluster-data/plex-tmp:/tmp
      - /media/MediaLibraries:/data
      - /etc/localtime:/etc/localtime:ro
    logging:
      driver: loki:latest
      options:
        loki-url: "http://localhost:3100/loki/api/v1/push"

networks:
  web:
    external: true

I believe the key factor here is that both Plex and the Worker are sharing the container's real /tmp directory:

  - /mnt/volume1/cluster-data/plex-tmp:/tmp

That is a network share exposed through GlusterFS. And in Plex, the value for the transcoder temp directory is set as shown in the attached screenshot.

Also, as you can see there, the workers don't even need the full Plex config files; they just need access to the media at the same path, the temp directory used for transcoding, and the codecs (the codecs don't strictly need to be shared, I just share them so that they are only downloaded once and reused by all workers).
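
In compose terms, the worker's volume section boils down to just this (paths are the ones from my file above; adjust them to your own layout):

  plex-worker:
    volumes:
      - /mnt/volume1/cluster-data/plex-codecs:/codecs   # shared only so codecs are downloaded once and reused
      - /mnt/volume1/cluster-data/plex-tmp:/tmp         # the same shared temp the plex service mounts
      - /media/MediaLibraries:/data                     # media at the same path as on the Plex server
      - /etc/localtime:/etc/localtime:ro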

You may want to enable debug logging on the Plex server, just to check whether the transcoder is actually calling back to your original PMS to report progress (for example, posting to "-progressurl","http://10.100.10.1:32400/video/:/transcode/session/xf4ofurs5sj3hwfvbn81edps/d373fcf8-f964-4000-8949-b3d9a07586ba/progress").

I believe that if you share the entire /tmp between them, your problem should go away, because Plex writes things into temp beyond just the /tmp/transcode session content. If you look at your worker's /tmp directory you're probably going to find files there, and that might be what is causing the failures in your deployment.

In my case I've loosened permissions on that temp location quite a bit (chmod 777), although I can't recall at this point whether that is strictly necessary.
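
So the main thing to double-check is that both services mount the same share at /tmp, along these lines (the host path here is only an example of a location reachable from every node; in my setup it's the GlusterFS share mentioned above):

  plex:
    volumes:
      - /mnt/shared/plex-tmp:/tmp   # example shared host path, replace with your own
  plex-worker:
    volumes:
      - /mnt/shared/plex-tmp:/tmp   # identical mount on the workers, so session output lands where Plex expects it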

lkathke commented 3 years ago

Okay, I got it working now 🥇 (after switching to the linuxserver/plex container with the Docker Mod; it wasn't working at first).

I think my main problems were:

I think the third problem was the reason why it didn't play the stream in the Plex web interface. Looking at the developer console in Google Chrome again today, I saw that it was trying to access strange IP addresses, like 127.0.0.1:32400 and 192.168.x.1:32400.

Thank you so much for your help! It was very frustrating not getting it to work. So thank you! :)

I will now try to run a worker with the NVIDIA Docker Mod on my Unraid server (which has an NVIDIA GTX 1070). For that I will try out the worker modification from FCLC: https://github.com/FCLC/clusterplex/commit/73471d3837a469288ed608a6ec72f7ee956caa83 I am excited to see if this works.

For everyone who might be facing the same problem, here is my current docker-compose (I am not using Docker Swarm):

version: '3.8'

services:
  plex:
    image: ghcr.io/linuxserver/plex:latest
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: stop-first
      placement:
        constraints:
          - node.labels.type == v4
      resources:
        reservations:
          memory: 350M
        limits:
          cpus: '3'
    environment:
      DOCKER_MODS: "ghcr.io/pabloromeo/clusterplex_dockermod:dev"
      VERSION: docker
      PUID: 1000
      PGID: 1000
      TZ: Europe/Berlin
      ORCHESTRATOR_URL: http://10.100.80.2:3500
      PMS_IP: 10.100.80.1 #plex
      TRANSCODE_OPERATING_MODE: both #(local|remote|both)
      TRANSCODER_VERBOSE: "1"
    healthcheck:
      test: curl -fsS http://localhost:32400/identity > /dev/null || exit 1
      interval: 30s
      timeout: 15s
      retries: 5
      start_period: 120s
    networks:
      lanbridge:
        ipv4_address: 10.100.80.1
    volumes:
      - /docker/plex/config:/config
      - /docker/plex/plex-tmp:/tmp
      - /docker/plex/plex:/backups
      - /mount/media:/data
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 32469:32469
      - 32400:32400
      - 32401:32401
      - 3005:3005
      - 8324:8324
      - 1900:1900/udp
      - 32410:32410/udp
      - 32412:32412/udp
      - 32413:32413/udp
      - 32414:32414/udp

  plex-orchestrator:
    image: ghcr.io/pabloromeo/clusterplex_orchestrator:dev
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: start-first
      resources:
        reservations:
          memory: 50M
    healthcheck:
      test: curl -fsS http://localhost:3500/health > /dev/null || exit 1
      interval: 15s
      timeout: 15s
      retries: 5
      start_period: 30s
    environment:
      TZ: Europe/Berlin
      STREAM_SPLITTING: "OFF" # ON | OFF (default)
      LISTENING_PORT: 3500
      WORKER_SELECTION_STRATEGY: "LOAD_RANK" # RR | LOAD_CPU | LOAD_TASKS | LOAD_RANK
    networks:
      lanbridge:
        ipv4_address: 10.100.80.2
    volumes:
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 3500:3500

  plex-worker:
    image: ghcr.io/linuxserver/plex:latest
    hostname: "plex-worker-1"
    deploy:
      mode: replicated
      replicas: 2
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.labels.transcoding == true
      update_config:
        order: stop-first
      resources:
        reservations:
          cpus: '0.25'
          memory: 100M
        limits:
          cpus: '3'
          memory: 2048M
    environment:
      DOCKER_MODS: "ghcr.io/pabloromeo/clusterplex_worker_dockermod:dev"
      VERSION: docker
      PUID: 1000
      PGID: 1000
      TZ: Europe/Berlin
      LISTENING_PORT: 3501
      STAT_CPU_INTERVAL: 2000
      ORCHESTRATOR_URL: http://10.100.80.2:3500
    healthcheck:
      test: curl -fsS http://localhost:3501/health > /dev/null || exit 1
      interval: 30s
      timeout: 15s
      retries: 5
      start_period: 240s
    networks:
      lanbridge:
        ipv4_address: 10.100.80.10
    volumes:
      - /docker/plex/plex-codecs:/codecs
      - /docker/plex/plex-tmp:/tmp
      - /mount/media:/data
      - /etc/localtime:/etc/localtime:ro

networks:
  lanbridge:
    driver: macvlan
    driver_opts:
      parent: enp1s0
    ipam:
      driver: default
      config:
        - subnet: 10.100.0.0/16

pabloromeo commented 3 years ago

Awesome!! Glad you got it working! :D

Technically, there isn't a functional difference between using the linuxserver/plex image plus the Dockermod and using the ClusterPlex image, since the latter uses the former as its base. It's just a bit more convenient to use the Dockermod, because you can update Plex by simply pulling latest instead of waiting for a new tagged ClusterPlex release or rebuild.

As you mentioned, I get the feeling that the mismatch between /tmp/transcode and the /transcode documented in the README probably contributed to the problem. I'm going to update the docker-compose example to use /tmp/transcode just to be safe, for others who might want to run it.

Good luck with the nvidia transcoding test :)