ressu / kube-plex

Scalable Plex Media Server on Kubernetes -- dispatch transcode jobs as pods on your cluster!
Apache License 2.0

Plex issue, though not kube-plex specific, around NFS and /config becoming corrupt #19

Closed: karezza closed this issue 2 years ago

karezza commented 2 years ago

I've learned the hard way that Plex has issues when an NFS share is used as its config folder. The plexinc/pms-docker image notes that the problem comes down to file locking not being enabled by default on NFS.

In my storageClass NFS mount options I'm using `hard` and `nfsvers=4.2`, yet I still watch in amazement as my Plex server stops working after a while and only comes back to a working state after I delete the deployment, erase the config PVC, recreate the PVC, and restart.
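
For reference, here is a minimal sketch of the kind of StorageClass I'm describing; the provisioner name and the server/path parameters are placeholders rather than my actual values:

```yaml
# Sketch of an NFS StorageClass with the mount options mentioned above.
# The provisioner and its parameters are placeholders for whatever NFS
# dynamic provisioner is in use; only mountOptions reflects the real setup.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-config
provisioner: example.com/nfs        # placeholder provisioner name
parameters:
  server: nfs.example.internal      # placeholder NFS server
  path: /export/k8s                 # placeholder export path
mountOptions:
  - hard
  - nfsvers=4.2
```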

My NFS server and all of my cluster nodes are running the latest CentOS 8 Stream.

If you guys are using NFS for your config mount, what mount options are you using (on both the server and client side)? If you're using something else, what has worked for you as an on-prem storage solution for hosting /config on a network share?

ressu commented 2 years ago

My configuration is on an iSCSI volume (backed by https://github.com/ressu/synology-csi) and, aside from CSI issues, the configuration is stable and works as expected.

But I do recall some mentions of issues if the database is on NFS (https://www.reddit.com/r/PleX/comments/ff4a59/plex_hangs_with_library_and_database_on_nfs/). There are some mentions of locking issues in that discussion, but if your issues are resolved by simply recreating the PVC (effectively remounting the volume), then locking is an unlikely cause. Locking issues would show up as data corruption instead.

Have you tried rebooting the node? While it is disruptive to the other pods running on the node, it would confirm whether this is a mount issue or an issue somewhere else.

karezza commented 2 years ago

It isn't fixed by remounting, only by deleting the PVC, recreating it, and mounting the new PVC. It definitely "appears to be" corruption. The issues occur within an hour or two of the initial discovery and loading of the available media files into the libraries. I've started over several times with different mount options, with no luck.

My main share is on a Windows Server 2019 machine, mounted on a Linux system that re-exports it over NFS. I then use that NFS share for dynamic provisioning in the cluster. It has worked for everything so far, except Plex and Nextcloud. I'm concerned that everything else I've recently set up will become corrupted as well. It looks like I need a better solution; I'm open to ideas, though I wasn't expecting a sudden expense...

ressu commented 2 years ago

Oh, so you need to delete the data to get everything back running. Got it.

I'm not sure how locking works with a Windows 2019 backend, so you could try setting mountOptions in the PersistentVolume definition to include local_lock=all. Local locking overrides NLM and handles all locking on the Kubernetes node itself. This should fix the issue, but you should then avoid mounting the same volume read-write on multiple machines, since those locks aren't visible to other machines.
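
Something along these lines; a rough sketch only, with placeholder server, path, and size values:

```yaml
# Rough sketch of a PersistentVolume with local locking enabled.
# Server, path, and capacity are placeholders; the mountOptions entry
# with local_lock=all is the relevant part.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: plex-config
spec:
  capacity:
    storage: 20Gi                    # placeholder size
  accessModes:
    - ReadWriteOnce
  mountOptions:
    - hard
    - nfsvers=4.2
    - local_lock=all                 # handle all locking on the node, bypassing NLM
  nfs:
    server: nfs.example.internal     # placeholder NFS server
    path: /export/plex-config       # placeholder export path
```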

On the upside, the way my fork of kube-plex is built avoids mounting configuration on anything but the main Plex pod. I did this because multi-mounting an ext4 filesystem isn't supported and I was too lazy to seek out other alternatives 😆
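
In claim terms that means the config volume only ever needs a single-node mount; a rough, illustrative sketch (name, storage class, and size are placeholders):

```yaml
# Illustrative PVC for the config volume. Since only the main Plex pod
# mounts it, ReadWriteOnce is sufficient; no multi-attach of the ext4
# filesystem is needed. Name, storage class, and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kube-plex-config
spec:
  accessModes:
    - ReadWriteOnce                 # mounted read-write by a single node only
  storageClassName: iscsi-storage   # placeholder, e.g. an iSCSI-backed class
  resources:
    requests:
      storage: 20Gi                 # placeholder size
```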