hasse69 / rar2fs

FUSE file system for reading RAR archives
https://hasse69.github.io/rar2fs/
GNU General Public License v3.0

Possible non-rar2fs issue - Need advice #173

Closed zeeter82 closed 2 years ago

zeeter82 commented 2 years ago

So recently I've been running into another issue when Plex runs its scheduled scans of my media on my rar2fs mounts. After a lot of troubleshooting and looking at Plex logs, it appears that much of the time it will randomly start getting errors while scanning files, and then an entire path, or most of one, gets marked "unavailable" even though I can still browse to the file and play it back just fine in Plex. It just marks the item as "unavailable" and puts the trash can icon on it because it had issues during the scan/re-scan. So technically this isn't breaking anything because the item is still playable in Plex, but I'm left with a ton of items in my library with the trash can icon, which is very annoying and very hard on my sysadmin OCD. It also means I can't empty the trash in Plex, because I don't want to accidentally remove anything and/or affect my DB with playback history etc.

I see a ton of these in my "Plex Media Server.log": WARN - Error scanning directory, we'll skip and continue: boost::filesystem::directory_iterator::construct: An unexpected network error occurred:

I also see a few of these in my syslog on VM: smbd[17617]: Too many open files, unable to open more! smbd's max open files = 16424

I'm just curious if maybe this is a file limits issue in my VM running rar2fs. Or maybe this is another issue entirely on my Windows box where I'm running my Plex server...something going on with Windows networking etc.

Any ideas? I posted something in the Plex Discord about this yesterday, but didn't get any replies yet.

hasse69 commented 2 years ago

Thanks for the update. Feel free to close this issue if and when you consider it solved from your end.

zeeter82 commented 2 years ago

This seems to be solved for me and running well again. Closing issue.

zeeter82 commented 2 years ago

This might be a lost cause because it very well could be a Plex and/or VM/SMB issue, but this just started happening again today. I've made no changes to any configs except staying current with the latest Plex builds, which I last updated over a week ago. Media playback is technically fine "so far". The only annoyance is that when looking at Plex in the web view from a browser, a ton of my collection has the "trash can" saying it's unavailable. That isn't true, and playback still works fine locally and remotely.

I really wish I could get to the bottom of this, but unfortunately I don't see myself getting any help from Plex devs.

hasse69 commented 2 years ago

I understand. Not really sure how to deal with this issue to be honest. If you can confirm data is accessible across the same mount as Plex is using, then it is either about some permission or about the way the data is accessed. Can you tell if this happens only for compressed archives or also for uncompressed ones (store mode, -m0)?

zeeter82 commented 2 years ago

Update:

So I built a brand new VM based on Ubuntu Server 21.10 and installed it on a different host running Hyper-V (in the past I've been using VMware Workstation on the same host that runs PMS). The first media scan ran fine with no issues and no "too many open files" errors. I was hoping that maybe this fixed it, but the second scan, which ran a couple of hours ago, had "too many open files" errors again. I have my PMS scans set to run every 12 hours.

So if there are any other configs I can change/verify for files or limits maybe this is the next step. Again I don't know if this is truly a rar2fs issue.
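For verifying limits, one thing that may be worth checking is the limit the kernel actually applies to the running smbd process, since changes to limits configs do not necessarily reach a service that was started before the change. A minimal sketch, assuming a Linux VM with /proc mounted (root may be needed to read another user's process); the function name `soft_nofile_limit` and its optional second argument are made up for illustration, and the number Samba prints in its own log is computed by Samba and may not match this value:

```shell
#!/usr/bin/env bash
# Print the soft "Max open files" limit of a process, as seen by the kernel.
# $1 = PID, $2 = proc root (optional, defaults to /proc; only useful for testing).
soft_nofile_limit() {
    local pid="$1" proc_root="${2:-/proc}"
    # Line layout: "Max open files  <soft>  <hard>  files", so the soft limit is field 4.
    awk '/^Max open files/ { print $4 }' "${proc_root}/${pid}/limits"
}

# Usage (hypothetical): check the oldest running smbd process.
# soft_nofile_limit "$(pgrep -o -x smbd)"
```

If the value printed here is lower than what was configured, the new limit probably has not been picked up by the running service yet.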

hasse69 commented 2 years ago

Are you still getting a lot of those "close unmatched open" messages in syslog? From what I understand that is pretty bad and most likely results in file descriptor leaks which of course eventually will result in too many open files as well. I tried to google the message and it seems to appear in a few threads here and there but without any conclusions really that could help us understand more about the root cause.

zeeter82 commented 2 years ago

So this new VM does still produce the errors occasionally, but so far it's not as bad. It's still a problem though, because even when it happens once, the scanner in PMS bombs as well and some of my media gets falsely marked as unavailable. I've also tried increasing the limits in my Samba config, the smbd service, and other limits configs to "1048576", but something is still causing the number of open files to grow, or files are not being released by the smbd process in a timely manner.

Latest syslog entries showing the issue on the last scan (only occurred one time):

Feb 19 02:30:56 plex-vm smbd[15081]: [2022/02/18 21:30:56.102026,  0] ../../source3/smbd/open.c:827(fd_open)
Feb 19 02:30:56 plex-vm smbd[15081]:   Too many open files, unable to open more!  smbd's max open files = 1048576

hasse69 commented 2 years ago

Any news here? I still do not have much to go on and I must admit I have some serious doubts about how this could be a rar2fs issue. But nevertheless you have an issue :( Is it possible for you to monitor the number of files smbd has open somehow, before you get to the point where plex-vm throws an error? I was also wondering whether it would be worth a shot to try RAMMap (a Windows tool) and force a flush of the standby list. https://docs.microsoft.com/en-us/sysinternals/downloads/rammap Windows has this "feature" of aggressively caching open files, and flushing the standby list would force them to be closed. Remember we had some issues with this when porting rar2fs to support WinFSP.
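One way to watch the smbd descriptor count over time might be to poll /proc on the VM while a scan is running. A minimal sketch, assuming a Linux host with /proc mounted and a shell running as root (reading another user's /proc/<pid>/fd requires it); the function names, the optional proc-root argument, and the 60-second interval are made up for illustration:

```shell
#!/usr/bin/env bash
# Count the open file descriptors of one process by listing /proc/<pid>/fd.
# $1 = PID, $2 = proc root (optional, defaults to /proc; only useful for testing).
# Prints 0 if the process is gone or its fd directory is not readable.
count_fds() {
    local pid="$1" proc_root="${2:-/proc}"
    ls "${proc_root}/${pid}/fd" 2>/dev/null | wc -l
}

# Print one timestamped line with the descriptor count for every smbd process.
log_smbd_fds() {
    local pid
    for pid in $(pgrep -x smbd); do
        printf '%s pid=%s open_fds=%s\n' "$(date '+%F %T')" "$pid" "$(count_fds "$pid")"
    done
}

# Usage (hypothetical): sample once a minute and keep a log to compare with scan times.
# while true; do log_smbd_fds; sleep 60; done >> /tmp/smbd_fds.log
```

A count that keeps climbing between scans, rather than dropping back once a scan ends, would point toward descriptors being held open on the client side.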

zeeter82 commented 2 years ago

Hey, not really anything new...I've been recovering from shoulder surgery which isn't very fun ;)

I've kinda just let it go, although I do still monitor it occasionally with some basic scripts. I'm still able to play back everything, as are the users who have access to my libraries. I guess the two biggest annoyances are:

  1. The random media items being marked with trash can (unavailable) which is obviously not true
  2. Plex kinda stalls/crashes about 1-2 times every 2 weeks (randomly) and I'm fairly certain the scanning is what is causing this. I just restart Plex and everything is fine again for a bit.

I found this command which gives me a list of processes with most open files and I've sorta used it while a scan is running, but not really had time for some serious diagnostics:

lsof | awk '{ print $2 " " $1; }' | sort -rn | uniq -c | sort -rn | head -20

And that's really about it...I'm kinda just letting it go for now. I did have an idea where I was going to try to migrate my Plex server over to Linux to see if this issue only exists on Windows PMS. I just have to find the time to actually try this because that's a pretty in depth process to have to rescan all my media back in etc.

zeeter82 commented 2 years ago

I just looked at RAMMap and emptied the standby list, which was using something like 14GB-18GB of memory....haha seems excessive. So I'll see if that makes a difference. I also set up a scheduled task which will empty the standby list every 20 minutes (I can adjust this schedule as necessary). If that's the culprit and this ends up solving my issue, I would say that's a decent band-aid. I know that's not a true fix though; I guess the real problem is just very poor memory management within the Windows OS.

I'll update you again when I have some news (hopefully good).

zeeter82 commented 2 years ago

UPDATE.....Good news!

It's been going 3+ days now with no issues from my Plex scans. The scheduled task running every 20min seems to have done the trick. I really wish Microsoft would fix this bug in Win10/Win11..... I'll wait for your reply @hasse69 but I think this issue can probably be closed now. Thanks again for your help with all this and for pointing me into the direction of "standby files".

hasse69 commented 2 years ago

Nice to hear some good news! However, I am pretty sure Microsoft would never acknowledge the standby list as a bug, but rather call it a feature by design. I actually don't think the standby list is necessarily evil, but I think there might be some caveats related to how it interacts with network file systems such as SMB. If you are thinking about going "all-in" and reporting this to MS, why not? You decide yourself when to close this issue. But personally I am not sure there is much more we can do right now.

zeeter82 commented 2 years ago

Yea I wouldn't waste my time reporting this to Microsoft as what you said is most likely true anyways. I'll let others take it that far if they want to. I'm closing this issue again for now as this seems to be easily fixed with the RAMMap workaround.

Thanks again @hasse69