hasse69 / rar2fs

FUSE file system for reading RAR archives
https://hasse69.github.io/rar2fs/
GNU General Public License v3.0

Creating sub-processes using 100% CPU when content is added #165

philnse closed this issue 2 years ago

philnse commented 3 years ago

After hours of researching my problem, I am desperately posting here, as I just can't figure out how to solve it. I'm using rar2fs with an instance of Plex. For the last ten days rar2fs ran as expected: one instance per mount. After adding two packed episodes overnight, I ended up with a 100% CPU load on the mount that handles the series section. I have had this issue several times in the past. It seems to be the same issue addressed in #11.

If I remember correctly, the suspected cause was Plex Media Server wanting to add files to the mounted but still RAR'd files. I suspect it has something to do with the chapter images Plex generates using the transcoder, since no files are saved in the original content folder, only in the Plex Media Server library. I'm having a hard time reproducing the error. Could it be that some kind of cache fills up while those chapter images are generated, and rar2fs then crashes? I've attached an excerpt from my Webmin to clarify:

Screenshot

Please let me know if you need any logs. I'm happy to provide them.

//EDIT: I forgot to mention what I have tried so far to solve it:
- running as root
- running as a regular user
- --seek-length=0 and --seek-length=1
- --no-smp
- killing excess/crashed instances of rar2fs, which results in losing the mounted content

Regards

My system is running:

Ubuntu 18.04.5
rar2fs v1.29.5-gita393a68 (DLL version 8) Copyright (C) 2009 Hans Beckerus
FUSE library version: 2.9.7
fusermount version: 2.9.7
UNRAR 6.02 beta 1 freeware
Plex Media Server Version 1.23.4.4712

hasse69 commented 2 years ago

I am not sure what to make of the last two comments. rar2fs has not supported resolution of RAR files inside RAR files for a long time. To get that behavior you would need to stack mounts. Is that what is happening here?

Also, I am currently not sure what would be wrong in rar2fs here, since the busy loop occurs in the gcc runtime and the C++ exception handling. I contacted RARLAB and presented the signature of the problem, but it was nothing they had observed before. So why this problem would suddenly pop up seems a bit strange, to say the least.

andree392 commented 2 years ago

I haven't really touched my mount config in a while; it is just a single mount command in fstab, no stacking:

rar2fs#/mnt/smb /mnt/unrared fuse ro,allow_other,noatime,nonempty,umask=222,--seek-length=2 0 0

The path is something like: /mnt/smb/Tv/show/season/episode/rarfiles

I'll try and send an example file to your email, not sure if it's 100% reproducible.

poltergaiist commented 2 years ago

The problem hasn't resurfaced since I removed all the sub-folders containing subtitle RAR files. I am indeed stacking mounts, but the culprit seems to have been the subtitles.

It is not a RAR within a RAR. The layout is: folder, RAR files, sub-folder, RAR files (subtitles). I believe Kodi had the exact same issue using rar2fs.

hasse69 commented 2 years ago

I still don't really understand what you mean by folder in this context. Folders are not part of extraction unless they reside inside an archive.

If you can share an example that fails, I can try to reproduce it. Possibly something can be done to work around this issue, but I need a good reproducer to do so.

hasse69 commented 2 years ago

I tried the example archive that was sent to me. I am not able to push it anywhere close to throwing an exception like the one in the stack trace:

#8  0x000055e3287f5205 in ErrorHandler::Throw(RAR_EXIT) ()
#9  0x000055e3287f4ac1 in ComprDataIO::UnpWrite(unsigned char*, unsigned long) ()
#10 0x000055e3287ffa44 in Unpack::UnpWriteData(unsigned char*, unsigned long) ()
#11 0x000055e32880012c in Unpack::UnpWriteBuf30() ()
#12 0x000055e3287fb0cf in CmdExtract::ExtractCurrentFile(Archive&, unsigned long, bool&) ()
#13 0x000055e32880eb3c in ProcessFile(void*, int, char*, char*, wchar_t*, wchar_t*) ()

In fact, no exception is thrown at all in my tests. So it seems that something goes wrong during extraction, which libunrar detects and signals by throwing an exception, and that exception then gets stuck in some libgcc deadlock.

hasse69 commented 2 years ago

I wonder if this can be related to the dry run performed during extraction of compressed archives to detect bad passwords or CRC failures. If someone thinks they can easily reproduce this, please try to remove the dry run by changing the function extract_rar() in rar2fs.c:

static int extract_rar(char *arch, const char *file, void *arg)
{
        if (!arg) return 0;   // add this line
...
}

It is not a solution but it would narrow it down a bit. I have forced exceptions to be thrown at each call and still cannot reproduce what some of you report.

philnse commented 2 years ago

I've added your code to rar2fs.c and recompiled using ./configure --enable-debug --disable-static-unrar && make

I'll let you know how this works out!

hasse69 commented 2 years ago

Another thing you can try is the -s mount flag. If this is related to issues with exception handling in multi-threaded applications, that could also be a possible workaround. Single-threaded mode may slow things down a bit, but my guess is that it should not have any major implications for your use case.

hasse69 commented 2 years ago

I think there might be more to this than what first meets the eye.

I finally managed to push a mount to the point where I got hanging sub-processes (and sometimes even crashes), though not using 100% CPU, which could be due to many different reasons. So, in short, there seem to be some issues with rar2fs (and possibly also with FUSE) in multi-threaded mode that I need to look into. In the meantime, using -s (single-threaded mode) seems to avoid all of the problems I have encountered so far, so that would be my best bet to try first.
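Why -s can sidestep these problems: in single-threaded mode FUSE dispatches one request at a time, so no two callbacks ever run concurrently. A minimal sketch in C of the same effect, assuming a hypothetical handler and counter (this is not rar2fs code), is a single global mutex taken around every callback:

```c
#include <assert.h>
#include <pthread.h>

/* One global lock taken around every callback. This approximates what -s
 * gives you: requests are handled one at a time, never concurrently. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical shared state touched by a handler without its own locking. */
static long open_count;

/* Hypothetical handler body standing in for an open() callback. The
 * increment is not atomic, so it would race if two handlers ran at once. */
static void handle_open_unlocked(void)
{
        open_count++;
}

/* Serialized entry point, as if the filesystem ran single-threaded. */
static void handle_open(void)
{
        pthread_mutex_lock(&big_lock);
        handle_open_unlocked();
        pthread_mutex_unlock(&big_lock);
}

/* Simulates one FUSE worker thread issuing n "open" requests. */
static void *worker(void *arg)
{
        long n = *(const long *)arg;
        for (long i = 0; i < n; i++)
                handle_open();
        return NULL;
}

/* Runs nthreads workers concurrently and returns the final counter. Because
 * every handler holds big_lock, the result is always nthreads * per_thread. */
static long run_workers(int nthreads, long per_thread)
{
        pthread_t tids[64];
        assert(nthreads > 0 && nthreads <= 64);
        open_count = 0;
        for (int i = 0; i < nthreads; i++) {
                int rc = pthread_create(&tids[i], NULL, worker, &per_thread);
                assert(rc == 0);
                (void)rc;
        }
        for (int i = 0; i < nthreads; i++)
                pthread_join(tids[i], NULL);
        return open_count;
}
```

The trade-off is the slowdown hasse69 mentions: serializing everything hides races rather than fixing them, and one slow request (for example a large extraction) blocks every other request queued behind it.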

poltergaiist commented 2 years ago

@hasse69 I had ~5 mounted stacks, swapped to 1 and haven't had the issue since. I did not see any info about this, thanks for the help! Much appreciated!

hasse69 commented 2 years ago

@tattarn can you elaborate on what you mean by going from 5 "stacks" to 1? And what info is it that you are missing?

philnse commented 2 years ago

I think he had several dirs mounted separately, with content in each dir, and now just mounts the dir in which those separate dirs are placed instead.

After I included the fix you posted, the mounted contents are 644 and can't be changed (the RARs themselves are 755). Recompiling again, using -s now.

hasse69 commented 2 years ago

You mount using -s, it has nothing to do with how it is compiled.

philnse commented 2 years ago

But I had to get rid of the altered code in rar2fs because of the permission issues. I meant using -s when mounting, not when compiling.

hasse69 commented 2 years ago

The one-liner I proposed cannot possibly affect your permissions. That must have been something else.

philnse commented 2 years ago

I think it might have happened because I copied the binary manually and did not adjust its permissions... Maybe it's working now while using -s.

skkii commented 2 years ago

Bump: I ran into the same problem just days ago. For now I'm running single-threaded mounts and will report back. Thanks for all the effort you put into this.

hasse69 commented 2 years ago

@skkii thanks for the info. Were you also able to collect a backtrace of the process that got stuck? It would be valuable to see whether the signature is the same or not.

So far I have found one issue in rar2fs, which I have fixed and verified, but it is not related to the C++ exception getting stuck in libgcc. The latter is still a mystery. I have also noticed that FUSE seems to lose a release/close call (at least I fail to trace it back to the file system), and I have seen assertions trigger in libfuse. The assertion seems to have been reported to FUSE already and is thus a known issue. So there seem to be multiple issues related to multi-threaded mode, at least when pushing the system with 30-40 threads or more. Running just a few threads has so far only triggered the rar2fs issue mentioned above.

hasse69 commented 2 years ago

I managed to find one more issue in rar2fs that might be related. It was in fact not a missing release, but rather an open() that never returned, causing the user process to get stuck. Is there anyone in this thread who would be able to test a patch if I provide one?
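One way to confirm from the client side that an open() on the mount is hanging, rather than merely slow, is to run it in a helper thread and wait with a deadline. The sketch below assumes POSIX threads; `open_with_timeout()` and `struct open_job` are hypothetical names, not part of rar2fs or libfuse. It only detects the hang: a thread stuck inside open() cannot be cancelled, so it is detached and left to clean up if the call ever returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Shared by caller and helper thread; whoever drops the last ref frees it. */
struct open_job {
        pthread_mutex_t lock;
        pthread_cond_t cond;
        char *path;
        int done;      /* open() has returned */
        int abandoned; /* caller timed out and will not take the fd */
        int fd, err, refs;
};

static void job_free(struct open_job *job)
{
        pthread_mutex_destroy(&job->lock);
        pthread_cond_destroy(&job->cond);
        free(job->path);
        free(job);
}

static void *open_thread(void *arg)
{
        struct open_job *job = arg;
        int fd = open(job->path, O_RDONLY); /* may block indefinitely */
        int err = errno;

        pthread_mutex_lock(&job->lock);
        job->fd = fd;
        job->err = err;
        job->done = 1;
        pthread_cond_signal(&job->cond);
        int orphaned = job->abandoned;
        int last = (--job->refs == 0);
        pthread_mutex_unlock(&job->lock);

        if (orphaned && fd >= 0)
                close(fd); /* the caller gave up, nobody owns this fd */
        if (last)
                job_free(job);
        return NULL;
}

/* Returns a file descriptor, or -1 with errno set (ETIMEDOUT if open() had
 * not returned within timeout_ms milliseconds). */
static int open_with_timeout(const char *path, long timeout_ms)
{
        assert(timeout_ms >= 0);
        struct open_job *job = calloc(1, sizeof(*job));
        if (!job || !(job->path = strdup(path))) {
                free(job);
                errno = ENOMEM;
                return -1;
        }
        pthread_mutex_init(&job->lock, NULL);
        pthread_cond_init(&job->cond, NULL);
        job->refs = 2; /* one for the caller, one for the helper thread */

        pthread_t tid;
        int rc = pthread_create(&tid, NULL, open_thread, job);
        if (rc != 0) {
                job_free(job);
                errno = rc;
                return -1;
        }
        pthread_detach(tid);

        /* pthread_cond_timedwait() expects an absolute CLOCK_REALTIME time. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_sec += 1;
                deadline.tv_nsec -= 1000000000L;
        }

        int fd = -1, err = ETIMEDOUT;
        pthread_mutex_lock(&job->lock);
        rc = 0;
        while (!job->done && rc == 0) /* rc == 0 also covers spurious wakeups */
                rc = pthread_cond_timedwait(&job->cond, &job->lock, &deadline);
        if (job->done) {
                fd = job->fd;
                err = job->err;
        } else {
                job->abandoned = 1;
        }
        int last = (--job->refs == 0);
        pthread_mutex_unlock(&job->lock);

        if (last)
                job_free(job);
        if (fd < 0)
                errno = err;
        return fd;
}
```

For example, calling it on a file under the rar2fs mount point with a timeout of a few seconds and getting -1 with errno set to ETIMEDOUT would point at the kind of lost open() request described above, while a normal ENOENT or EACCES means open() did return.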

philnse commented 2 years ago

Good news. Sure, the least I can do using this free software!

hasse69 commented 2 years ago

Thanks. Apply the below patch on latest master, v1.29.5.

issue165.patch.txt

Please report back and make sure NOT to run using -s this time. This is not an official patch so it should only be used to confirm if it resolves the issue(s) or not.

philnse commented 2 years ago

I mounted with --seek-length=1 -o allow_other and without -s. Fingers crossed. I applied the patch as you described here (originally posted by @hasse69 in https://github.com/hasse69/rar2fs/issues/85#issuecomment-354458705):

Yes, it can be fixed. There are many systems out there, and it is impossible to cover them all by myself. This is why I rely on feedback and reports if something is not working. Good catch. Try the attached patch. You apply it by copying the file to the root of the rar2fs source folder and running:

patch -p1 < scandir.patch.txt

You should see output like:

patching file configure.ac
patching file m4/ax_prototype_scandir.m4
patching file rar2fs.c

After the patch has been applied, re-run the configuration step:

autoreconf -fi
./configure

This time it should hopefully pass without errors, and you can try:

make
make install

But I am not sure this is the one and only problem we will observe on this system. Let's hope for the best and take it step by step.

philnse commented 2 years ago

I've added two libraries as usual. So far no problems. I have ten mounts with a CPU load of 0-2% and without any sub-processes. I'll do another run adding libraries in Plex for testing.

poltergaiist commented 2 years ago

How's it going @philnse?

philnse commented 2 years ago

Works like a charm. I really recommend applying the patch!

hasse69 commented 2 years ago

The more users that can try the patch the better of course. Especially since I am not able to reproduce the exact same problem as reported. Currently working on the open call that seems to get lost in translation somewhere between kernel/libfuse and rar2fs. Possibly there might be a workaround for that issue which I am looking into.

poltergaiist commented 2 years ago

I followed the patch-install instructions @philnse quoted. It still says rar2fs v1.29.5-gita393a68 (DLL version 7) for me, as before, but maybe that doesn't change?

/Best regards

hasse69 commented 2 years ago

Applying a stand-alone patch does not affect the version string, unless the patch changes it, of course. That is not the case here.

poltergaiist commented 2 years ago

Hi, been running for 3 days without any issue. /Best regards

hasse69 commented 2 years ago

Thanks. Since several users have now reported the same issue with an almost identical signature, I guess the patch is working or at least improves the situation. I will clean it up and merge it when time allows.

I have been chasing what I think is another problem (since I still have not been able to reproduce the original issue) that I discovered while looking into this one. However, I see no indication that this new problem is in rar2fs; it looks to me like the root cause is somewhere in the FUSE layer. I have not been able to work around it either. But since no one has reported this problem before, it could also be something unique to my system. The problem manifests itself either as FUSE not serving an open() request, or as the release callback being sent twice for the same handle, resulting in double-free (SIGSEGV) or assertion (SIGABRT) crashes. I am not sure how much longer I can spend time on it.
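Purely as an illustration of how a filesystem could turn a duplicated release into a detectable no-op instead of a double free, here is a sketch of a handle table with generation counters. All names are hypothetical and this is not how rar2fs manages its handles; the 64-bit id is assumed to travel in the `fh` field of `struct fuse_file_info` between open() and release(). It does not address the FUSE-side cause.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_HANDLES 1024u

/* Hypothetical per-open state; fd is a placeholder for whatever it owns. */
struct handle_slot {
        uint32_t generation; /* bumped on every successful release */
        int in_use;
        int fd;
};

static struct handle_slot slots[MAX_HANDLES];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pack slot index and generation into one 64-bit id, e.g. for fi->fh. */
static uint64_t make_id(uint32_t index, uint32_t generation)
{
        assert(index < MAX_HANDLES);
        return ((uint64_t)generation << 32) | index;
}

/* Returns a new handle id, or UINT64_MAX if every slot is taken. UINT64_MAX
 * can never be a valid id because valid indexes are below MAX_HANDLES. */
static uint64_t handle_alloc(int fd)
{
        uint64_t id = UINT64_MAX;
        pthread_mutex_lock(&slots_lock);
        for (uint32_t i = 0; i < MAX_HANDLES; i++) {
                if (!slots[i].in_use) {
                        slots[i].in_use = 1;
                        slots[i].fd = fd;
                        id = make_id(i, slots[i].generation);
                        break;
                }
        }
        pthread_mutex_unlock(&slots_lock);
        return id;
}

/* Releases a handle exactly once. A second release of the same id, or a
 * stale id whose slot has since been reused, returns -1 instead of freeing
 * anything twice. */
static int handle_release(uint64_t id)
{
        uint32_t index = (uint32_t)(id & 0xffffffffu);
        uint32_t generation = (uint32_t)(id >> 32);
        int rc = -1;
        pthread_mutex_lock(&slots_lock);
        if (index < MAX_HANDLES && slots[index].in_use &&
            slots[index].generation == generation) {
                slots[index].in_use = 0;
                slots[index].fd = -1;
                slots[index].generation++;
                rc = 0;
        }
        pthread_mutex_unlock(&slots_lock);
        return rc;
}
```

The generation counter is what keeps a late duplicate harmless even after the slot has been handed out to a new open(); a plain in_use flag alone would let the stale release free someone else's handle.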

hasse69 commented 2 years ago

Long story short, this required more effort than I expected, and regression testing has been a pain. But I now have something that should be ready for merge; I would just really like someone in this thread to test the attached patch before I even consider that.

issue165v2.patch.txt

Note that this patch applies on top of master/HEAD, which has moved slightly since the release of v1.29.5. Thus, you need to pull in the latest version from GitHub before applying the patch, or it will most likely not apply cleanly.

skkii commented 2 years ago

I have been running issue165.patch for several days now, even with relatively high usage and a double-digit number of mounts, without problems. However, I will import v2 and report back should there be problems or crashes. Thanks for taking care of this! 💪🏻

hasse69 commented 2 years ago

issue165v3.patch.txt

Rebased on top of master/HEAD.

poltergaiist commented 2 years ago

Hi, does the first patch need to be reversed before applying issue165v3.patch.txt? (I'm new to applying patches; how do I reverse one?) /Best regards

hasse69 commented 2 years ago

@tattarn that is correct: you need to stash your changes (git stash), pull in the latest master (git pull) and then apply v3.

Note that if you do not wish to mess with git, simply wipe your rar2fs source/repo directory and download a new copy representing master/HEAD here https://github.com/hasse69/rar2fs/archive/refs/heads/master.zip, then apply v3.

hasse69 commented 2 years ago

I really appreciate all the efforts made here to test the patch I have posted. The patch is ready for merge, all I need to do is to "push the button" as soon as I get only positive feedback.

poltergaiist commented 2 years ago

Hi! I'll get back to you at the end of the week. It took me some time to understand how to do the above, but I'm pretty sure I'm running v3 now.

philnse commented 2 years ago

Just stopped by to let you know that I'm on the most recent patch as well now! I'll let you know if I notice anything.

skkii commented 2 years ago

issue165v3 stable and doing fine for the last 10 days under high usage.

hasse69 commented 2 years ago

I think we have enough reports now of this latest patch working as expected, or at least no reports stating the opposite. I will look through it once more and then I will push it to master.

milesbenson commented 2 years ago

I will try this patch as well. I ran into similar issues where one core hit 100% usage from time to time, but I haven't figured out why yet. In addition, every 2-3 months all cores went to 100%. Is there a way to force this behavior? I don't use chapter images at all.

It's a very big library, with around 7k movies and 50,000 episodes, and heavy usage.

poltergaiist commented 2 years ago

Hi! No problems with rar2fs since patch :)

hasse69 commented 2 years ago

@milesbenson Thanks for your willingness to try out this patch, appreciated. Unfortunately there is no deterministic way to trigger this and confirm that it is working. But you still have the chance to try it out, since it has just now been merged to master :)

I will leave this issue open for a while longer before closing.

milesbenson commented 2 years ago

I applied the patch yesterday and rebooted/remounted everything as usual (rar2fs -o rw,allow_other,uid=1001,gid=100,warmup --seek-length=1). I am using rclone with Google Drive for most of my stuff. Now refreshing the library within Plex takes ages to complete, as scanning is very slow. ls -R on the rar2fs mount is pretty fast. Might this be related to the patch? It's a 12h scan vs 20-30 min before when scanning 6k movies. Stuff not on Drive is OK.

milesbenson commented 2 years ago

I reverted the patch, and scanning is smooth again. With the patch I had an overall RAM usage of less than 2 GB; without it, it's 7 GB, as it has been before. I guess that means the stuff got cached?

hasse69 commented 2 years ago

No, I did not expect the patch to have any effect on things like duration of library scanning. I guess you need to file a new issue report since the patch is now merged.

hasse69 commented 2 years ago

@milesbenson what you can quickly try is whether the patch below, applied to master/HEAD, makes a difference or not. It is really the only change made to the caching policy. But I suspect this is something else.

patch.txt

milesbenson commented 2 years ago

Applied the patch, but the same slow scanning behavior on Plex together with gdrive/rclone. Local disks are all fine.

milesbenson commented 2 years ago

Additional information:

Without patch v3 which is in master now, behaviour was like this:

With the patch it's as if the 100% CPU is kind of suppressed/avoided, but therefore very slow.

Maybe that helps.

hasse69 commented 2 years ago

I guess all we can do is revert the patch piece by piece to see what the culprit is. The intention of the patch was of course never to have such impact or side effects as you describe, but mistakes happen, unfortunately. It is pretty hard to test as well, since the focus was on eliminating the stuck processes, and the library scan you are referring to is not something I have the possibility to test myself.