hasse69 / rar2fs

FUSE file system for reading RAR archives
https://hasse69.github.io/rar2fs/
GNU General Public License v3.0

rar subdir content not showing up if subdir is also a local directory #114

Closed m0vie closed 5 years ago

m0vie commented 5 years ago

Consider the following structure

root/dir1/fileLocal.txt
root/dir2/fileLocal.txt
root/fileLocal.txt
root/archive.rar:dir1/fileRared.txt
root/archive.rar:dir3/fileRared.txt
root/archive.rar:fileRared.txt

After rar2fs root/ mount/, the following is observed when listing directories in mount/:

mount/dir1/fileLocal.txt
mount/dir2/fileLocal.txt
mount/dir3/fileRared.txt
mount/fileLocal.txt
mount/fileRared.txt

Note how mount/dir1/fileRared.txt is missing in the directory listing.

However the file can be accessed directly, e.g. using cat mount/dir1/fileRared.txt.

Interestingly, for the root directory, both fileLocal.txt and fileRared.txt show up.
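
For anyone who wants to rebuild this layout, here is a minimal sketch. It assumes the `rar` command-line tool is on the PATH; `make_layout` and `make_archive` are hypothetical helper names, not part of rar2fs.

```shell
#!/bin/sh
# Hypothetical helper: create the local (non-archived) part of the tree.
make_layout() (
    root=$1
    mkdir -p "$root/dir1" "$root/dir2"
    : > "$root/fileLocal.txt"
    : > "$root/dir1/fileLocal.txt"
    : > "$root/dir2/fileLocal.txt"
)

# Hypothetical helper: build archive.rar holding the three archived files.
# The files are staged in a temporary directory so that dir3/ never exists
# locally under the source root. Assumes the `rar` CLI is installed.
make_archive() (
    root=$(cd "$1" && pwd) || exit 1
    stage=$(mktemp -d) || exit 1
    mkdir -p "$stage/dir1" "$stage/dir3"
    : > "$stage/fileRared.txt"
    : > "$stage/dir1/fileRared.txt"
    : > "$stage/dir3/fileRared.txt"
    (cd "$stage" && rar a "$root/archive.rar" \
        fileRared.txt dir1/fileRared.txt dir3/fileRared.txt >/dev/null)
    status=$?
    rm -rf "$stage"
    exit "$status"
)

# Example use (not run automatically):
#   make_layout root && make_archive root
#   mkdir -p mount && rar2fs root/ mount/ && ls -R mount
```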

hasse69 commented 5 years ago

Thanks for the issue report. This is currently how rar2fs works. The same name obviously cannot be provided by two different sources, but to keep things simple, a collision at the first level is enough to trigger the filter. That means it does not matter if the local path is dir/foo and the path in the archive is dir/fee: the collision is on the dir level, and thus nothing from dir in the archive will show. The reason dir/fee can still be accessed is also due to simplification. The file/dir does exist in the cache, and it was simply easier not to remove it when the duplicate filter triggers. I do not find this to be a very serious problem/limitation, so I will tag this as an enhancement with medium priority.
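
The first-level filter described here can be pictured with a toy model. This is only a sketch of the behaviour as explained in this comment, not the rar2fs implementation; `visible_archive_entries` is a hypothetical name, and the model only covers an archive sitting in the source root.

```shell
#!/bin/sh
# Toy model: read archive paths on stdin and print the ones that would still
# be listed, given the local source directory SRC. An archive path is dropped
# as soon as its first path component already exists locally.
visible_archive_entries() (
    src=$1
    while IFS= read -r path; do
        top=${path%%/*}              # first component, e.g. "dir1"
        if [ -e "$src/$top" ]; then
            continue                 # collision on the first level: hidden
        fi
        printf '%s\n' "$path"
    done
)

# Example use (not run automatically):
#   printf '%s\n' dir1/fileRared.txt dir3/fileRared.txt fileRared.txt |
#       visible_archive_entries root
```

With a local dir1 present, only the dir3 entry and the root-level file pass the filter, which matches the listing in the report.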

hasse69 commented 5 years ago

I tried to reproduce your specific issue and expected to see the same result as you did. Unfortunately, I saw something much much worse. I do not see this:

mount/dir3/fileRared.txt

I get an empty directory, which obviously is wrong! So there is a bug sneaking around here, not really related to your initial report.

hasse69 commented 5 years ago

My bad. I accidentally had a local and empty directory called dir3 in the root. That would hide the one coming from the archive. False alarm that is!

m0vie commented 5 years ago

I played around with this some more. I found a way to (temporarily) merge both caches and have them show up:

Start with

root/dir1_TEMP/fileLocal.txt
root/archive.rar:dir1/fileRared.txt

Mount and directory list /mount, /mount/dir1_TEMP, and /mount/dir1 to build up all caches.

Now, rename root/dir1_TEMP to root/dir1.

Refresh /mount/.

Now, both

mount/dir1/fileLocal.txt
mount/dir1/fileRared.txt

show up.

(Until the caches are invalidated, via USR1 or by making changes through /mount).

hasse69 commented 5 years ago

Yes, what you just did was in fact exploiting a bug in rar2fs. We should invalidate the cache when a local file/dir is renamed. I think we might be missing that. Note that local files are never placed in the rar2fs cache, only archive files are.

EDIT: Actually it is not a bug, rather a known limitation, and nothing is really missing either. The problem is that you are changing the original/local file system after mount. There is currently no way for rar2fs to detect that. Operations on the local file system do not go through the FUSE kernel module, and thus no callback is made to the user file system. The only way to solve that would be to put some watch on the local file system, which for several reasons is not recommended to be done from within rar2fs. If this is a use-case of real importance to you, I think your best bet is to place some inotify monitor(s) on your local file system and trigger an invalidation of the rar2fs cache when they fire.
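
A minimal sketch of such an external monitor follows. It assumes `inotifywait` (from inotify-tools) and `killall` are installed, and that sending SIGUSR1 makes rar2fs invalidate its caches, as mentioned earlier in this thread. The function names and the `INVALIDATE_CMD` variable are hypothetical.

```shell
#!/bin/sh
# Read one changed path per line on stdin and run the invalidation command
# for each of them. INVALIDATE_CMD can be overridden, e.g. for a dry run.
invalidate_on_events() {
    while IFS= read -r changed; do
        printf 'changed: %s\n' "$changed" >&2
        ${INVALIDATE_CMD:-killall -USR1 rar2fs}
    done
}

# Watch the source tree (not the mount point) recursively and feed every
# create/delete/move event into the loop above.
watch_source() {
    inotifywait -m -r -q -e create -e delete -e move \
        --format '%w%f' "$1" | invalidate_on_events
}

# Example use (not run automatically):
#   watch_source root/ &
```

Every event triggers a full invalidation here, so a busy source tree would benefit from some form of rate limiting, which is left out to keep the sketch short.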

hasse69 commented 5 years ago

Please try this patch on master/HEAD

readdir.patch.txt

It should allow you to merge entries from local file system with the ones coming from the archive, still with some of the previous mentioned limitations.

m0vie commented 5 years ago

Hmm, that patch only makes things after the initial rename trick a bit more stable (even after local modifications both local and rared content still shows).

My goal is actually to get the local+rar content to always show up merged (even read-only would be fine).

This is what I came up with:

mergecontent.patch.txt

I am not familiar with the code-base, but my idea is to simply skip the rar cache if the directory exists locally. This probably has some performance impact but this so far works for my requirement.

hasse69 commented 5 years ago

> Hmm, that patch only makes things after the initial rename trick a bit more stable (even after local modifications both local and rared content still shows).

Not sure what you mean here? This patch merges directory entries from both local and RAR archive, so it works according to your initial post? No "tricks" are needed?

The directory cache has nothing to do with RAR or not. The file cache is only used for RAR files but the directory cache is common for all files.

m0vie commented 5 years ago

> Not sure what you mean here? This patch merges directory entries from both local and RAR archive, so it works according to your initial post? No "tricks" are needed?

Didn't work for me. Directly after mounting only the local content showed up.

hasse69 commented 5 years ago

I have looked at your patch now and I think what it does sort of defeats the entire purpose of having the cache in the first place. I am not really sure I understand your use-case, but with the patch I provided you can still modify the local file system through your mount point as long as it does not try to modify anything that is in the RAR file cache, i.e. RAR contents. Such changes should be spotted directly. Changing stuff behind the back of rar2fs (or the mount point, to be more precise) is not something the file system is designed for. The use-case simply is too thin. What we can do is add an option to bypass the cache in readdir using a switch or something, and do something similar to what you did. The decision is thus left to the user, who can decide whether such behavior is worth the performance penalty.

hasse69 commented 5 years ago

> Didn't work for me. Directly after mounting only the local content showed up.

Ok, that was interesting because that is not what I saw. Maybe you can provide an example archive and a local directory structure so that I can try for myself?

This is my file structure which I tested on (source directory)

$ ls -lR
.:
total 16
-rw-rw-r-- 1 hasse hasse  231 Oct  7 21:00 archive.rar
drwxrwxr-x 2 hasse hasse 4096 Oct  9 20:18 dir1
drwxrwxr-x 2 hasse hasse 4096 Oct  8 20:44 dir2
-rw-rw-r-- 1 hasse hasse   19 Oct  7 20:49 filelocal.txt

./dir1:
total 4
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:49 filelocal.txt

./dir2:
total 4
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:49 filelocal.txt

And archive.rar contains

UNRAR 5.00 beta 7 freeware      Copyright (c) 1993-2013 Alexander Roshal

Testing archive archive.rar

Testing     dir1/filerared.txt                                        OK
Testing     dir3/filerared.txt                                        OK
Testing     filerared.txt                                             OK
All OK

And this is what I see after mounting in the mount point

$ ls -lR
.:
total 24
drwxrwxr-x 2 hasse hasse 4096 Oct  9 20:02 dir1
drwxrwxr-x 2 hasse hasse 4096 Oct  8 20:44 dir2
drwxrwxr-x 2 hasse hasse 4096 Oct  7 20:50 dir3
-rw-rw-r-- 1 hasse hasse   19 Oct  7 20:49 filelocal.txt
-rw-rw-r-- 1 hasse hasse   19 Oct  7 20:51 filerared.txt

./dir1:
total 8
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:49 filelocal.txt
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:50 filerared.txt

./dir2:
total 4
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:49 filelocal.txt

./dir3:
total 4
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:50 filerared.txt

Then entering dir1 in the mount point

dir1 $ touch foobar
dir1 $ ls
filelocal.txt  filerared.txt  foobar

And even adding something directly to the source dir works

<move to source dir1>
dir1 $ touch feebar
<move to mount point dir1>
dir1 $ ls
feebar  filelocal.txt  filerared.txt  foobar

Is this not in fact covering your use-case? At least that is how I understood it.

hasse69 commented 5 years ago

Since my memory failed me a bit on the implementation I think we need to recap on how it actually works. The directory cache is basically a cache of folder contents, but only contents provided by RAR archives. Local files are basically never cached this way, but the directories as such are. That means you can add local files to a directory (source or mount point) and that will propagate automatically to the mount point. RAR files on the other hand will not. That is where the cache is a limiting factor. But, to overcome this you can add your RAR files to the mount point instead (provided it was mounted read-write). By doing so the contents of the RAR files will still propagate since any changes to a directory in the cache will invalidate that entry by design. It also works for multi-part archives but during transit of the file set and before the archive is complete the actual names of files will display since they are temporarily treated as local files.

m0vie commented 5 years ago

I see the problem/difference now.

Starting with the exact same setup:

~/rartest$ ls -lR source
source:
total 12,288
-rw-rw-r-- 1 m0vie m0vie   260 Oct  9 21:40 archive.rar
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:40 dir1
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:39 dir2
-rw-rw-r-- 1 m0vie m0vie     0 Oct  9 21:39 filelocal.txt

source/dir1:
total 0
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filelocal.txt

source/dir2:
total 0
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filelocal.txt
~/rartest$ rar t source/archive.rar

RAR 5.50   Copyright (c) 1993-2017 Alexander Roshal   11 Aug 2017
Trial version             Type 'rar -?' for help

Testing archive source/archive.rar

Testing     dir1/filerared.txt                                        OK
Testing     dir3/filerared.txt                                        OK
Testing     filerared.txt                                             OK
Testing     dir1                                                      OK
Testing     dir3                                                      OK
All OK

Now, what you did:

~/rartest$ rar2fs source/ mount/
~/rartest$ ls -lR mount
mount:
total 20,480
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:40 dir1
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:39 dir2
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:39 dir3
-rw-rw-r-- 1 m0vie m0vie     0 Oct  9 21:39 filelocal.txt
-rw-rw-r-- 1 m0vie m0vie     0 Oct  9 21:39 filerared.txt

mount/dir1:
total 4,096
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filelocal.txt
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filerared.txt

mount/dir2:
total 0
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filelocal.txt

mount/dir3:
total 4,096
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filerared.txt

All good! However, if I do this instead:

~/rartest$ rar2fs source/ mount
~/rartest$ cd mount
~/rartest/mount$ ls -l
total 20,480
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:40 dir1
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:39 dir2
drwxrwxr-x 2 m0vie m0vie 4,096 Oct  9 21:39 dir3
-rw-rw-r-- 1 m0vie m0vie     0 Oct  9 21:39 filelocal.txt
-rw-rw-r-- 1 m0vie m0vie     0 Oct  9 21:39 filerared.txt
~/rartest/mount$ cd dir1
~/rartest/mount/dir1$ ls -l
total 0
-rw-rw-r-- 1 m0vie m0vie 0 Oct  9 21:39 filelocal.txt

So cd-ing to the dir first somehow makes a difference.

hasse69 commented 5 years ago

This honestly makes no sense :( It should not matter if you cd like this or not. And I could not reproduce that behavior either. Can you please attach your archive here? I see a slight difference in your archive compared to mine, since in yours the directories seem to have been given distinct entries. Maybe that is affecting something here. If so, I need to investigate what it is.

m0vie commented 5 years ago

Sure.

(This is just renamed to .zip as github won't allow rar files)

archive.rar.zip

hasse69 commented 5 years ago

Ok, as I suspected, that had no effect

$ cd d
d $ ls -l
total 28
-rw-rw-r-- 1 hasse hasse  231 Oct  7 21:00 archive.rarx
drwxrwxr-x 2 hasse hasse 4096 Oct  9 22:01 dir1
drwxrwxr-x 2 hasse hasse 4096 Oct  8 20:44 dir2
drwxrwxr-x 2 hasse hasse 4096 Oct  9 21:39 dir3
-rw-rw-r-- 1 hasse hasse   19 Oct  7 20:49 filelocal.txt
-rw-rw-r-- 1 hasse hasse    0 Oct  9 21:39 filerared.txt
d $ cd dir1
d/dir1 $ ls -l
total 8
-rw-rw-r-- 1 hasse hasse 19 Oct  7 20:49 filelocal.txt
-rw-rw-r-- 1 hasse hasse  0 Oct  9 21:39 filerared.txt

I honestly cannot understand why it does not work the same for you? Because it should.

m0vie commented 5 years ago

Hmm, I have no idea. I get this exact same behavior on both Ubuntu 19.04 and also on Cygwin using WinFSP.

Maybe the libunrar version has something to do with it?

hasse69 commented 5 years ago

Ok, let me try on Cygwin too. And what version of libunrar are you using?

m0vie commented 5 years ago

On Cygwin it was unrarsrc-5.6.5.tar

On Ubuntu I freshly recompiled everything (including your patch) with unrarsrc-5.8.2.tar.gz.

hasse69 commented 5 years ago

Ok, I tried it on Cygwin, and yes, there is a problem somewhere. But for me filerared.txt does not show up in dir1 whatever I try. So for me doing cd or not makes no difference, which actually makes more sense than getting different behavior. Must dig into why it does not work properly on Cygwin.

EDIT: Nah, both archives behave the same. It works if I first do ls on the root folder and then ls dir. I did notice that on Cygwin nothing happens when sitting inside the mount point, whereas on my Linux box a lot of activity is triggered in the background. Something does not work when the root folder is not in the cache before listing dir1. Need to figure out what that is.

hasse69 commented 5 years ago

readdirv2.patch.txt

Try this one.

m0vie commented 5 years ago

I still see some weird behavior. It still matters which directory is accessed first.

Directly after mounting, it's likely the host OS accessing the mount point in some way, making it hard to reproduce.

But with USR1, I see this consistently:

Cygwin:

$ rar2fs source/ x:

$ killall -USR1 rar2fs; ls /cygdrive/x/dir1
filelocal.txt

$ killall -USR1 rar2fs; ls /cygdrive/x/ > /dev/null; ls /cygdrive/x/dir1
filelocal.txt  filerared.txt

Ubuntu:

$ rar2fs source/ mount/

$ killall -USR1 rar2fs; ls mount/dir1
filelocal.txt

$ killall -USR1 rar2fs; ls mount/ > /dev/null; ls mount/dir1
filelocal.txt  filerared.txt

m0vie commented 5 years ago

debug_root_then_dir1.txt debug_dir1.txt

hasse69 commented 5 years ago

Yes, that is because my patch is limited. Try this one instead,

readdirv3.patch.txt

m0vie commented 5 years ago

That did the trick :)

$ killall -USR1 rar2fs; ls /cygdrive/x/ >/dev/null; ls /cygdrive/x/dir1
filelocal.txt  filerared.txt

$ killall -USR1 rar2fs; ls /cygdrive/x/dir1
filelocal.txt  filerared.txt

Thanks a lot for looking into this, really appreciate it!

hasse69 commented 5 years ago

Ok, great! Thanks a lot for testing :) I will go through the patch once more and do some regression on it before I merge it to master. I will also re-classify this as a bug since after digging more into this I believe that is in fact what it is.

hasse69 commented 5 years ago

I will reopen this because the solution is not fully working. There is still a case that does not work. The patch only solves it partly for things like ls, but direct access to the file still does not work, e.g.

$ cat d/dir1/filerared.txt
cat: d/dir1/filerared.txt: No such file or directory

(if d is the mount point) I had a quick look at it and this problem is not as easy to solve as I initially thought.
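
Since listing and direct access can disagree, a regression check should exercise both. The following is only a sketch; `check_entry` is a hypothetical helper, and it does not invalidate the caches itself (run `killall -USR1 rar2fs` beforehand to test from a cold cache).

```shell
#!/bin/sh
# check_entry MOUNT REL
# Succeeds only if REL shows up when its parent directory is listed AND the
# file can be read when opened directly by path.
check_entry() (
    mnt=$1
    rel=$2
    dir=$(dirname "$mnt/$rel")
    name=$(basename "$rel")
    # 1. The name must appear in the directory listing (readdir path).
    if ! ls -A "$dir" 2>/dev/null | grep -Fqx -- "$name"; then
        echo "not listed: $rel" >&2
        exit 1
    fi
    # 2. The file must also be readable by path (lookup/open path).
    if ! cat "$mnt/$rel" >/dev/null 2>&1; then
        echo "listed but not readable: $rel" >&2
        exit 1
    fi
)

# Example use (not run automatically), with d as the mount point:
#   killall -USR1 rar2fs; check_entry d dir1/filerared.txt
```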

hasse69 commented 5 years ago

Please try the latest version on master to confirm everything is still working. I will close this again once it has been verified.

m0vie commented 5 years ago

I tested using the latest master and everything works as expected for me. 👍

hasse69 commented 5 years ago

Great! My regression seems to pass too. Closing.