sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0
1.91k stars 132 forks source link

--replay ignores files if the starting point has a hidden component #541

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi, just discovered rmlint today and gave the GUI a try. It worked, but after a few hours of use (I was gradually going through a few dozen thousand collisions) the GUI started generating empty scripts for selections. At which point:

  1. I stopped trying to delete subsets of the data and instead tried to generate a script for everything, and it was also empty.
  2. I restarted the GUI and re-scanned the directory, got results as expected but scripts continued to be empty.
  3. I scanned from the commandline, got the correct output
  4. I uninstalled (pacman -Rdd) and reinstalled the package my distro offers for the GUI and re-tested, but got the same results.

Before filing this bug, I also tried a minimal repro using a fresh directory with 2 identical files placed in it. The GUI did successfully generate a script for this, but is still no longer able to generate scripts for the other folder.

cebtenzzre commented 2 years ago

Maybe something in the config could cause this. What is the output of dconf dump /org/gnome/Shredder/? Does dconf reset -f /org/gnome/Shredder/ fix the problem? Does "render script from all" also produce an empty script?

ghost commented 2 years ago

Given that its been 8 months and it took a couple hours to trigger, I doubt I can provide much for you anymore.

...is what I was going to say, but actually I managed to repro the issue on my first attempt. I've done some further digging and figured out what's causing this. The directory being scanned has a hidden parent in its path, and with the default settings Shredder passes --partial-hidden to rmlint when replaying. I'm not 100% on the exact mechanics of this, if I understand the short description it only counts hidden folders that are fully identical? Either way, turning that to handle all hidden files resolves the issue.

This is interesting, because it's definitely user error on my part but I'm not sure if there's some kind of issue here as well. In this case the 'hidden' part of the path was actually part of the source I passed to Shredder to scan, the files inside are 100% 'locally' visible. I can see some sense in still skipping them for safety reasons, but Shredder also displays these files regardless of the hidden file settings with (I think) no indication that they'll be missing from the cleanup.

I can open an issue for this UI quirk if you'd like, or if you feel it's as intended you can just close this and call it a day.

cebtenzzre commented 2 years ago

I can reproduce that with default CLI options:

$ mkdir .rmlint-test
$ echo 'foo' >.rmlint-test/a.txt
$ echo 'foo' >.rmlint-test/b.txt
$ rmlint -o json:rmlint.json .rmlint-test
$ rmlint -vv --replay rmlint.json .rmlint-test
INFO: Loading json-results `/tmp/rmlint.json'
Checking `/tmp/.rmlint-test/a.txt`: [nope: hidden]
[nope]
Checking `/tmp/.rmlint-test/b.txt`: [nope: hidden]
[nope]
==> 0 file(s) after investigation, nothing to search through.

I think replay mode is just checking whether there is a hidden directory in each file path, and not considering whether a hidden directory was given on the command line.

cebtenzzre commented 2 years ago

On develop, this is fixed in the new replay implementation (3f75ae609398de6ac27be24fc092a40241e1bd5c). On master, this is what you need to make it work:

diff --git a/lib/replay.c b/lib/replay.c
index bb0f7103..db1fa015 100644
--- a/lib/replay.c
+++ b/lib/replay.c
@@ -451,14 +451,24 @@ static bool rm_parrot_check_size(RmCfg *cfg, RmFile *file) {
     return false;
 }

-static bool rm_parrot_check_hidden(RmCfg *cfg, _UNUSED RmFile *file,
+static bool rm_parrot_check_hidden(RmCfg *cfg, RmFile *file,
                                    const char *file_path) {
     if(cfg->ignore_hidden == false && cfg->partial_hidden == false) {
         // no need to check.
         return true;
     }

-    if(rm_util_path_is_hidden(file_path)) {
+    // exclude the starting point from the hidden check
+    char *path_tmp = g_strdup(file_path);
+    char *path_end = path_tmp + strlen(file_path);
+    for(gint16 i = 0; i < file->depth; ++i) {
+        path_end = strrchr(path_tmp, G_DIR_SEPARATOR);
+        *path_end = '\0';
+    }
+    const char *inner_path = file_path + (path_end - path_tmp) + 1;
+    g_free(path_tmp);
+
+    if(rm_util_path_is_hidden(inner_path)) {
         FAIL_MSG("nope: hidden");
         return false;
     }