sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0
1.85k stars 128 forks

feature: allow specifying parameters using (json?) input file #622

Open retorquere opened 1 year ago

retorquere commented 1 year ago

I need to scan a lot of individual files (all in different directories), and I might exceed the maximum command-line length. Would it be possible to list the files/folders to scan in an input file that rmlint reads?

cebtenzzre commented 1 year ago

I believe this is only documented by example in the man page (see "Do more complex traversal using find(1)"), but rmlint can take file/folder paths via standard input using -. For example, if you want to identify duplicates from a list of paths called 'long_list.txt':

rmlint -T df - <long_list.txt

This is especially convenient if the files can be located by find, as shown in the man page.
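A minimal sketch of the find-based variant, assuming a POSIX shell. The helper names `list_candidates` and `scan_candidates`, the directory, and the name pattern are all hypothetical; only `rmlint -T df -` comes from this thread. The scanner command can be overridden, which makes the pipeline easy to try with a stand-in such as `cat` before pointing it at rmlint.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under directory $1 whose
# names match pattern $2, one path per line.
list_candidates() {
    find "$1" -type f -name "$2"
}

# Hypothetical helper: pipe those paths into a command on stdin.
# With no command given, it falls back to the rmlint call from this thread.
scan_candidates() {
    dir=$1
    pattern=$2
    shift 2
    [ "$#" -gt 0 ] || set -- rmlint -T df -
    list_candidates "$dir" "$pattern" | "$@"
}

# Usage (not run here; ~/pics and '*.png' are placeholders):
#   scan_candidates ~/pics '*.png'
```

Paths are passed one per line here, so file names containing newlines would be split. As far as I understand the man page, pairing `find -print0` with `rmlint -0` is the way to handle those.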

retorquere commented 1 year ago

Unfortunately, in the environment I'm in, I can't pass data via pipes.

cebtenzzre commented 1 year ago

Could you provide a little more information about your use case? What operating system are you on, and how is rmlint being executed? I would assume that if you have an environment that can run rmlint, it also has some kind of shell, such that you can execute e.g. sh -c 'rmlint -T df - <long_list.txt'.

retorquere commented 1 year ago

I'll admit it's super niche, but I'm running it from a Zotero plugin, and Zotero is a stripped-down and repurposed Firefox. Running executables from there is limited, and any kind of redirection of stdin/stdout doesn't work. I'm using rmlint because it can dump to a JSON file which I can read back in.

cebtenzzre commented 1 year ago

I still don't see why you cannot execute /bin/sh to do the redirection for you. Even if shell quoting is an issue and/or you need to pass arbitrary arguments, you can do something like this:

/bin/sh -c 'f=$1; shift; rmlint "$@" <"$f"' sh long_list.txt -T df -

The - argument is not really an option; it is a placeholder, because it matters which side of the // separator it is on. So if there were a --files-from option, it would imply a non-tagged - unless - is explicitly given, similar to how -0 works. But I think 99% of people are already calling rmlint from a shell or from a language like Python that supports communicating with a process via stdin.
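The /bin/sh one-liner above can be easier to follow when written out as a function. In the one-liner, the word `sh` after the quoted script becomes `$0`, `long_list.txt` becomes `$1`, and everything after it is forwarded untouched through `"$@"`. The sketch below does the same thing, assuming a POSIX shell; the name `run_with_list` and the readability check are illustrative additions, not part of rmlint.

```shell
#!/bin/sh
# Hypothetical helper: run_with_list LIST CMD [ARGS...]
# Runs CMD with the file LIST attached to its standard input.
run_with_list() {
    list=$1
    shift
    # Fail early with a message if the list file cannot be read.
    if [ ! -r "$list" ]; then
        echo "cannot read list file: $list" >&2
        return 1
    fi
    # "$@" keeps each remaining argument as a separate word, so no
    # extra shell quoting is needed for the forwarded arguments.
    "$@" <"$list"
}

# Usage (not run here), equivalent to the one-liner above:
#   run_with_list long_list.txt rmlint -T df -
```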

retorquere commented 1 year ago

Not all systems have /bin/sh, and I'd prefer to run a single binary rather than put an extra layer between my code and the executable, since that layer is one more potential source of errors. One major appeal of rmlint is that it's a single, cross-platform binary. I cannot capture error text either, so if something doesn't work, I'm in the dark about the reason; the fewer possible causes, the better.

But as I said, it's a super niche case, and I fully understand that it's not worth adding extra code to rmlint, which would then have to be maintained, just to address it.