sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0
1.86k stars 128 forks source link

check_for_equality uses numeric vs. string compare #599

Open rnsc5jjjjj opened 1 year ago

rnsc5jjjjj commented 1 year ago

If the paranoid mode is selected (-p) original_check calls check_for_equality which runs cmp if they are simple files and rmlint if they are directorys. Both seem to work correctly, but on returning to original check, original check performs a numeric comparison of the two strings: The quoted return value from check_for_equality and a quoted "0":

if [ "$check_for_equality "$1" "$2")" -ne "0" ] ; then

This should use a string compare not equal:

if [ "$check_for_equality "$1" "$2")" != "0" ] ; then

Easy enough to fix. I put "set -xv" and set "xv" around the section) to watch what it did. It ran everything correctly, coming down to the following test:

[ "0" != "0" ]

however it finds that this expression is true, and executes the "then" clause which prints a message that the files are no longer identical, cancelling the deletion and returning 1 to the caller, where the deletion is not performed.

It looks like everything works correctly except for declaring "0" != "0" is true!

I cannot figure out what is wrong. The round circles are zeros, not ohs.

Any suggestions would be very welcome.

cebtenzzre commented 1 year ago

Numeric comparison isn't a problem here. [ "0" -ne "0" ] and [ "0" != "0" ] both fail in any bash-like shell, as they should. The result of check_for_equality is used in a roundabout way that is cleaned up on the develop branch, but it's still correct in master.

Could you find a simple example that causes the problem you are seeing (ideally run rmlint on only the files that the message is complaining about), and provide the resulting script and the complete output of sh -x rmlint.sh? I find it more likely that you are somehow missing cmp (part of diffutils), seeing a problem with rmlint -p --equal for directories, or actually running into a file that changed, than there being a logic problem in the script, as it seems to work for almost everyone else.

rnsc5jjjjj commented 1 year ago

Below are two tgz files demonstrating the issue. There is a ReadMe file outlining what I did, and a typescript log file output by the script command as I did it. These demonstrate the behavior of the unmodified rmlint 2.10.1. You will see the shell complaining that the arguments for -ne are expected to be integers. I did not run it with my substituting != for -ne, but in this case the shell will find that 0 != 0 is true.

There are two tar files because I discovered that the generated rmlint .sh file specifies /bin/sh for an interpreter (which is the right thing to do) and that debian links /bin/sh to /bin/dash. Essentially this happened in 2009, as explained here: https://askubuntu.com/questions/976485/what-is-the-point-of-sh-being-linked-to-dash. I was skeptical, but the reference explains that dash is at least as good if not better adherence to POSIX, which was the goal, and it is alot faster. The second tar archive modifies the generated script to specify the /bin/bash interpreter, which seems to be pretty ubiquitous except for the debian boot shell of dash. The results are the same, except for a second error message that is a little more readable.

All software has bugs, but what version of rmlint would you be using if you were in a corner and had to do a series of trimings without error to pull the size down to where you can back it up? I am shooting in the dark. While you cannot make any promises, your guess is much better than my shots in the dark! Thanks.

And thanks for what you do. The few troubles I have had not withstanding, this is a great package, incredibly clearly architected and written, and great documentation.

rnsc5jjjjj commented 1 year ago

Blast! I forgot the tgz files! Here they are: Well, it does not take tgz files, or TGZ files. It took them as .log files. Please rename them as .tgz after downloading. demo_from_scratch.log test_with_bash.log

rnsc5jjjjj commented 1 year ago

I fixed the bug. I am attaching:

fix_-xv_SlowSteps2.2.log (must be renamed .tgz) which demonstrates the fix exactly as the previous test_with_bash.log demonstrated the bug with bash for the rmlint.sh shell. I have also tested it with /bin/sh which is linked to /bin/dash on debian and ubuntu systems.

fix_-xv_SlowSteps2.2.log

Solution.png which is a screenshot of a WinMerge showing the three small changes made.

Solution

I changed check_for_equality from an echo of the cmp or rmlint return value to stdout to a "return" of the cmp or rmlint return value. I don't see the value in converting it into a string to flow through stdout. This makes check_for_equality have the same semantics as any UNIX command: 0 means success, anything else failure, with the result in $?. Also as I expect you know, the original code redirected all output to stdout from the statements grouped by check_for_equality's { }, which in the case of a directory comparison included all stdout from rmlint. This is why it did not match a simple "0". The simple "0" was the last character, but there was a ton of other stuff in that string also!

I changed original_check to run check_for_equality prior to the conditional that will check its return value, then had the condition reference $? to expose the return value to compare to a zero. I left in one "set -xv" statement and two "set xv" statements so you can see it happen.

I am assuming that I can simply edit lib/sh.sh with these changes and recompile. Is there anything else I should know?

Thanks.

cebtenzzre commented 1 year ago

Yeah, you should be able to edit lib/sh.sh and recompile. That change is already included in the develop branch. If I change anything on master I will probably just redirect the output of rmlint --equal to /dev/null.