sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Bad regex sort when multiple paths match #484

Closed BrknRobot closed 3 years ago

BrknRobot commented 3 years ago

When both paths (or both basenames) match the regex, the comparison still puts the files in an incorrect order instead of treating them as tied and letting subsequent criteria break the tie.

Example where regex ordering breaks:

    1/a
    1/a2
    1/b
    2/a
    2/a2
    2/b

Pattern: x<1>r<a>l

SeeSpotRun commented 3 years ago

First, -S x matches against the basename, while -S r matches against the full path, so you probably want the pattern r<1>x<a>l.
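To illustrate the basename/path distinction, here is a minimal Python sketch (not rmlint code). The helpers `matches_path` and `matches_basename` are hypothetical names, and the sketch assumes a criterion counts as matching when the regex is found anywhere in the text.

```python
import os
import re

paths = ["1/a", "1/a2", "1/b", "2/a", "2/a2", "2/b"]


def matches_path(pattern, path):
    # Assumption: stands in for an r<...> criterion (regex tested on the full path)
    return re.search(pattern, path) is not None


def matches_basename(pattern, path):
    # Assumption: stands in for an x<...> criterion (regex tested on the basename only)
    return re.search(pattern, os.path.basename(path)) is not None


# "1" only appears in the directory part, so a basename test never matches
by_basename = [p for p in paths if matches_basename("1", p)]
by_path = [p for p in paths if matches_path("1", p)]
```

Under these assumptions `by_basename` is empty while `by_path` holds the three files under `1/`, which is why the directory test belongs in `r<...>` rather than `x<...>`.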

But still there is a bug:

$ rmlint -S 'r<1>x<a>l'
# Duplicate(s):
    ls '/pwd/1/b'
    rm '/pwd/1/a'
    rm '/pwd/1/a2'
    rm '/pwd/2/a2'
    rm '/pwd/2/a'
    rm '/pwd/2/b'

Expected:

# Duplicate(s):
    ls '/pwd/1/a'
    rm '/pwd/1/a2'
    rm '/pwd/1/b'
    rm '/pwd/2/a'
    rm '/pwd/2/a2'
    rm '/pwd/2/b'

It looks like there is a problem with the regex match sorting implementation here: https://github.com/sahib/rmlint/blob/094fbd59cbfb2d2df73dacb0647aacc425982848/lib/preprocess.c#L356-L389
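As a rough illustration of the intended tie handling (a Python sketch, not the C code in preprocess.c), a regex criterion should report a tie when both files match or neither does, so that the next criterion decides. The names `regex_cmp` and `compare` are hypothetical, and the sketch assumes that `l` prefers the shorter basename and that regexes are matched by search rather than full match.

```python
import os
import re
from functools import cmp_to_key


def regex_cmp(pattern, text_a, text_b):
    """Return -1 if only text_a matches, 1 if only text_b matches, else 0."""
    match_a = re.search(pattern, text_a) is not None
    match_b = re.search(pattern, text_b) is not None
    # Both matching or neither matching yields 0, i.e. a tie
    return int(match_b) - int(match_a)


def compare(path_a, path_b):
    """Hypothetical stand-in for the criteria r<1>x<a>l, applied in order."""
    base_a = os.path.basename(path_a)
    base_b = os.path.basename(path_b)
    steps = (
        regex_cmp("1", path_a, path_b),  # r<1>: full path matches "1"
        regex_cmp("a", base_a, base_b),  # x<a>: basename matches "a"
        len(base_a) - len(base_b),       # l: assumed "shorter basename first"
    )
    for result in steps:
        if result != 0:
            return result  # first criterion that is not tied decides
    return 0


# Start from the buggy order reported above
paths = [
    "/pwd/1/b", "/pwd/1/a", "/pwd/1/a2",
    "/pwd/2/a2", "/pwd/2/a", "/pwd/2/b",
]
ranked = sorted(paths, key=cmp_to_key(compare))
```

With these assumptions, `ranked` comes out in the expected order, starting with `/pwd/1/a`. The key point is that `regex_cmp` returns 0 when both inputs match, rather than an arbitrary non-zero value.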

Will look into it...