sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0
1.86k stars 128 forks source link

csv formatter: free of invalid pointer caused by incorrect use of g_slice_free1 #496

Closed cebtenzzre closed 3 years ago

cebtenzzre commented 3 years ago

Steps to reproduce

$ mkdir repr
$ echo a >repr/a.txt
$ echo bb >repr/b.txt
$ rmlint -T df -o csv:stdout -c csv:unique repr

Actual behavior

rmlint crashes with this output:

type,path,size,checksum
unique_file,"/tmp/repr/b.txt",3,
free(): invalid pointer
ERROR: Aborting due to a fatal error. (signal received: Aborted)
ERROR: Please file a bug report (See rmlint -h)

Or, with AddressSanitizer:

type,path,size,checksum
unique_file,"/tmp/repr/b.txt",3,
=================================================================
==1462093==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x6150000015c0 in thread T3 (pool-rmlint)
    #0 0x7f7108a7d0e9 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:123
    #1 0x557f73dfeccf in rm_fmt_elem lib/formats/csv.c:102
    #2 0x557f73dcb657 in rm_fmt_write_impl lib/formats.c:348
    #3 0x557f73dcdf9d in rm_fmt_write lib/formats.c:469
    #4 0x557f73de883e in rm_shred_forward_to_output lib/shredder.c:1308
    #5 0x557f73de8f08 in rm_shred_group_postprocess lib/shredder.c:1495
    #6 0x557f73de94ae in rm_shred_result_factory lib/shredder.c:1553
    #7 0x7f7108687c86  (/usr/lib/libglib-2.0.so.0+0x84c86)
    #8 0x7f71086850c0  (/usr/lib/libglib-2.0.so.0+0x820c0)
    #9 0x7f710848a298 in start_thread (/usr/lib/libpthread.so.0+0x9298)
    #10 0x7f71083b3052 in __GI___clone (/usr/lib/libc.so.6+0xff052)

0x6150000015c0 is located 192 bytes inside of 240-byte region [0x615000001500,0x6150000015f0)
allocated by thread T0 here:
    #0 0x7f7108a7e1c6 in __interceptor_posix_memalign /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:226
    #1 0x7f7108675499  (/usr/lib/libglib-2.0.so.0+0x72499)

Thread T3 (pool-rmlint) created by T0 here:
    #0 0x7f7108a231c7 in __interceptor_pthread_create /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cpp:214
    #1 0x7f71086a9de4  (/usr/lib/libglib-2.0.so.0+0xa6de4)

SUMMARY: AddressSanitizer: bad-free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:123 in __interceptor_free
==1462093==ABORTING

Expected behavior

rmlint prints this output:

type,path,size,checksum
unique_file,"/tmp/repr/b.txt",3,
unique_file,"/tmp/repr/a.txt",2,

Analysis

Here, checksum_str and checksum_size are initialized. https://github.com/sahib/rmlint/blob/427791ca7c9ffa66b2c666b6b01874c5e165fae6/lib/formats/csv.c#L73-L74 Here, if file->digest is NULL indicating a unique file, a block of size one is allocated with g_slice_alloc0 and stored in checksum_str, without changing checksum_size. This behavior was introduced by commit 1b06eb3d0787b42a3159573ee970a5a885a3ccae. https://github.com/sahib/rmlint/blob/427791ca7c9ffa66b2c666b6b01874c5e165fae6/lib/formats/csv.c#L81-L84 Here, g_slice_free1 is called. checksum_size is still zero from when it was initialized, even though checksum_str points to a block of size one. https://github.com/sahib/rmlint/blob/427791ca7c9ffa66b2c666b6b01874c5e165fae6/lib/formats/csv.c#L101-L103 From the glib docs, regarding g_slice_free1:

...the block_size has to match the size specified upon allocation.

SeeSpotRun commented 3 years ago

Thanks for reporting and particularly for the diagnostics.

Was fixed as of f4c821c in the develop branch and will be merged into master branch at next release. Meanwhile you can compile from develop branch.

cebtenzzre commented 3 years ago

Whoops, I saw recent commits in master and figured it wasn't too far from develop. Perhaps I should make an AUR package for that branch so I don't run into old bugs as often.

SeeSpotRun commented 3 years ago

Perhaps I should make an AUR package for that branch

You're welcome to, but really we should be more pro-active at back-porting clear-cut bugs so that it's not necessary. I still have my L-plates on as a package maintainer and @sahib has gotten busy with other projects and had to step back a bit.

I'm gearing up for a new point release soon, I feel that #492 is a good foundation for that.