markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

fdupes dedupe triggering kernel memory leak and apparent endless loop #83

Closed juliantaylor closed 8 years ago

juliantaylor commented 9 years ago

using duperemove --fdupes on a 3.19 kernel (Ubuntu 15.04 kernel) seems to send duperemove into an endless loop issuing lots of IOC_FILE_EXTENT_SAME ioctls. These also seem to leak memory, so after a short while the machine crashes.

the issue seems reproducible when it reaches two 8.8GB files to dedupe.

the function that seems to be stuck in a loop is dedupe_extents. Before the btrfs_extent_same call the values of ctxt are:

ctxt->ioctl_file->fd = 3
ctxt->same->info {
fd = 4,
logical_offset = 85563016
bytes_deduped = 0
status = 0
}

after the call the fields are unchanged

this ctxt then seems to be inserted into the list again in process_dedupes, and this repeats until the machine is out of RAM. But it is not duperemove that uses the RAM; it seems to be lost inside the kernel itself.

juliantaylor commented 9 years ago

ctxt->same is

logical_offset = 85563016
length = 0
dest_count = 1

the length 0 seems to be the cause of the endless loop as it will cause status = 0 and bytes_deduped = 0

markfasheh commented 9 years ago

Question - was this working for you previously? I wonder if it was a recent change I made to dedupe.c, 9430adc6b20667cff72bc2cf31d7621b7ace76bb

EDIT: just to keep a note here, I tried it with a pair of 1 gigabyte files and things went ok. I'll try with some larger ones on a different kernel (this one has my fixes from the btrfs mailing list).

juliantaylor commented 9 years ago

could be that the endless loop is new, but I had duperemove leak kernel memory before, though not as much, and couldn't pinpoint it. Possibly this is the same issue. Perhaps also related to gh-42?

juliantaylor commented 9 years ago

found the cause of the length 0: set_aligned_same_length is wrong when the file size is larger than 4GB. It sets the len to zero as the mask is a 32-bit integer (fs blocksize is 4096).

that explains why it goes into an endless loop, but I still have no clue about the leak. Can you try with a large file and the ctxt->len set to zero (so reproducing the endless loop)?
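
The truncation described above can be sketched in C. This is a minimal illustration, not the actual set_aligned_same_length code: the helper names align_down_narrow_mask, align_down_wide_mask and demo_alignment are hypothetical, and the sketch assumes the alignment is done by ANDing a 64-bit length with a mask derived from a 32-bit block size. Under that assumption the mask clears the upper 32 bits of the length, so a length that is an exact multiple of 4 GiB comes out as zero.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper showing the suspected bug pattern; this is NOT the
 * actual set_aligned_same_length() from duperemove.
 * With a 32-bit blocksize, ~(blocksize - 1) is a 32-bit value (0xFFFFF000
 * for 4096). It is zero-extended when ANDed with the 64-bit length, which
 * clears bits 32..63 of len.
 */
uint64_t align_down_narrow_mask(uint64_t len, uint32_t blocksize)
{
    uint32_t mask = ~(blocksize - 1);
    return len & mask;
}

/* Hypothetical corrected variant: widen to 64 bits before complementing. */
uint64_t align_down_wide_mask(uint64_t len, uint32_t blocksize)
{
    uint64_t mask = ~((uint64_t)blocksize - 1);
    return len & mask;
}

/* Prints both results for an 8 GiB length and checks the expected values. */
void demo_alignment(void)
{
    uint64_t len = UINT64_C(8) << 30; /* 8 GiB, an exact multiple of 4 GiB */

    uint64_t narrow = align_down_narrow_mask(len, 4096);
    uint64_t wide = align_down_wide_mask(len, 4096);

    printf("narrow mask: %" PRIu64 "\n", narrow);
    printf("wide mask:   %" PRIu64 "\n", wide);

    assert(narrow == 0);  /* upper 32 bits lost, length collapses to 0 */
    assert(wide == len);  /* already block-aligned, so unchanged */
}
```

Calling demo_alignment() from a main() is enough to try it. Lengths that are not an exact multiple of 4 GiB would not become zero with the narrow mask, but they would still lose everything above the low 32 bits.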

juliantaylor commented 9 years ago

is there a kfree of same missing in btrfs_ioctl_file_extent_same?

markfasheh commented 9 years ago

Huh, I believe so - that's a nice catch. I'll check it out and if it is indeed a leak, the fix is pretty easy.

markfasheh commented 9 years ago

I have a potential fix for the endless loop scenario in the issue#83 branch, would you mind giving it a go?

juliantaylor commented 9 years ago

the fix works and it now finishes but of course still leaks memory.

markfasheh commented 9 years ago

Thanks, yeah the memory leak is going to require a kernel patch. I'll update with details as I get them.

juliantaylor commented 9 years ago

a patch has been posted on the list: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg44488.html

markfasheh commented 9 years ago

Thanks for the pointer, I went ahead and sent him a review of the patch.

juliantaylor commented 9 years ago

I'd recommend merging that simple duperemove change soon, as it can kill a machine if you encounter files sized in multiples of 4GB

markfasheh commented 9 years ago

It's been merged into the master branch. Do you not see the fix working for you? (This isn't a fix for the kernel memory leak, of course.)

0a9771f59daba95bae6ead2cafccdf0205279c88

markfasheh commented 8 years ago

Closing as this all should be fixed upstream and in duperemove.git now