Closed trofi closed 10 months ago
If I run it in a single thread with --io-threads=1 --cpu-threads=1
I get ther assert failure in run_dedupe.c:287
:
...
[0x7ffff0000b90] (0016/1626) Try to dedupe extents with id 0f49c355
[0x7ffff0000b90] Dedupe 1 extents (id: 0f49c355) with target: (0, 3243), "/nix/store/.links/12fmw1zahji9vkqy1b0mkwgad8s37lhryhd5f7z9gm2j114pxwra"
[0x7ffff0000b90] (0017/1626) Try to dedupe extents with id 568eb2f1
[0x7ffff0000b90] Dedupe 1 extents (id: 568eb2f1) with target: (0, 3245), "/nix/store/.links/0nwf3ywxdmn9c16s3y2djb52wr7qz55hmvnxa2q9kvgzgad262xd"
ERROR: run_dedupe.c:287
[stack trace follows]
/mnt/archive/big/git/duperemove/duperemove(print_stack_trace+0x29) [0x409b39]
/mnt/archive/big/git/duperemove/duperemove() [0x40d844]
/nix/store/6bpc4sc5apc2ryxhjyij43n3wi169hfn-glib-2.76.4/lib/libglib-2.0.so.0(+0x8ad72) [0x7ffff7f04d72]
/nix/store/6bpc4sc5apc2ryxhjyij43n3wi169hfn-glib-2.76.4/lib/libglib-2.0.so.0(+0x8a41d) [0x7ffff7f0441d]
/nix/store/znf2bj54q3qj4pyk0xbp7pk12xbxq07b-glibc-2.38-27/lib/libc.so.6(+0x908b1) [0x7ffff7ae48b1]
/nix/store/znf2bj54q3qj4pyk0xbp7pk12xbxq07b-glibc-2.38-27/lib/libc.so.6(+0x114e6c) [0x7ffff7b68e6c]
I wrote a duperemove-fuzz.bash
fuzzer that crashes duperemove
within 10 seconds for me:
#!/usr/bin/env bash
duperemove_binary=$1
target_dir=$2
shift; shift
if [[ -z $duperemove_binary ]] || [[ -z $target_dir ]]; then
echo "Usage: $0 </abs/path/to/duperemove> <directory> [duperemove opts]"
exit 1
fi
# fail on any error
set -e
mkdir "$target_dir"
cd "$target_dir"
shopt -s nullglob
while :; do
sync
files=(*)
f_count=${#files[@]}
dst=$f_count
case $((RANDOM % 4)) in
0) # copy existing file
[[ $f_count -eq 0 ]] && continue
cp_arg=""
case $((RANDOM % 2)) in
0) cp_arg=--reflink=always;;
1) cp_arg=--reflink=never;;
esac
src=$((RANDOM % f_count))
cp -v "$cp_arg" "$src" "$dst"
;;
1) # create new file of 4x4KB distinct blocks
printf "0%*d" 4095 "$dst" > "$dst"
printf "1%*d" 4095 "$dst" >> "$dst"
printf "2%*d" 4095 "$dst" >> "$dst"
printf "3%*d" 4095 "$dst" >> "$dst"
;;
2) # run duperemove
"$duperemove_binary" "$@" -rd -b 4096 "$target_dir"
;;
3) # dd 4KB of one file into another
[[ $f_count -eq 0 ]] && continue
src=$((RANDOM % f_count))
dst=$((RANDOM % f_count))
[[ $src = $dst ]] && continue
src_block=$((RANDOM % 3))
dst_block=$((RANDOM % 3))
dd "if=$src" "iseek=$src_block" "of=$dst" "oseek=$dst_block" bs=4096 count=1
;;
esac
done
Using:
$ time { rm -rfv ~/tmp/dr/ && bash ./duperemove-fuzz.bash $PWD/duperemove/duperemove ~/tmp/dr -q; }
...
4096 bytes (4.1 kB, 4.0 KiB) copied, 8.7406e-05 s, 46.9 MB/s
'90' -> '189'
Simple read and compare of file data found 29 instances of files that might benefit from deduplication.
ERROR: run_dedupe.c:287
[stack trace follows]
/home/slyfox/dev/git/duperemove/duperemove(print_stack_trace+0x29) [0x409b39]
/home/slyfox/dev/git/duperemove/duperemove() [0x40d844]
/nix/store/6bpc4sc5apc2ryxhjyij43n3wi169hfn-glib-2.76.4/lib/libglib-2.0.so.0(+0x8ad72) [0x7f0def29ad72]
/nix/store/6bpc4sc5apc2ryxhjyij43n3wi169hfn-glib-2.76.4/lib/libglib-2.0.so.0(+0x8a41d) [0x7f0def29a41d]
/nix/store/znf2bj54q3qj4pyk0xbp7pk12xbxq07b-glibc-2.38-27/lib/libc.so.6(+0x908b1) [0x7f0deee7a8b1]
/nix/store/znf2bj54q3qj4pyk0xbp7pk12xbxq07b-glibc-2.38-27/lib/libc.so.6(+0x114e6c) [0x7f0deeefee6c]
./duperemove-fuzz.bash: line 27: 2598070 Aborted (core dumped) "$duperemove_binary" "$@" -rd -b 4096 "$target_dir"
real 0m5.790s
user 0m0.993s
sys 0m2.831s
Hello @trofi I cannot reproduce this behavior, either from my live systems or using your script
Could you give you your kernel version and btrfs-utils version used to create the filesystem ?
Thank you
I wonder if it's related to the compression settings I use. The following seems to reproduce failure for me:
$ uname -a
Linux nz 6.6.2 #1-NixOS SMP PREEMPT_DYNAMIC Mon Nov 20 10:59:39 UTC 2023 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v6.6.2
$ cd /tmp/
$ fallocate -l 10G btrfs.img
$ mkfs.btrfs btrfs.img
btrfs-progs v6.6.2
See https://btrfs.readthedocs.io for more information.
NOTE: several default settings have changed in version 5.15, please make sure
this does not affect your deployments:
- DUP for metadata (-m dup)
- enabled no-holes (-O no-holes)
- enabled free-space-tree (-R free-space-tree)
Label: (null)
UUID: a54fa315-268b-4820-bd9d-52eb263d23b6
Node size: 16384
Sector size: 4096
Filesystem size: 10.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 256.00MiB
System: DUP 8.00MiB
SSD detected: no
Zoned device: no
Incompat features: extref, skinny-metadata, no-holes, free-space-tree
Runtime features: free-space-tree
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 10.00GiB btrfs.img
$ sudo mount -onoatime,compress=zstd btrfs.img m/
$ sudo chown $(whoami) m/
$ time ./duperemove-fuzz.bash ~/dev/git/duperemove/duperemove $PWD/m/dr -q
...
Simple read and compare of file data found 34 instances of files that might benefit from deduplication.
ERROR: run_dedupe.c:287
[stack trace follows]
/home/slyfox/dev/git/duperemove/duperemove(print_stack_trace+0x29) [0x409b39]
/home/slyfox/dev/git/duperemove/duperemove() [0x40d844]
/nix/store/6bpc4sc5apc2ryxhjyij43n3wi169hfn-glib-2.76.4/lib/libglib-2.0.so.0(+0x8ad72) [0x7f769cc36d72]
/nix/store/6bpc4sc5apc2ryxhjyij43n3wi169hfn-glib-2.76.4/lib/libglib-2.0.so.0(+0x8a41d) [0x7f769cc3641d]
/nix/store/znf2bj54q3qj4pyk0xbp7pk12xbxq07b-glibc-2.38-27/lib/libc.so.6(+0x908b1) [0x7f769c8168b1]
/nix/store/znf2bj54q3qj4pyk0xbp7pk12xbxq07b-glibc-2.38-27/lib/libc.so.6(+0x114e6c) [0x7f769c89ae6c]
./duperemove-fuzz.bash: line 27: 3607680 Aborted (core dumped) "$duperemove_binary" "$@" -rd -b 4096 "$target_dir"
real 0m2.220s
user 0m0.872s
sys 0m1.107s
With compression enabled, I did reproduce This commit would fix the issue
Could you check ? Thank you
The change fixes all the assert()
related crashes for me. Thank you!
Good news ! I have merged to commit, thank you for your report
Ran
duperemove
from currentmaster
at 296d43a208be82e3ae2473ba51d15f6e025f0c80Merge pull request #326 from autumnontape/cfi-icall
, fails as: