openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Deleting lots of files makes the ZFS system slow #13707

Open mamh2021 opened 2 years ago

mamh2021 commented 2 years ago

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  20.04.1 LTS
Kernel Version        5.4.0-105-generic
Architecture          x86_64
OpenZFS Version       zfs-0.8.3-1ubuntu12.12 zfs-kmod-0.8.3-1ubuntu12.13

Describe the problem you're observing

When deleting lots of files, the ZFS system becomes slow: the Samba share cannot be opened and ls cannot complete under the zpool path. During this time the txg_sync thread is busy, and both the rm process and the smbd process stay in the D (uninterruptible sleep) state.
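As a rough illustration (not part of the original report; the pool name data is taken from the zpool output below), the stuck state can be observed from a shell while the deletion is running:

# Processes in uninterruptible sleep (state D), e.g. rm, smbd, txg_sync
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# Recent transaction group history for the pool (OpenZFS on Linux;
# contents depend on the zfs_txg_history module parameter)
tail -n 20 /proc/spl/kstat/zfs/data/txgs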

Describe how to reproduce the problem

Delete the files under the data pool.

Include any warning/errors/backtraces from the system logs

NAME                         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data                         750T   693T  57.4T        -         -    22%    92%  1.15x    ONLINE  -
  2d602b6101_lun_b01_part1   100T  99.1T  1.11T        -         -    45%  98.9%      -  ONLINE
  c7602b6101_lun_b02_part1   100T  89.6T  10.6T        -         -    13%  89.4%      -  ONLINE
  15612b6101_lun_b03_part1   100T  99.2T  1.11T        -         -    53%  98.9%      -  ONLINE
  3a67826101_lun_a01_part1   113T   108T  4.33T        -         -    17%  96.2%      -  ONLINE
  3b67826101_lun_a02_part1   113T   104T  8.74T        -         -    14%  92.2%      -  ONLINE
  3c67826101_lun_a03_part1   112T   104T  7.36T        -         -    14%  93.4%      -  ONLINE
  3e67826101_lun_a04_part1   112T  87.7T  24.1T        -         -     5%  78.4%      -  ONLINE
logs                            -      -      -        -         -      -      -      -  -
  f32b309a5f7c_ssd_log      6.98T   128K  6.98T        -         -     0%  0.00%      -  ONLINE
cache                           -      -      -        -         -      -      -      -  -
  f2e45c6330dc_ssd_cache    34.9T  34.9T  61.6G        -         -     0%  99.8%      -  ONLINE
odvb                        72.8T  44.5T  28.2T        -         -     1%    61%  1.00x    ONLINE  -
  44e6c2cb1edb_odvb_part1   72.8T  44.5T  28.2T        -         -     1%  61.2%      -  ONLINE
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 14 days 09:23:08 with 0 errors on Sun Jul 24 09:47:09 2022
config:

        NAME                        STATE     READ WRITE CKSUM
        data                        ONLINE       0     0     0
          2d602b6101_lun_b01_part1  ONLINE       0     0     0
          c7602b6101_lun_b02_part1  ONLINE       0     0     0
          15612b6101_lun_b03_part1  ONLINE       0     0     0
          3a67826101_lun_a01_part1  ONLINE       0     0     0
          3b67826101_lun_a02_part1  ONLINE       0     0     0
          3c67826101_lun_a03_part1  ONLINE       0     0     0
          3e67826101_lun_a04_part1  ONLINE       0     0     0
        logs
          f32b309a5f7c_ssd_log      ONLINE       0     0     0
        cache
          f2e45c6330dc_ssd_cache    ONLINE       0     0     0

errors: No known data errors

  pool: odvb
 state: ONLINE
  scan: scrub repaired 0B in 1 days 01:56:11 with 0 errors on Mon Jul 11 02:20:13 2022
config:

        NAME                       STATE     READ WRITE CKSUM
        odvb                       ONLINE       0     0     0
          44e6c2cb1edb_odvb_part1  ONLINE       0     0     0
rincebrain commented 2 years ago

I would suggest attempting to reproduce this on an OpenZFS version newer than January 2020, or reporting the issue to Ubuntu, since they're the ones shipping a release that hasn't gotten upstream patches since December 2020.

I believe, though I don't use Ubuntu very regularly any more, that options for doing this include the Ubuntu -backports repos, the HWE kernels (which bundle a newer ZFS kernel module, though that only updates your kernel, not userland), jonathonf's PPA for ZFS, or of course building your own packages from source.
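As a rough sketch of those options (not from the original comment; package and PPA names should be verified before use):

# See what is currently installed
zfs version

# Ubuntu 20.04 HWE stack: newer kernel with a newer bundled ZFS module,
# userland stays at 0.8.x
sudo apt install --install-recommends linux-generic-hwe-20.04

# Or pull a current userland and module from a third-party PPA or build from
# source, e.g. (hypothetical, verify the PPA before trusting it):
# sudo add-apt-repository ppa:jonathonf/zfs && sudo apt update && sudo apt install zfs-dkms zfsutils-linux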

Edit: I also just noticed you're using dedup, naturally without a special vdev, since 0.8.x doesn't have those. That's going to be very, very slow.

mamh2021 commented 2 years ago

At first dedup was on, but when I found this issue I disabled it:

$ zfs get all
NAME         PROPERTY              VALUE                  SOURCE
data         type                  filesystem             -
data         creation              Sat Sep 25 12:19 2021  -
data         used                  707T                   -
data         available             34.2T                  -
data         referenced            96K                    -
data         compressratio         1.00x                  -
data         mounted               no                     -
data         quota                 none                   default
data         reservation           none                   default
data         recordsize            128K                   default
data         mountpoint            none                   local
data         sharenfs              off                    default
data         checksum              off                    local
data         compression           off                    local
data         atime                 on                     default
data         devices               on                     default
data         exec                  on                     default
data         setuid                on                     default
data         readonly              off                    default
data         zoned                 off                    default
data         snapdir               hidden                 default
data         aclinherit            restricted             default
data         createtxg             1                      -
data         canmount              on                     default
data         xattr                 on                     default
data         copies                1                      default
data         version               5                      -
data         utf8only              off                    -
data         normalization         none                   -
data         casesensitivity       sensitive              -
data         vscan                 off                    default
data         nbmand                off                    default
data         sharesmb              off                    default
data         refquota              none                   default
data         refreservation        none                   default
data         guid                  9866093177391570530    -
data         primarycache          all                    default
data         secondarycache        all                    default
data         usedbysnapshots       0B                     -
data         usedbydataset         96K                    -
data         usedbychildren        707T                   -
data         usedbyrefreservation  0B                     -
data         logbias               latency                default
data         objsetid              54                     -
data         dedup                 off                    local
data         mlslabel              none                   default
data         sync                  disabled               local
data         dnodesize             legacy                 default
data         refcompressratio      1.00x                  -
data         written               96K                    -
data         logicalused           710T                   -
data         logicalreferenced     42K                    -
data         volmode               default                default
data         filesystem_limit      none                   default
data         snapshot_limit        none                   default
data         filesystem_count      none                   default
data         snapshot_count        none                   default
data         snapdev               hidden                 default
data         acltype               off                    default
data         context               none                   default
data         fscontext             none                   default
data         defcontext            none                   default
data         rootcontext           none                   default
data         relatime              off                    default
data         redundant_metadata    all                    default
data         overlay               off                    default
data         encryption            off                    default
data         keylocation           none                   default
data         keyformat             none                   default
data         pbkdf2iters           0                      default
data         special_small_blocks  0                      default
rincebrain commented 2 years ago

Turning off dedup doesn't make it instantly go away: the dedup table stays around until all of the data written while it was on has been deleted. That's one reason we advise being cautious before enabling it.
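For reference (not part of the original comment), the size of the dedup table that is still in play can be inspected; the pool name data is assumed from the output above:

# Summary of dedup table entries and estimated in-core size
zpool status -D data

# Detailed DDT histogram (can take a long time on a large pool)
zdb -DD data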

mamh2021 commented 2 years ago
[screenshot attachment: QQ拼音截图20220731101948 2]
rincebrain commented 2 years ago

I also just noticed that you have tried to turn checksums off. That wouldn't have worked while you had dedup on anyway, since dedup overrides the checksum setting, and it doesn't stop ZFS from checksumming data that was written while the setting was not "off" (turning checksums off is, in general, an extremely bad idea). So the pool was probably still using the slowest checksum option ZFS had at the time.

In any case, my advice remains "run a version that is supported by upstream at this point and see if the performance is still poor", or if you can't or won't, "it's theoretically more effective to report bugs to Ubuntu than upstream for versions that are no longer maintained".

(Also, my memory was wrong: 0.8 did have special vdevs, so that likely would have been a good option for you if you wanted to use dedup on this pool and had SSDs, possibly better than a slog or cache device depending on your goals, and it also would have made the deletion you're lamenting much faster.)
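As a hedged illustration of that suggestion (device names below are hypothetical; a special vdev only holds metadata and DDT blocks written after it is added, and removing it later is not always possible):

# Add a mirrored SSD special vdev for metadata and dedup tables
zpool add data special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B

# Optionally also send small file blocks to the special vdev
zfs set special_small_blocks=32K data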

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

wiesl commented 1 month ago

I made the following reproducible tests with a current Linux kernel and a current plain vanilla ZFS version, using the script below, and I can confirm a performance issue when deleting a lot of files:

#!/usr/bin/env bash
# Create a directory tree of DIR1 x DIR2 directories, each holding FILES
# small (4 KiB) random files, to benchmark file creation and deletion.

BASE_DIR=/testzfs/dirstructure
#BASE_DIR=/root/dirstructure

OS=$(uname)
DDPARAMS=

# Optional first argument overrides the target directory.
if [ -n "${1}" ]; then
  BASE_DIR=${1}
fi

# FreeBSD's dd does not support oflag=nonblock.
if [ "${OS}" != "FreeBSD" ]; then
  DDPARAMS="oflag=nonblock"
fi

DIR1=100
DIR2=100
#DIR1=10
#DIR2=10
FILES=100

mkdir -p "${BASE_DIR}"

for (( d1 = 0; d1 < DIR1; d1++ )); do
  for (( d2 = 0; d2 < DIR2; d2++ )); do
    DIR=${BASE_DIR}/${d1}/${d2}
    echo "${DIR}"
    mkdir -p "${DIR}"
    for (( f = 0; f < FILES; f++ )); do
      FILE=${DIR}/${f}.bin
      dd if=/dev/urandom of="${FILE}" bs=4096 count=1 status=none ${DDPARAMS}
    done
  done
done

Results with zfs --version: zfs-2.2.4-1 zfs-kmod-2.2.4-1

uname -a Linux myserver 6.9.8-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 5 16:20:11 UTC 2024 x86_64 GNU/Linux

Tests were done with ext4, btrfs, and ZFS; /root is on ext4. Deletion on ext4 and btrfs is fine, while ZFS is around a factor of 22 slower (7m01s vs. 0m19s on ext4, i.e. roughly 421 s / 19 s ≈ 22x).

Tests were run in a KVM VM on an SSD-only host system.

################################################################################
time ./performance_make_folder_file_structure.sh /root/dirstructure
real    13m57.033s
user    6m40.366s
sys     7m23.004s

time ./performance_make_folder_file_structure.sh /testbtrfs/dirstructure
real    14m1.768s
user    6m36.889s
sys     7m29.375s

time ./performance_make_folder_file_structure.sh /testzfs/dirstructure
real    16m14.611s
user    6m38.907s
sys     8m1.397s
################################################################################
time rm -rf /root/dirstructure
real    0m19.359s
user    0m0.883s
sys     0m10.819s

time rm -rf /testbtrfs/dirstructure
real    0m34.683s
user    0m1.147s
sys     0m24.004s

time rm -rf /testzfs/dirstructure
real    7m1.059s
user    0m2.876s
sys     1m29.455s
################################################################################