openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

OOM after files remove with dedup on and fast dedup enabled #16697

Closed: jtblck90 closed this issue 2 weeks ago

jtblck90 commented 1 month ago

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  12
Kernel Version        6.1.0-22
Architecture          amd64
OpenZFS Version       zfs-2.3.0-rc2

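The test VM has 64GB of RAM, of which 32GB is assigned to the ZFS ARC by setting both the min and max ARC parameters (zfs_arc_min and zfs_arc_max) to 32GB. The pool is a single RAIDZ1 vdev of five NVMe drives with deduplication enabled, and the fast_dedup feature is enabled and active.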

zpool status

  pool: zpool16k
 state: ONLINE
config:

    NAME         STATE     READ WRITE CKSUM
    zpool16k     ONLINE       0     0     0
      raidz1-0   ONLINE       0     0     0
        nvme0n1  ONLINE       0     0     0
        nvme1n1  ONLINE       0     0     0
        nvme2n1  ONLINE       0     0     0
        nvme3n1  ONLINE       0     0     0
        nvme4n1  ONLINE       0     0     0

errors: No known data errors

zpool config

NAME      PROPERTY                       VALUE                          SOURCE
zpool16k  size                           14.5T                          -
zpool16k  capacity                       84%                            -
zpool16k  altroot                        -                              default
zpool16k  health                         ONLINE                         -
zpool16k  guid                           6048011872382422538            -
zpool16k  version                        -                              default
zpool16k  bootfs                         -                              default
zpool16k  delegation                     on                             default
zpool16k  autoreplace                    off                            default
zpool16k  cachefile                      -                              default
zpool16k  failmode                       wait                           default
zpool16k  listsnapshots                  off                            default
zpool16k  autoexpand                     off                            default
zpool16k  dedupratio                     1.00x                          -
zpool16k  free                           2.23T                          -
zpool16k  allocated                      12.3T                          -
zpool16k  readonly                       off                            -
zpool16k  ashift                         12                             local
zpool16k  comment                        -                              default
zpool16k  expandsize                     -                              -
zpool16k  freeing                        0                              -
zpool16k  fragmentation                  1%                             -
zpool16k  leaked                         0                              -
zpool16k  multihost                      off                            default
zpool16k  checkpoint                     -                              -
zpool16k  load_guid                      15783377096588126353           -
zpool16k  autotrim                       off                            default
zpool16k  compatibility                  off                            default
zpool16k  bcloneused                     0                              -
zpool16k  bclonesaved                    0                              -
zpool16k  bcloneratio                    1.00x                          -
zpool16k  dedup_table_size               212G                           -
zpool16k  dedup_table_quota              auto                           default
zpool16k  feature@async_destroy          enabled                        local
zpool16k  feature@empty_bpobj            enabled                        local
zpool16k  feature@lz4_compress           active                         local
zpool16k  feature@multi_vdev_crash_dump  enabled                        local
zpool16k  feature@spacemap_histogram     active                         local
zpool16k  feature@enabled_txg            active                         local
zpool16k  feature@hole_birth             active                         local
zpool16k  feature@extensible_dataset     active                         local
zpool16k  feature@embedded_data          active                         local
zpool16k  feature@bookmarks              enabled                        local
zpool16k  feature@filesystem_limits      enabled                        local
zpool16k  feature@large_blocks           enabled                        local
zpool16k  feature@large_dnode            enabled                        local
zpool16k  feature@sha512                 enabled                        local
zpool16k  feature@skein                  enabled                        local
zpool16k  feature@edonr                  enabled                        local
zpool16k  feature@userobj_accounting     active                         local
zpool16k  feature@encryption             enabled                        local
zpool16k  feature@project_quota          active                         local
zpool16k  feature@device_removal         enabled                        local
zpool16k  feature@obsolete_counts        enabled                        local
zpool16k  feature@zpool_checkpoint       enabled                        local
zpool16k  feature@spacemap_v2            active                         local
zpool16k  feature@allocation_classes     enabled                        local
zpool16k  feature@resilver_defer         enabled                        local
zpool16k  feature@bookmark_v2            enabled                        local
zpool16k  feature@redaction_bookmarks    enabled                        local
zpool16k  feature@redacted_datasets      enabled                        local
zpool16k  feature@bookmark_written       enabled                        local
zpool16k  feature@log_spacemap           active                         local
zpool16k  feature@livelist               enabled                        local
zpool16k  feature@device_rebuild         enabled                        local
zpool16k  feature@zstd_compress          enabled                        local
zpool16k  feature@draid                  enabled                        local
zpool16k  feature@zilsaxattr             enabled                        local
zpool16k  feature@head_errlog            active                         local
zpool16k  feature@blake3                 enabled                        local
zpool16k  feature@block_cloning          enabled                        local
zpool16k  feature@vdev_zaps_v2           active                         local
zpool16k  feature@redaction_list_spill   enabled                        local
zpool16k  feature@raidz_expansion        enabled                        local
zpool16k  feature@fast_dedup             active                         local
zpool16k  feature@longname               enabled                        local
zpool16k  feature@large_microzap         enabled                        local

zfs config

NAME      PROPERTY              VALUE                  SOURCE
zpool16k  type                  filesystem             -
zpool16k  creation              Mon Oct 28 13:24 2024  -
zpool16k  used                  9.84T                  -
zpool16k  available             1.66T                  -
zpool16k  referenced            9.63T                  -
zpool16k  compressratio         1.00x                  -
zpool16k  mounted               yes                    -
zpool16k  quota                 none                   default
zpool16k  reservation           none                   default
zpool16k  recordsize            16K                    local
zpool16k  mountpoint            /zpool16k              default
zpool16k  sharenfs              off                    default
zpool16k  checksum              on                     default
zpool16k  compression           off                    local
zpool16k  atime                 on                     default
zpool16k  devices               on                     default
zpool16k  exec                  on                     default
zpool16k  setuid                on                     default
zpool16k  readonly              off                    default
zpool16k  zoned                 off                    default
zpool16k  snapdir               hidden                 default
zpool16k  aclmode               discard                default
zpool16k  aclinherit            restricted             default
zpool16k  createtxg             1                      -
zpool16k  canmount              on                     default
zpool16k  xattr                 on                     local
zpool16k  copies                1                      default
zpool16k  version               5                      -
zpool16k  utf8only              on                     -
zpool16k  normalization         formD                  -
zpool16k  casesensitivity       sensitive              -
zpool16k  vscan                 off                    default
zpool16k  nbmand                off                    default
zpool16k  sharesmb              off                    default
zpool16k  refquota              none                   default
zpool16k  refreservation        none                   default
zpool16k  guid                  3860442583779050184    -
zpool16k  primarycache          all                    default
zpool16k  secondarycache        all                    default
zpool16k  usedbysnapshots       0B                     -
zpool16k  usedbydataset         9.63T                  -
zpool16k  usedbychildren        212G                   -
zpool16k  usedbyrefreservation  0B                     -
zpool16k  logbias               latency                default
zpool16k  objsetid              54                     -
zpool16k  dedup                 on                     local
zpool16k  mlslabel              none                   default
zpool16k  sync                  disabled               local
zpool16k  dnodesize             legacy                 default
zpool16k  refcompressratio      1.00x                  -
zpool16k  written               9.63T                  -
zpool16k  logicalused           8.08T                  -
zpool16k  logicalreferenced     8.02T                  -
zpool16k  volmode               default                default
zpool16k  filesystem_limit      none                   default
zpool16k  snapshot_limit        none                   default
zpool16k  filesystem_count      none                   default
zpool16k  snapshot_count        none                   default
zpool16k  snapdev               hidden                 default
zpool16k  acltype               posix                  local
zpool16k  context               none                   default
zpool16k  fscontext             none                   default
zpool16k  defcontext            none                   default
zpool16k  rootcontext           none                   default
zpool16k  relatime              on                     local
zpool16k  redundant_metadata    all                    default
zpool16k  overlay               on                     default
zpool16k  encryption            off                    default
zpool16k  keylocation           none                   default
zpool16k  keyformat             none                   default
zpool16k  pbkdf2iters           0                      default
zpool16k  special_small_blocks  0                      default
zpool16k  prefetch              all                    default
zpool16k  direct                standard               default
zpool16k  longname              off                    default

zpool status with DDT (zpool status -D)

  pool: zpool16k
 state: ONLINE
config:

    NAME         STATE     READ WRITE CKSUM
    zpool16k     ONLINE       0     0     0
      raidz1-0   ONLINE       0     0     0
        nvme0n1  ONLINE       0     0     0
        nvme1n1  ONLINE       0     0     0
        nvme2n1  ONLINE       0     0     0
        nvme3n1  ONLINE       0     0     0
        nvme4n1  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 536870912, size 212G on disk, 136G in core

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     512M      8T      8T   9.59T     512M      8T      8T   9.59T
 Total     512M      8T      8T   9.59T     512M      8T      8T   9.59T
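As a sanity check on the DDT summary, the entry count follows directly from the data written: with unique data, every 16K record should get its own DDT entry. The sketch below only does arithmetic on the figures printed above (it assumes zfs prints binary units, e.g. 8T = 8 TiB); because `zpool status -D` rounds sizes for display, the per-entry averages are approximate. It also makes visible that the reported 136G in-core DDT size is well above both the 32GB ARC and the 64GB of RAM in the VM.

```python
# Arithmetic on the `zpool status -D` output above (binary units assumed).
KIB, GIB, TIB = 2**10, 2**30, 2**40

logical_size = 8 * TIB       # LSIZE of the refcnt=1 bucket
recordsize = 16 * KIB        # recordsize=16K on the dataset
ddt_entries = 536_870_912    # "DDT entries 536870912"
on_disk = 212 * GIB          # "size 212G on disk" (rounded by zfs)
in_core = 136 * GIB          # "136G in core" (rounded by zfs)

# One DDT entry per unique record written.
expected_entries = logical_size // recordsize
print("expected entries:", expected_entries, expected_entries == ddt_entries)

# Approximate average footprint per entry implied by the summary line.
print("bytes/entry on disk:", on_disk // ddt_entries)
print("bytes/entry in core:", in_core // ddt_entries)
```

The match between 8 TiB / 16 KiB and the reported entry count is consistent with the 1.00x dedupratio: no block was deduplicated, so every record is a separate entry.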

arcstat

21 1 0x01 147 39984 64501438472 54609823965886
name                            type data
hits                            4    3178411426
iohits                          4    1496
misses                          4    64419903
demand_data_hits                4    0
demand_data_iohits              4    0
demand_data_misses              4    0
demand_metadata_hits            4    3178410242
demand_metadata_iohits          4    1496
demand_metadata_misses          4    64418378
prefetch_data_hits              4    0
prefetch_data_iohits            4    0
prefetch_data_misses            4    0
prefetch_metadata_hits          4    1184
prefetch_metadata_iohits        4    0
prefetch_metadata_misses        4    1525
mru_hits                        4    225453614
mru_ghost_hits                  4    11470786
mfu_hits                        4    2952957812
mfu_ghost_hits                  4    18958526
uncached_hits                   4    0
deleted                         4    572151552
mutex_miss                      4    4315505
access_skip                     4    0
evict_skip                      4    5
evict_not_enough                4    457326
evict_l2_cached                 4    0
evict_l2_eligible               4    11090623758336
evict_l2_eligible_mfu           4    1735208350208
evict_l2_eligible_mru           4    9355415408128
evict_l2_ineligible             4    8192
evict_l2_skip                   4    0
hash_elements                   4    3515346
hash_elements_max               4    3754460
hash_collisions                 4    175363571
hash_chains                     4    562112
hash_chain_max                  4    9
meta                            4    4294389235
pd                              4    2147483648
pm                              4    1925069853
c                               4    34359738368
c_min                           4    34359738368
c_max                           4    34359738368
size                            4    34373397632
compressed_size                 4    31970503680
uncompressed_size               4    68214341120
overhead_size                   4    1515084288
hdr_size                        4    844770080
data_size                       4    147505152
metadata_size                   4    33338082816
dbuf_size                       4    13324032
dnode_size                      4    2105824
bonus_size                      4    310400
anon_size                       4    0
anon_data                       4    0
anon_metadata                   4    0
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    17549727232
mru_data                        4    147505152
mru_metadata                    4    17402222080
mru_evictable_data              4    147505152
mru_evictable_metadata          4    15175630848
mru_ghost_size                  4    24628658176
mru_ghost_data                  4    16587096064
mru_ghost_metadata              4    8041562112
mru_ghost_evictable_data        4    16587096064
mru_ghost_evictable_metadata    4    8041562112
mfu_size                        4    15935860736
mfu_data                        4    0
mfu_metadata                    4    15935860736
mfu_evictable_data              4    0
mfu_evictable_metadata          4    15934646272
mfu_ghost_size                  4    6871023616
mfu_ghost_data                  4    0
mfu_ghost_metadata              4    6871023616
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    6871023616
uncached_size                   4    0
uncached_data                   4    0
uncached_metadata               4    0
uncached_evictable_data         4    0
uncached_evictable_metadata     4    0
l2_hits                         4    0
l2_misses                       4    0
l2_prefetch_asize               4    0
l2_mru_asize                    4    0
l2_mfu_asize                    4    0
l2_bufc_data_asize              4    0
l2_bufc_metadata_asize          4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_log_blk_writes               4    0
l2_log_blk_avg_asize            4    0
l2_log_blk_asize                4    0
l2_log_blk_count                4    0
l2_data_to_meta_ratio           4    0
l2_rebuild_success              4    0
l2_rebuild_unsupported          4    0
l2_rebuild_io_errors            4    0
l2_rebuild_dh_errors            4    0
l2_rebuild_cksum_lb_errors      4    0
l2_rebuild_lowmem               4    0
l2_rebuild_size                 4    0
l2_rebuild_asize                4    0
l2_rebuild_bufs                 4    0
l2_rebuild_bufs_precached       4    0
l2_rebuild_log_blks             4    0
memory_throttle_count           4    0
memory_direct_count             4    0
memory_indirect_count           4    0
memory_all_bytes                4    67418071040
memory_free_bytes               4    15142215680
memory_available_bytes          3    12786138880
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    34198593152
arc_dnode_limit                 4    3435973836
async_upgrade_sync              4    0
predictive_prefetch             4    2709
demand_hit_predictive_prefetch  4    1180
demand_iohit_predictive_prefetch 4    1507
prescient_prefetch              4    0
demand_hit_prescient_prefetch   4    0
demand_iohit_prescient_prefetch 4    0
arc_need_free                   4    0
arc_sys_free                    4    2356076800
arc_raw_size                    4    0
cached_only_in_progress         4    0
abd_chunk_waste_size            4    27299328
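The dump above appears to be the raw kstat file (on Linux, /proc/spl/kstat/zfs/arcstats) rather than formatted `arcstat` output: a kstat header line followed by `name type data` rows. A small parser makes the key ratios easier to read. In this dump, `size` is pinned at `c_max` (32 GiB), roughly 97% of it is `metadata_size`, and `data_size` is only about 141 MiB. This is a minimal sketch; the embedded sample holds only a few rows copied from the dump, and the helper names are illustrative.

```python
# Minimal sketch: parse the "name type data" rows of an arcstats kstat dump.
SAMPLE = """\
21 1 0x01 147 39984 64501438472 54609823965886
name                            type data
size                            4    34373397632
c_max                           4    34359738368
metadata_size                   4    33338082816
data_size                       4    147505152
memory_available_bytes          3    12786138880
"""

GIB = 2**30


def parse_arcstats(text: str) -> dict[str, int]:
    """Return {stat_name: value} for every 'name type data' row."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips the 7-field kstat header line
        try:
            stats[parts[0]] = int(parts[2])
        except ValueError:
            continue  # skips the "name type data" column header
    return stats


def summarize(stats: dict[str, int]) -> dict[str, float]:
    """ARC size relative to its cap, and the share of the ARC that is metadata."""
    return {
        "arc_size_gib": stats["size"] / GIB,
        "c_max_gib": stats["c_max"] / GIB,
        "metadata_fraction": stats["metadata_size"] / stats["size"],
    }


summary = summarize(parse_arcstats(SAMPLE))
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

On a live system the same functions could be fed `open("/proc/spl/kstat/zfs/arcstats").read()` instead of `SAMPLE`, for example once before and once during the removal.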

Describe the problem you're observing

When writing large files (in my test, four 2TB files, 8TB total) to a pool with dedup enabled and the fast_dedup feature active, the ARC fills up completely and total RAM consumption sits at around 47GB. When the files are then deleted, RAM usage keeps growing until the system runs out of memory (OOM). This also reproduces with other recordsizes (tested with 16K and 128K) and with less data and less RAM. The same can be observed with many small files occupying the same total space on the pool: removing the small files one by one works, but removing many 1GB files simultaneously results in OOM. After a reset, the pool cannot be imported, because the import runs into the same OOM condition.

Describe how to reproduce the problem

Write several large files to a pool with deduplication enabled and the fast_dedup feature active. In my experiment this was four 2TB files with 64GB of total RAM; four 1TB files with 32GB of RAM also reproduces it. Then remove the files with rm.
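The reproduction steps can be scripted roughly as below. This is a sketch under stated assumptions, not the reporter's actual procedure: it fills files with `os.urandom` data so that every record is unique (matching the 1.00x dedupratio above), and it unlinks all files in parallel threads as an approximation of running several `rm` commands at once. The file names and helper names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # write in 1 MiB pieces to keep the script's own memory flat


def write_unique_file(path: str, size: int) -> None:
    """Fill `path` with `size` bytes of random, non-repeating data.

    Random data should make every record unique, so with dedup=on each
    record needs its own DDT entry.
    """
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n


def write_files(directory: str, count: int, size: int) -> list[str]:
    """Write `count` files of `size` bytes (hypothetical naming scheme)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"dedup-test-{i}.bin")
        write_unique_file(path, size)
        paths.append(path)
    return paths


def remove_concurrently(paths: list[str]) -> None:
    """Unlink all files at once, roughly like several `rm` calls in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        list(pool.map(os.remove, paths))  # list() re-raises any worker exception
```

For the 64GB case in the report, this would correspond to `write_files("/zpool16k", 4, 2 * 2**40)` (2 TiB each, consistent with the 8T LSIZE in the DDT histogram) followed by `remove_concurrently(...)` on the returned paths. Generating terabytes through `os.urandom` is slow, so a faster non-repeating data source may be preferable in practice.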

Include any warning/errors/backtraces from the system logs

I could not find the OOM messages in the journal after the reset, so I'm attaching a screenshot instead.

[Screenshot: OOMscreen]

The journal does contain the following event:

Oct 29 04:45:17 zfs-rc2-test kernel: Large kmem_alloc(74904, 0x1000), please file an issue at: https://github.com/openzfs/zfs/issues/new

I'm also attaching the full journal and dmesg logs just in case: log.txt, dmesg.txt

robn commented 1 month ago

Likely a duplicate of #6783 and #16037.

Can you post /proc/spl/kmem/slab from before and after the OOM event? It doesn't need to be exact, but I'd like to see what happens as more files are deleted, through the kernel's attempts to reclaim memory, up to the point where it finally gives up and starts killing things.
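One way to collect this is to snapshot /proc/spl/kmem/slab at a fixed interval while the files are being removed. The sketch below does that; the output directory, interval and count are arbitrary assumptions, and the directory should live on a filesystem other than the dedup pool. Each snapshot is fsync'ed so it has a better chance of surviving a hard reset, though the directory entry itself is not synced, and the script can of course be OOM-killed like any other process.

```python
import os
import time
from datetime import datetime
from pathlib import Path


def snapshot_slab(src: str = "/proc/spl/kmem/slab",
                  out_dir: str = "slab-snapshots",
                  count: int = 60,
                  interval: float = 10.0) -> list[Path]:
    """Copy `src` into `out_dir` every `interval` seconds, `count` times."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for i in range(count):
        data = Path(src).read_text()  # procfs files are read until EOF
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        dest = out / f"slab-{i:04d}-{stamp}.txt"  # index keeps names unique
        with open(dest, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push file contents to disk before the next sample
        written.append(dest)
        if i + 1 < count:
            time.sleep(interval)
    return written
```

Started in the background (for example with `count=360, interval=5.0`) just before running `rm`, this yields a time series rather than only the two end points.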

jtblck90 commented 1 month ago

@robn yeah, I saw those issues but decided to report this against a fresh rc2. I'll post /proc/spl/kmem/slab shortly.

jtblck90 commented 1 month ago

@robn please find the /proc/spl/kmem/slab output below. I also captured the slab state before starting the file removal.

slab before starting file removal:

--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----  --- emergency ---
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max
spl_zlib_workspace_cache              0x200080         0         0  2145216   268104      0     0     0      0     0     0      0     0     0
kcf_context_cache                     0x00100         -         0        -       40      -     -     -      -     0     -      -     -     -
zfs_btree_leaf_cache                  0x00100         -  10604544        -     4096      -     -     -      -  2589     -      -     -     -
metaslab_alloc_trace_cache            0x00100         -         0        -       72      -     -     -      -     0     -      -     -     -
brt_entry_cache                       0x00100         -         0        -       40      -     -     -      -     0     -      -     -     -
brt_pending_entry_cache               0x00100         -         0        -      160      -     -     -      -     0     -      -     -     -
ddt_cache                             0x00080    468992    234112   234496    29264      2     1     1     16     8     8      0     0     0
ddt_entry_flat_cache                  0x00100         -         0        -      240      -     -     -      -     0     -      -     -     -
ddt_entry_trad_cache                  0x00100         -         0        -      424      -     -     -      -     0     -      -     -     -
ddt_log_entry_flat_cache              0x00100         - 15686543808        -      144      -     -     -      - 108934332     -      -     -     -
ddt_log_entry_trad_cache              0x00100         -         0        -      328      -     -     -      -     0     -      -     -     -
zio_cache                             0x00100         -     43776        -     1216      -     -     -      -    36     -      -     -     -
zio_link_cache                        0x00100         -         0        -       48      -     -     -      -     0     -      -     -     -
zio_buf_comb_512                      0x200102         -      8704        -      512      -     -     -      -    17     -      -     -     -
zio_buf_comb_1024                     0x200102         -         0        -     1024      -     -     -      -     0     -      -     -     -
zio_buf_comb_1536                     0x00102         -      1536        -     1536      -     -     -      -     1     -      -     -     -
zio_buf_comb_2048                     0x00102         -      2048        -     2048      -     -     -      -     1     -      -     -     -
zio_buf_comb_3072                     0x00102         -      3072        -     3072      -     -     -      -     1     -      -     -     -
zio_buf_comb_4096                     0x00102         -      8192        -     4096      -     -     -      -     2     -      -     -     -
zio_buf_comb_6144                     0x00102         -         0        -     6144      -     -     -      -     0     -      -     -     -
zio_buf_comb_8192                     0x00102         -         0        -     8192      -     -     -      -     0     -      -     -     -
zio_buf_comb_12288                    0x00102         -         0        -    12288      -     -     -      -     0     -      -     -     -
zio_buf_comb_16384                    0x00102         -   1441792        -    16384      -     -     -      -    88     -      -     -     -
zio_buf_comb_24576                    0x00082  11440128   9437184   233472    24576     49    48    48    392   384   384      0     0     0
zio_buf_comb_32768                    0x00082 1278558208 1004273664   299008    32768   4276  4276  4701  34208 30648 37608      0     0     0
zio_buf_comb_49152                    0x00082  25374720  22806528   430080    49152     59    58    58    472   464   464      0     0     0
zio_buf_comb_65536                    0x00082  18518016  16777216   561152    65536     33    32    32    264   256   256      0     0     0
zio_buf_comb_98304                    0x00082         0         0   823296    98304      0     0     0      0     0     0      0     0     0
zio_buf_comb_131072                   0x00082 688168960 577765376  1085440   131072    634   634   648   5072  4408  5184      0     0     0
zio_buf_comb_196608                   0x00082         0         0  1609728   196608      0     0     0      0     0     0      0     0     0
zio_buf_comb_262144                   0x00082         0         0  2134016   262144      0     0     0      0     0     0      0     0     0
zio_buf_comb_393216                   0x00082         0         0  3182592   393216      0     0     0      0     0     0      0     0     0
zio_buf_comb_524288                   0x00082         0         0  4231168   524288      0     0     0      0     0     0      0     0     0
zio_buf_comb_786432                   0x00082         0         0  6328320   786432      0     0     0      0     0     0      0     0     0
zio_buf_comb_1048576                  0x00082   8425472   2097152  8425472  1048576      1     1     1      8     2     2      0     0     0
zio_buf_comb_1572864                  0x00082         0         0 12619776  1572864      0     0     0      0     0     0      0     0     0
zio_buf_comb_2097152                  0x00082         0         0 16814080  2097152      0     0     0      0     0     0      0     0     0
zio_buf_comb_3145728                  0x00082         0         0 25202688  3145728      0     0     0      0     0     0      0     0     0
zio_buf_comb_4194304                  0x00082         0         0 29392896  4194304      0     0     0      0     0     0      0     0     0
zio_buf_comb_6291456                  0x00082         0         0 31481856  6291456      0     0     0      0     0     0      0     0     0
zio_buf_comb_8388608                  0x00082         0         0 25182208  8388608      0     0     0      0     0     0      0     0     0
zio_buf_comb_12582912                 0x00082         0         0 25178112 12582912      0     0     0      0     0     0      0     0     0
zio_buf_comb_16777216                 0x00082         0         0 16785408 16777216      0     0     0      0     0     0      0     0     0
lz4_cache                             0x200100         -         0        -    16384      -     -     -      -     0     -      -     -     -
abd_t                                 0x200100         - 201563520        -       96      -     -     -      - 2099620     -      -     -     -
sa_cache                              0x200100         -      1280        -      256      -     -     -      -     5     -      -     -     -
dnode_t                               0x200100         -   2197216        -      952      -     -     -      -  2308     -      -     -     -
arc_buf_hdr_t_full                    0x200100         - 862455360        -      240      -     -     -      - 3593564     -      -     -     -
arc_buf_hdr_t_l2only                  0x00100         -         0        -       96      -     -     -      -     0     -      -     -     -
arc_buf_t                             0x00100         -   1083168        -       32      -     -     -      - 33849     -      -     -     -
dmu_buf_impl_t                        0x00100         -  13400832        -      384      -     -     -      - 34898     -      -     -     -
zil_lwb_cache                         0x00100         -         0        -      392      -     -     -      -     0     -      -     -     -
zil_zcw_cache                         0x00100         -         0        -      152      -     -     -      -     0     -      -     -     -
sio_cache_0                           0x00100         -         0        -      136      -     -     -      -     0     -      -     -     -
sio_cache_1                           0x00100         -         0        -      152      -     -     -      -     0     -      -     -     -
sio_cache_2                           0x00100         -         0        -      168      -     -     -      -     0     -      -     -     -
zap_name                              0x00100         -         0        -      328      -     -     -      -     0     -      -     -     -
zap_attr_cache                        0x00100         -         0        -      288      -     -     -      -     0     -      -     -     -
zap_name_long                         0x00100         -         0        -     1096      -     -     -      -     0     -      -     -     -
zap_attr_long_cache                   0x00100         -         0        -     1056      -     -     -      -     0     -      -     -     -
zfs_znode_cache                       0x200100         -      6816        -     1136      -     -     -      -     6     -      -     -     -
zfs_znode_hold_cache                  0x00100         -         0        -       88      -     -     -      -     0     -      -     -     -
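The slab dumps can be compared mechanically. Judging from the rows themselves, the `alloc` column equals allocated objects × `objsize`: for example 108934332 × 144 = 15686543808 bytes (about 14.6 GiB) for ddt_log_entry_flat_cache before any removal, and in the prior-to-OOM dump zio_cache has grown from 36 objects to 9102159 (9102159 × 1216 bytes, about 10.3 GiB). The sketch below parses the 15-column rows and ranks caches by growth; the samples are a few rows copied from the two dumps in this comment, and the function names are illustrative.

```python
BEFORE = """\
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max
ddt_entry_flat_cache                  0x00100         -         0        -      240      -     -     -      -     0     -      -     -     -
ddt_log_entry_flat_cache              0x00100         - 15686543808        -      144      -     -     -      - 108934332     -      -     -     -
zio_cache                             0x00100         -     43776        -     1216      -     -     -      -    36     -      -     -     -
zio_link_cache                        0x00100         -         0        -       48      -     -     -      -     0     -      -     -     -
"""

AFTER = """\
ddt_entry_flat_cache                  0x00100         - 269170800        -      240      -     -     -      - 1121545     -      -     -     -
ddt_log_entry_flat_cache              0x00100         - 15814759248        -      144      -     -     -      - 109824717     -      -     -     -
zio_cache                             0x00100         - 11068225344        -     1216      -     -     -      - 9102159     -      -     -     -
zio_link_cache                        0x00100         - 436901520        -       48      -     -     -      - 9102115     -      -     -     -
"""


def parse_slab(text: str) -> dict[str, dict[str, int]]:
    """Map cache name -> allocated bytes, object size, allocated object count."""
    caches: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 15:
            continue  # skips the dashed group header and blank lines
        try:
            caches[parts[0]] = {
                "alloc": int(parts[3]),     # bytes currently allocated
                "objsize": int(parts[5]),   # bytes per object
                "objects": int(parts[10]),  # allocated objects
            }
        except ValueError:
            continue  # skips the column-name header row
    return caches


def top_growth(before, after, top=5):
    """Largest increases in allocated bytes between two snapshots."""
    deltas = []
    for name, stats in after.items():
        old = before.get(name, {}).get("alloc", 0)
        deltas.append((stats["alloc"] - old, name))
    deltas.sort(reverse=True)
    return deltas[:top]


for delta, name in top_growth(parse_slab(BEFORE), parse_slab(AFTER)):
    print(f"{name:28s} {delta / 2**30:8.2f} GiB")
```

Fed with full snapshots (for example the periodic ones robn asked for), the same ranking shows which caches keep growing as the removal progresses.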

slab prior to the OOM event:

--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----  --- emergency ---
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max
spl_zlib_workspace_cache              0x200080         0         0  2145216   268104      0     0     0      0     0     0      0     0     0
kcf_context_cache                     0x00100         -         0        -       40      -     -     -      -     0     -      -     -     -
zfs_btree_leaf_cache                  0x00100         -  10989568        -     4096      -     -     -      -  2683     -      -     -     -
metaslab_alloc_trace_cache            0x00100         -         0        -       72      -     -     -      -     0     -      -     -     -
brt_entry_cache                       0x00100         -         0        -       40      -     -     -      -     0     -      -     -     -
brt_pending_entry_cache               0x00100         -         0        -      160      -     -     -      -     0     -      -     -     -
ddt_cache                             0x00080    468992    234112   234496    29264      2     1     1     16     8     8      0     0     0
ddt_entry_flat_cache                  0x00100         - 269170800        -      240      -     -     -      - 1121545     -      -     -     -
ddt_entry_trad_cache                  0x00100         -         0        -      424      -     -     -      -     0     -      -     -     -
ddt_log_entry_flat_cache              0x00100         - 15814759248        -      144      -     -     -      - 109824717     -      -     -     -
ddt_log_entry_trad_cache              0x00100         -         0        -      328      -     -     -      -     0     -      -     -     -
zio_cache                             0x00100         - 11068225344        -     1216      -     -     -      - 9102159     -      -     -     -
zio_link_cache                        0x00100         - 436901520        -       48      -     -     -      - 9102115     -      -     -     -
zio_buf_comb_512                      0x200102         -      9216        -      512      -     -     -      -    18     -      -     -     -
zio_buf_comb_1024                     0x200102         -         0        -     1024      -     -     -      -     0     -      -     -     -
zio_buf_comb_1536                     0x00102         -         0        -     1536      -     -     -      -     0     -      -     -     -
zio_buf_comb_2048                     0x00102         -      2048        -     2048      -     -     -      -     1     -      -     -     -
zio_buf_comb_3072                     0x00102         -      3072        -     3072      -     -     -      -     1     -      -     -     -
zio_buf_comb_4096                     0x00102         -     28672        -     4096      -     -     -      -     7     -      -     -     -
zio_buf_comb_6144                     0x00102         -         0        -     6144      -     -     -      -     0     -      -     -     -
zio_buf_comb_8192                     0x00102         -         0        -     8192      -     -     -      -     0     -      -     -     -
zio_buf_comb_12288                    0x00102         -     12288        -    12288      -     -     -      -     1     -      -     -     -
zio_buf_comb_16384                    0x00102         -   1228800        -    16384      -     -     -      -    75     -      -     -     -
zio_buf_comb_24576                    0x00082  11206656   9240576   233472    24576     48    47    48    384   376   384      0     0     0
zio_buf_comb_32768                    0x00082 1746505728 1457258496   299008    32768   5841  5841  7976  46728 44472 63808      0     0     0
zio_buf_comb_49152                    0x00082  24944640  19759104   430080    49152     58    58    58    464   402   464      0     0     0
zio_buf_comb_65536                    0x00082  16273408  15204352   561152    65536     29    29    32    232   232   256      0     0     0
zio_buf_comb_98304                    0x00082         0         0   823296    98304      0     0     0      0     0     0      0     0     0
zio_buf_comb_131072                   0x00082 3728486400 3198025728  1085440   131072   3435  3435  4975  27480 24399 39800      0     0     0
zio_buf_comb_196608                   0x00082         0         0  1609728   196608      0     0     0      0     0     0      0     0     0
zio_buf_comb_262144                   0x00082         0         0  2134016   262144      0     0     0      0     0     0      0     0     0
zio_buf_comb_393216                   0x00082         0         0  3182592   393216      0     0     0      0     0     0      0     0     0
zio_buf_comb_524288                   0x00082         0         0  4231168   524288      0     0     0      0     0     0      0     0     0
zio_buf_comb_786432                   0x00082         0         0  6328320   786432      0     0     0      0     0     0      0     0     0
zio_buf_comb_1048576                  0x00082   8425472   2097152  8425472  1048576      1     1     1      8     2     2      0     0     0
zio_buf_comb_1572864                  0x00082         0         0 12619776  1572864      0     0     0      0     0     0      0     0     0
zio_buf_comb_2097152                  0x00082         0         0 16814080  2097152      0     0     0      0     0     0      0     0     0
zio_buf_comb_3145728                  0x00082         0         0 25202688  3145728      0     0     0      0     0     0      0     0     0
zio_buf_comb_4194304                  0x00082         0         0 29392896  4194304      0     0     0      0     0     0      0     0     0
zio_buf_comb_6291456                  0x00082         0         0 31481856  6291456      0     0     0      0     0     0      0     0     0
zio_buf_comb_8388608                  0x00082         0         0 25182208  8388608      0     0     0      0     0     0      0     0     0
zio_buf_comb_12582912                 0x00082         0         0 25178112 12582912      0     0     0      0     0     0      0     0     0
zio_buf_comb_16777216                 0x00082         0         0 16785408 16777216      0     0     0      0     0     0      0     0     0
lz4_cache                             0x200100         -         0        -    16384      -     -     -      -     0     -      -     -     -
abd_t                                 0x200100         - 170297952        -       96      -     -     -      - 1773937     -      -     -     -
sa_cache                              0x200100         -      1280        -      256      -     -     -      -     5     -      -     -     -
dnode_t                               0x200100         -   2177224        -      952      -     -     -      -  2287     -      -     -     -
arc_buf_hdr_t_full                    0x200100         - 718020720        -      240      -     -     -      - 2991753     -      -     -     -
arc_buf_hdr_t_l2only                  0x00100         -         0        -       96      -     -     -      -     0     -      -     -     -
arc_buf_t                             0x00100         -   2178048        -       32      -     -     -      - 68064     -      -     -     -
dmu_buf_impl_t                        0x00100         -  26533248        -      384      -     -     -      - 69097     -      -     -     -
zil_lwb_cache                         0x00100         -         0        -      392      -     -     -      -     0     -      -     -     -
zil_zcw_cache                         0x00100         -         0        -      152      -     -     -      -     0     -      -     -     -
sio_cache_0                           0x00100         -         0        -      136      -     -     -      -     0     -      -     -     -
sio_cache_1                           0x00100         -         0        -      152      -     -     -      -     0     -      -     -     -
sio_cache_2                           0x00100         -         0        -      168      -     -     -      -     0     -      -     -     -
zap_name                              0x00100         -      2296        -      328      -     -     -      -     7     -      -     -     -
zap_attr_cache                        0x00100         -         0        -      288      -     -     -      -     0     -      -     -     -
zap_name_long                         0x00100         -         0        -     1096      -     -     -      -     0     -      -     -     -
zap_attr_long_cache                   0x00100         -         0        -     1056      -     -     -      -     0     -      -     -     -
zfs_znode_cache                       0x200100         -      6816        -     1136      -     -     -      -     6     -      -     -     -
zfs_znode_hold_cache                  0x00100         -         0        -       88      -     -     -      -     0     -      -     -     -
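To see where the memory went before the OOM, the caches can be ranked by the alloc column. In the rows checked here it equals the objects in use multiplied by objsize. Below is a hypothetical Python helper, not part of OpenZFS, that parses rows in the column layout shown above (as read from /proc/spl/kmem/slab on Linux) and lists the largest consumers. The sample rows are copied verbatim from the dump above.

```python
# Hypothetical helper: rank SPL kmem caches by bytes in use, parsing the
# column layout of the slab dump above (e.g. from /proc/spl/kmem/slab).

SAMPLE = """\
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max
ddt_entry_flat_cache                  0x00100         - 269170800        -      240      -     -     -      - 1121545     -      -     -     -
ddt_log_entry_flat_cache              0x00100         - 15814759248        -      144      -     -     -      - 109824717     -      -     -     -
zio_cache                             0x00100         - 11068225344        -     1216      -     -     -      - 9102159     -      -     -     -
zio_buf_comb_131072                   0x00082 3728486400 3198025728  1085440   131072   3435  3435  4975  27480 24399 39800      0     0     0
arc_buf_hdr_t_full                    0x200100         - 718020720        -      240      -     -     -      - 2991753     -      -     -     -
"""


def parse_slab(text):
    """Return (name, alloc_bytes, objsize, objects_in_use) for each data row."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Data rows have 15 columns and a hex flags field; this skips the
        # column header and the dashed separator line.
        if len(f) != 15 or not f[1].startswith("0x"):
            continue
        rows.append((f[0], int(f[3]), int(f[5]), int(f[10])))
    return rows


def top_consumers(text, n=3):
    """Largest caches by the 'alloc' (bytes in use) column."""
    return sorted(parse_slab(text), key=lambda r: r[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, alloc, objsize, objs in top_consumers(SAMPLE):
        print(f"{name:28s} {alloc / 2**30:6.2f} GiB  ({objs} x {objsize} B)")
```

On these rows, ddt_log_entry_flat_cache (109824717 x 144 B, about 14.7 GiB) and zio_cache (9102159 x 1216 B, about 10.3 GiB) dominate, together roughly 25 GiB.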

Slab statistics after the OOM event:

--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----  --- emergency ---
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max
spl_zlib_workspace_cache              0x200080         0         0  2145216   268104      0     0     0      0     0     0      0     0     0
kcf_context_cache                     0x00100         -         0        -       40      -     -     -      -     0     -      -     -     -
zfs_btree_leaf_cache                  0x00100         -         0        -     4096      -     -     -      -     0     -      -     -     -
metaslab_alloc_trace_cache            0x00100         -         0        -       72      -     -     -      -     0     -      -     -     -
brt_entry_cache                       0x00100         -         0        -       40      -     -     -      -     0     -      -     -     -
brt_pending_entry_cache               0x00100         -         0        -      160      -     -     -      -     0     -      -     -     -
ddt_cache                             0x00080         0         0   234496    29264      0     0     0      0     0     0      0     0     0
ddt_entry_flat_cache                  0x00100         -         0        -      240      -     -     -      -     0     -      -     -     -
ddt_entry_trad_cache                  0x00100         -         0        -      424      -     -     -      -     0     -      -     -     -
ddt_log_entry_flat_cache              0x00100         -         0        -      144      -     -     -      -     0     -      -     -     -
ddt_log_entry_trad_cache              0x00100         -         0        -      328      -     -     -      -     0     -      -     -     -
zio_cache                             0x00100         -         0        -     1216      -     -     -      -     0     -      -     -     -
zio_link_cache                        0x00100         -         0        -       48      -     -     -      -     0     -      -     -     -
zio_buf_comb_512                      0x200102         -         0        -      512      -     -     -      -     0     -      -     -     -
zio_buf_comb_1024                     0x200102         -         0        -     1024      -     -     -      -     0     -      -     -     -
zio_buf_comb_1536                     0x00102         -         0        -     1536      -     -     -      -     0     -      -     -     -
zio_buf_comb_2048                     0x00102         -         0        -     2048      -     -     -      -     0     -      -     -     -
zio_buf_comb_3072                     0x00102         -         0        -     3072      -     -     -      -     0     -      -     -     -
zio_buf_comb_4096                     0x00102         -         0        -     4096      -     -     -      -     0     -      -     -     -
zio_buf_comb_6144                     0x00102         -         0        -     6144      -     -     -      -     0     -      -     -     -
zio_buf_comb_8192                     0x00102         -         0        -     8192      -     -     -      -     0     -      -     -     -
zio_buf_comb_12288                    0x00102         -         0        -    12288      -     -     -      -     0     -      -     -     -
zio_buf_comb_16384                    0x00102         -         0        -    16384      -     -     -      -     0     -      -     -     -
zio_buf_comb_24576                    0x00082         0         0   233472    24576      0     0     0      0     0     0      0     0     0
zio_buf_comb_32768                    0x00082         0         0   299008    32768      0     0     0      0     0     0      0     0     0
zio_buf_comb_49152                    0x00082         0         0   430080    49152      0     0     0      0     0     0      0     0     0
zio_buf_comb_65536                    0x00082         0         0   561152    65536      0     0     0      0     0     0      0     0     0
zio_buf_comb_98304                    0x00082         0         0   823296    98304      0     0     0      0     0     0      0     0     0
zio_buf_comb_131072                   0x00082   1085440   1048576  1085440   131072      1     1     1      8     8     8      0     0     0
zio_buf_comb_196608                   0x00082         0         0  1609728   196608      0     0     0      0     0     0      0     0     0
zio_buf_comb_262144                   0x00082         0         0  2134016   262144      0     0     0      0     0     0      0     0     0
zio_buf_comb_393216                   0x00082         0         0  3182592   393216      0     0     0      0     0     0      0     0     0
zio_buf_comb_524288                   0x00082         0         0  4231168   524288      0     0     0      0     0     0      0     0     0
zio_buf_comb_786432                   0x00082         0         0  6328320   786432      0     0     0      0     0     0      0     0     0
zio_buf_comb_1048576                  0x00082   8425472   2097152  8425472  1048576      1     1     1      8     2     2      0     0     0
zio_buf_comb_1572864                  0x00082         0         0 12619776  1572864      0     0     0      0     0     0      0     0     0
zio_buf_comb_2097152                  0x00082         0         0 16814080  2097152      0     0     0      0     0     0      0     0     0
zio_buf_comb_3145728                  0x00082         0         0 25202688  3145728      0     0     0      0     0     0      0     0     0
zio_buf_comb_4194304                  0x00082         0         0 29392896  4194304      0     0     0      0     0     0      0     0     0
zio_buf_comb_6291456                  0x00082         0         0 31481856  6291456      0     0     0      0     0     0      0     0     0
zio_buf_comb_8388608                  0x00082         0         0 25182208  8388608      0     0     0      0     0     0      0     0     0
zio_buf_comb_12582912                 0x00082         0         0 25178112 12582912      0     0     0      0     0     0      0     0     0
zio_buf_comb_16777216                 0x00082         0         0 16785408 16777216      0     0     0      0     0     0      0     0     0
lz4_cache                             0x200100         -         0        -    16384      -     -     -      -     0     -      -     -     -
abd_t                                 0x200100         -        96        -       96      -     -     -      -     1     -      -     -     -
sa_cache                              0x200100         -         0        -      256      -     -     -      -     0     -      -     -     -
dnode_t                               0x200100         -         0        -      952      -     -     -      -     0     -      -     -     -
arc_buf_hdr_t_full                    0x200100         -      7680        -      240      -     -     -      -    32     -      -     -     -
arc_buf_hdr_t_l2only                  0x00100         -         0        -       96      -     -     -      -     0     -      -     -     -
arc_buf_t                             0x00100         -         0        -       32      -     -     -      -     0     -      -     -     -
dmu_buf_impl_t                        0x00100         -         0        -      384      -     -     -      -     0     -      -     -     -
zil_lwb_cache                         0x00100         -         0        -      392      -     -     -      -     0     -      -     -     -
zil_zcw_cache                         0x00100         -         0        -      152      -     -     -      -     0     -      -     -     -
sio_cache_0                           0x00100         -         0        -      136      -     -     -      -     0     -      -     -     -
sio_cache_1                           0x00100         -         0        -      152      -     -     -      -     0     -      -     -     -
sio_cache_2                           0x00100         -         0        -      168      -     -     -      -     0     -      -     -     -
zap_name                              0x00100         -         0        -      328      -     -     -      -     0     -      -     -     -
zap_attr_cache                        0x00100         -         0        -      288      -     -     -      -     0     -      -     -     -
zap_name_long                         0x00100         -         0        -     1096      -     -     -      -     0     -      -     -     -
zap_attr_long_cache                   0x00100         -         0        -     1056      -     -     -      -     0     -      -     -     -
zfs_znode_cache                       0x200100         -         0        -     1136      -     -     -      -     0     -      -     -     -
zfs_znode_hold_cache                  0x00100         -         0        -       88      -     -     -      -     0     -      -     -     -
robn commented 3 weeks ago

@jtblck90 thanks for all the info. I've been able to reproduce this in the lab, and I have a patch that should help. I'm still completing testing, but I should be able to post a PR later today.

If you're able, could you please rerun your test with this patch? Thanks! https://github.com/robn/zfs/commit/52beaf57f3d2bcaad723a6544c6f69b7f3f4fae5

Note that this won't do anything about the "Large kmem_alloc" warning (though you may see it less often, or not at all). That's a separate issue and I will deal with it later. Fortunately it's only a warning and I understand where it comes from, so there's nothing to worry about there.

jtblck90 commented 3 weeks ago

@robn Thanks! I will test the patch and get back to you with the results.

jtblck90 commented 3 weeks ago

@robn I have tested your patch by removing 4x2TB files from a zpool with the same configuration as above. The rm command took around 10 minutes and completed successfully.

However, once the actual space reclamation started and the used size of the zpool began decreasing, I monitored the pool with the watch zpool status -D command and noticed that the number of DDT entries was increasing, as were their on-disk and in-core sizes. This process slowly consumed all the RAM and the VM went OOM.
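The monitoring described above can be partly automated. The sketch below is a hypothetical Python helper that polls zpool status -D and flags a DDT entry count that keeps rising. It assumes the dedup summary line contains "DDT entries <N>", which is how current releases word it as far as I know; the exact wording may differ between OpenZFS versions.

```python
import re
import subprocess
import time

# Assumption: `zpool status -D` prints a summary line containing
# "DDT entries <N>, ..."; adjust the pattern if your release differs.
DDT_RE = re.compile(r"DDT entries (\d+)")


def ddt_entries(status_text):
    """Return the DDT entry count found in `zpool status -D` output, or None."""
    m = DDT_RE.search(status_text)
    return int(m.group(1)) if m else None


def is_growing(samples, min_steps=3):
    """True if each of the last min_steps intervals shows a strict increase."""
    tail = samples[-(min_steps + 1):]
    return len(tail) == min_steps + 1 and all(b > a for a, b in zip(tail, tail[1:]))


def watch_ddt(pool, interval=10.0):
    """Hypothetical helper: poll the pool and warn while the DDT keeps growing."""
    samples = []
    while True:
        out = subprocess.run(["zpool", "status", "-D", pool],
                             capture_output=True, text=True, check=True).stdout
        count = ddt_entries(out)
        if count is not None:
            samples = (samples + [count])[-10:]  # keep a short history
            if is_growing(samples):
                print(f"{pool}: DDT entries still rising ({count})")
        time.sleep(interval)
```

For example, watch_ddt("zpool16k") would poll the pool from this report every 10 seconds until interrupted.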

I performed a second test, but this time I decreased zfs_max_async_dedup_frees to 10000, as mentioned in #16708. This time the number of DDT entries decreased and the used space was slowly reclaimed without any increase in RAM usage.
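The tuning described above can also be applied at runtime. This is a minimal Python sketch, assuming a Linux host where the zfs module exposes its parameters under /sys/module/zfs/parameters and the script runs as root; set_zfs_param is a hypothetical helper, not an OpenZFS tool. It is equivalent to writing the value into the parameter file with echo.

```python
from pathlib import Path

# On Linux, writable ZFS module parameters appear as files in this directory.
PARAM_DIR = Path("/sys/module/zfs/parameters")


def set_zfs_param(name, value, param_dir=PARAM_DIR):
    """Write a runtime ZFS module parameter and return the value read back.

    Requires root on a real system. The change does not survive a module
    reload or reboot; the usual way to persist it is a modprobe option such as
    "options zfs zfs_max_async_dedup_frees=10000" in /etc/modprobe.d/zfs.conf.
    """
    path = Path(param_dir) / name
    if not path.exists():
        raise FileNotFoundError(f"no such ZFS module parameter: {path}")
    path.write_text(f"{value}\n")
    return int(path.read_text().strip())
```

Usage: set_zfs_param("zfs_max_async_dedup_frees", 10000) applies the value used in the second test and returns what the kernel reports back.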

Basically, I just needed to tune the parameter above and everything worked!

By the way, do you have any idea when ZFS 2.3.0 might be released, and whether your patch will be included in it?

behlendorf commented 2 weeks ago

@jtblck90 we'll pull it back into the 2.3.0 release branch once the PR is finalized and merged to master.

jtblck90 commented 2 weeks ago

@behlendorf Thank you, that's great news! Perhaps you have some insight into when we could expect the full 2.3.0 release?