openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

(reproducible) After the pool is imported, the kernel reports "Out of memory: killed process" and the operating system freezes #16322

Open · wangxinmian opened this issue 4 months ago

wangxinmian commented 4 months ago

System information

Type Version/Name
Distribution Name pve
Distribution Version pve-manager/8.2.4
Kernel Version Linux pve-gen8 6.8.8-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.8-2 (2024-06-24T09:00Z) x86_64 GNU/Linux
Architecture amd64
OpenZFS Version zfs-2.2.4-pve1 zfs-kmod-2.2.4-pve1

Describe the problem you're observing

Immediately after zpool import pool0, the system runs out of memory, many processes are killed, and then the system stops responding completely.

After the operating system is restarted, the problem is triggered again as soon as the pool is imported, whether automatically at boot or manually.

The computer is an HP Gen8 MicroServer with 16 GB of memory; since the machine supports at most 16 GB, I cannot add more. Adding a 100 GB swap volume on the root pool rpool did not help with the out-of-memory problem.

I also tried booting the Ubuntu 24.04 desktop ISO and importing the pool there, but that failed as well: the graphical interface froze and stopped responding, so I could not see any error message.

I have a screen recording of the crash on the PVE OS, slowed down to 0.25x; I am not sure whether it contains anything useful. I checked journalctl, but it only has entries from before the suspected failure, so the system may have crashed without writing anything.

Sorry, I am not a Linux professional and do not know what other information would help with this error. If you need me to do anything, please let me know and I will reproduce the error again and collect the information.

Describe how to reproduce the problem

pool0 is a raidz2 pool of four 12 TB disks with about 20 TB of capacity. Since there is only about 300 GB of free space in the pool, I was trying to free up space. Manually deleting a 1 TB file triggered the problem.

Include any warning/errors/backtraces from the system logs

Screen recording when ssh executes import operation: https://github.com/openzfs/zfs/assets/174770717/eba99562-5c0c-4cf5-9deb-ba045992a4df

Some screenshots captured from the screen recording at the time of failure (for easy observation) :

[6 screenshots attached]

journalctl.log:

journalctl.log

wangxinmian commented 4 months ago

I forgot to mention that running 'zpool import -Ff pool0' does not help either; it triggers the same problem.

wangxinmian commented 4 months ago
root@pve1:~# zpool import
   pool: boot-pool
     id: 1769529030260875389
  state: ONLINE
status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        boot-pool   ONLINE
          sdf3      ONLINE

   pool: pool0
     id: 6915880835950409966
  state: ONLINE
status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        pool0                                     ONLINE
          raidz2-0                                ONLINE
            25574ef3-364a-11eb-b721-b05ada87f844  ONLINE
            255e23b7-364a-11eb-b721-b05ada87f844  ONLINE
            1ca562f7-3b86-11eb-b721-b05ada87f844  ONLINE
            64ad5feb-4125-11eb-b928-b05ada87f844  ONLINE
AllKind commented 4 months ago

Did you try to limit the memory allocated by ZFS? https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#zfs-arc-max

Also maybe this module parameter could be useful: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#zfs-arc-sys-free

wangxinmian commented 4 months ago

@AllKind Thank you very much, I was unaware of these ZFS parameter settings.

I tried the following:

// Here is the original default value
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max 
1671430144
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_sys_free
0
//  0.5GB *1024*1024*1024 = 536870912
root@pve1:~# echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max 
536870912
root@pve1:~# cat /sys/module/zfs/parameters/zfs_scan_strict_mem_lim
0
root@pve1:~# echo 1 > /sys/module/zfs/parameters/zfs_scan_strict_mem_lim
root@pve1:~# cat /sys/module/zfs/parameters/zfs_scan_strict_mem_lim
1
//  8GB *1024*1024*1024 = 8589934592
root@pve1:~# echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_sys_free
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_sys_free
8589934592
root@pve1:~# zpool import -Ff pool0
root@pve1:~# Connection to 192.168.1.2 closed by remote host.

The system still crashes after the pool is imported.

https://github.com/openzfs/zfs/assets/174770717/9611cd54-ebba-492f-8351-716995634dbf

AllKind commented 4 months ago

I don't think zfs_scan_strict_mem_lim is relevant here.

Generally I'd apply the parameters at module load time (/etc/modprobe.d/ or similar - distro dependent).

So you are limiting the ARC to a maximum of 0.5 GB and at the same time telling ZFS to keep 8 GB free for other applications. I'm only guessing that ZFS handles that combination sensibly... Did you check arcstats after applying zfs_arc_max, to confirm that the ARC actually shrank to the requested value?
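
For example, the current ARC size and target can be read directly from the arcstats kstat and compared against the configured limits (a minimal sketch; "c" is the adaptive target, "c_max" the configured maximum):

# print current ARC size, adaptive target, and hard limits, in bytes
awk '$1 ~ /^(size|c|c_min|c_max)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats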

wangxinmian commented 4 months ago

Hello, thank you very much for your help.

I set up /etc/modprobe.d/zfs.conf and restarted the computer.

'arcstat' output is shown below; it looks like ZFS might be able to use 8 GB of memory? I'm not sure whether the settings are in effect.

Importing pool0 again still crashes the system.

Can I ask you a question?

Currently, when I reboot, the system automatically imports pool0 and crashes. My workaround is to boot the Ubuntu 24.04 ISO, manually import pool0 there, and boot PVE again after Ubuntu crashes. Because pool0 then has a record of an unreleased Ubuntu import, PVE does not import it automatically, so the system does not crash immediately and I have time to change the ZFS settings.

Besides this workaround, is there any other way to keep PVE from automatically importing pool0 at boot?


root@pve1:~# cat /etc/modprobe.d/zfs.conf
# Setting up ZFS ARC size on Ubuntu as per our needs
# Set Max ARC size => 2GB == 2147483648 Bytes
options zfs zfs_arc_max=2147483648

# Set Min ARC size => 1GB == 1073741824
options zfs zfs_arc_min=1073741824

# 8GB
options zfs zfs_arc_sys_free=8589934592
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max
2147483648
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_sys_free
8589934592
root@pve1:~# arcstat 
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
21:50:13     0       0     0       0     0      0    0   188M  1024M   6.2G
root@pve1:~# arcstat -h
Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval [count]]

         -h : Print this help message
         -a : Print all possible stats
         -v : List all possible field headers and definitions
         -x : Print extended stats
         -z : Print zfetch stats
         -f : Specify specific fields to print (see -v)
         -o : Redirect output to the specified file
         -s : Override default field separator with custom character or string
         -p : Disable auto-scaling of numerical fields

Examples:
        arcstat -o /tmp/a.log 2 10
        arcstat -s "," -o /tmp/a.log 2 10
        arcstat -v
        arcstat -f time,hit%,dh%,ph%,mh% 1

root@pve1:~# arcstat -a
    time  hits  iohs  miss  read  hit%  ioh%  miss%  dhit  dioh  dmis  dh%  di%  dm%  ddhit  ddioh  ddmis  ddh%  ddi%  ddm%  dmhit  dmioh  dmmis  dmh%  dmi%  dmm%  phit  pioh  pmis  ph%  pi%  pm%  pdhit  pdioh  pdmis  pdh%  pdi%  pdm%  pmhit  pmioh  pmmis  pmh%  pmi%  pmm%  mhit  mioh  mmis  mread  mh%  mi%  mm%  arcsz   size      c   mfu   mru  mfug  mrug   unc  eskip  el2skip  el2cach  el2el  el2mfu  el2mru  el2inel  mtxmis  dread  ddread  dmread  pread  pdread  pmread  grow   need   free  avail  waste  ztotal  zhits  zahead  zpast  zmisses  zmax  zfuture  zstride  zissued  zactive
21:51:47     0     0     0     0     0     0      0     0     0     0    0    0    0      0      0      0     0     0     0      0      0      0     0     0     0     0     0     0    0    0    0      0      0      0     0     0     0      0      0      0     0     0     0     0     0     0      0    0    0    0   186M   186M  1024M     0     0     0     0     0      0        0        0      0       0       0        0       0      0       0       0      0       0       0     1      0    14G   6.2G   706K       0      0       0      0        0     0        0        0        0        0
root@pve1:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.5Gi        14Gi        42Mi       141Mi        14Gi
Swap:             0B          0B          0B
wangxinmian commented 4 months ago

The PVE system is mainly used for storage, with only one LXC container that provides SMB/sync. Since the container depends on pool0, it is not running when the failure occurs.

Sorry, because I don't know much about zfs, I'm not sure if the parameters I just set are correct.

wangxinmian commented 4 months ago

I don't think zfs_scan_strict_mem_lim is relevant here.

@AllKind

I have removed this parameter. I am not familiar with the ZFS parameters; I just read the documentation and thought it might be useful, so I added it.

AllKind commented 4 months ago

it looks like ZFS might be able to use 8 GB of memory?

Half of the system memory is the default ARC limit on Linux.

As you are also using ZFS on root, I'm not sure how the behavior is with /etc/modprobe.d. You could try setting it on the kernel command line (https://docs.kernel.org/admin-guide/kernel-parameters.html): add zfs.zfs_arc_max=2147483648 in the boot menu by pressing 'e', or edit /etc/default/grub and run sudo update-grub to make it permanent.
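
A minimal sketch of the permanent variant on a GRUB-booted install (the "quiet" below stands for whatever options are already present; a PVE system booting via systemd-boot/proxmox-boot-tool uses a different mechanism):

# /etc/default/grub -- append the parameter to the existing command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet zfs.zfs_arc_max=2147483648"
# regenerate the boot configuration and reboot
update-grub
reboot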

is there any other way to keep PVE from automatically importing pool0 at boot?

I don't know.
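
One approach sometimes used is to keep the pool out of the boot-time import path; a minimal sketch, assuming the standard zfs-import-cache/zfs-import-scan systemd units and the default cache file location (the root pool is imported by the initramfs, not by these units):

# either move the cache file aside so zfs-import-cache.service finds nothing to import...
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
# ...or disable the boot-time import services entirely
systemctl disable zfs-import-cache.service zfs-import-scan.service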

eternalscripter commented 4 months ago

The same problem appeared for me 5 hours ago, on the same PVE and kernel versions. Memory limits are useless. The problem occurs when importing a large 44 TB pool.

eternalscripter commented 4 months ago

Please try this and show the output:

root@erpband-pve1:~# cat /sys/module/zfs/parameters/zfs_arc_min
5368709119
root@erpband-pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max
5368709120

arc_summary | grep -E 'ARC size \(current\)|Target size \(adaptive\)|Min size \(hard limit\)|Max size \(high water\)|Anonymous metadata size'

arc_summary | grep -E 'ARC size \(current\)|Target size \(adaptive\)|Min size \(hard limit\)|Max size \(high water\)|Anonymous metadata size'
ARC size (current):                                  1043.4 %   52.2 GiB
        Target size (adaptive):                       100.0 %    5.0 GiB
        Min size (hard limit):                        100.0 %    5.0 GiB
        Max size (high water):                            1:1    5.0 GiB
        Anonymous metadata size:                       99.4 %   50.1 GiB

As you can see, "Anonymous metadata size: 99.4 % 50.1 GiB" is incorrect; on our other PVE host it does not exceed 400 KB. Our video (timecode shows RAM utilization): https://youtu.be/4eLd8Po541o?si=Ya6akZPmmuEIxSWZ&t=382

wangxinmian commented 4 months ago

I tried again with the ARC fixed at 6 GB and zfs_arc_sys_free set to 5 GB; the whole system still locks up.

[screenshot]

wangxinmian commented 4 months ago

Hello, I modified the GRUB settings and booted with F10. The same failure still occurs.

My guess is that the root pool has nothing to do with it, since Ubuntu 24.04 suffers from the same crash.

[screenshot]

root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max
2147483648
root@pve1:~# arcstat 
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
11:04:11     0       0     0       0     0      0    0   187M  1024M   6.1G
root@pve1:~# arcstat -a
    time  hits  iohs  miss  read  hit%  ioh%  miss%  dhit  dioh  dmis  dh%  di%  dm%  ddhit  ddioh  ddmis  ddh%  ddi%  ddm%  dmhit  dmioh  dmmis  dmh%  dmi%  dmm%  phit  pioh  pmis  ph%  pi%  pm%  pdhit  pdioh  pdmis  pdh%  pdi%  pdm%  pmhit  pmioh  pmmis  pmh%  pmi%  pmm%  mhit  mioh  mmis  mread  mh%  mi%  mm%  arcsz   size      c   mfu   mru  mfug  mrug   unc  eskip  el2skip  el2cach  el2el  el2mfu  el2mru  el2inel  mtxmis  dread  ddread  dmread  pread  pdread  pmread  grow   need   free  avail  waste  ztotal  zhits  zahead  zpast  zmisses  zmax  zfuture  zstride  zissued  zactive
11:04:13     0     0     0     0     0     0      0     0     0     0    0    0    0      0      0      0     0     0     0      0      0      0     0     0     0     0     0     0    0    0    0      0      0      0     0     0     0      0      0      0     0     0     0     0     0     0      0    0    0    0   186M   186M  1024M     0     0     0     0     0      0        0        0      0       0       0        0       0      0       0       0      0       0       0     1      0    14G   6.1G   705K       0      0       0      0        0     0        0        0        0        0
root@pve1:~# zpool import pool0
cannot import 'pool0': pool was previously in use from another system.
Last accessed by ubuntu (hostid=38138694) at Sat Jul  6 10:15:23 2024
The pool can be imported, use 'zpool import -f' to import the pool.
root@pve1:~# zpool import -f pool0
root@pve1:~# Connection to 192.168.1.3 closed by remote host.
rincebrain commented 4 months ago

The recording above suggests that it's claiming the slab usage for zio_buf_comb_4096 is 9 GiB, which is, I think, basically all your RAM.

My wild blind guess would be that it's issuing a lot of IOs and for some reason the old buffers from them aren't being reaped from the cache before you OOM. Maybe the kernel in newer versions changed some logic about how that process works, and that's why it's not biting people on older versions?

AllKind commented 4 months ago

@wangxinmian do you have an older kernel (preferably older than 6.6) available in your distro to test?

AllKind commented 4 months ago

@wangxinmian Another thing maybe worth trying is to disable Multi-Gen LRU: https://docs.kernel.org/admin-guide/mm/multigen_lru.html echo n >/sys/kernel/mm/lru_gen/enabled

wangxinmian commented 4 months ago

I tested with the Ubuntu 23.10.1 ISO as well; the system still hangs. The graphical interface does not display an error message, it is just completely unresponsive. I'll wait and try the command-line interface to see if there is any error output.

[3 screenshots]

disable Multi-Gen LRU:

root@pve1:~# echo n >/sys/kernel/mm/lru_gen/enabled
root@pve1:~# echo /sys/kernel/mm/lru_gen/enabled
/sys/kernel/mm/lru_gen/enabled
root@pve1:~# cat /sys/kernel/mm/lru_gen/enabled
0x0000
root@pve1:~# arcstat 
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
19:04:22     0       0     0       0     0      0    0   269M  1024M   6.0G
root@pve1:~# arcstat -a
    time  hits  iohs  miss  read  hit%  ioh%  miss%  dhit  dioh  dmis  dh%  di%  dm%  ddhit  ddioh  ddmis  ddh%  ddi%  ddm%  dmhit  dmioh  dmmis  dmh%  dmi%  dmm%  phit  pioh  pmis  ph%  pi%  pm%  pdhit  pdioh  pdmis  pdh%  pdi%  pdm%  pmhit  pmioh  pmmis  pmh%  pmi%  pmm%  mhit  mioh  mmis  mread  mh%  mi%  mm%  arcsz   size      c   mfu   mru  mfug  mrug   unc  eskip  el2skip  el2cach  el2el  el2mfu  el2mru  el2inel  mtxmis  dread  ddread  dmread  pread  pdread  pmread  grow   need   free  avail  waste  ztotal  zhits  zahead  zpast  zmisses  zmax  zfuture  zstride  zissued  zactive
19:04:27     0     0     0     0     0     0      0     0     0     0    0    0    0      0      0      0     0     0     0      0      0      0     0     0     0     0     0     0    0    0    0      0      0      0     0     0     0      0      0      0     0     0     0     0     0     0      0    0    0    0   269M   269M  1024M     0     0     0     0     0      0        0        0      0       0       0        0       0      0       0       0      0       0       0     1      0    13G   6.0G   746K       0      0       0      0        0     0        0        0        0        0
root@pve1:~# zpool import -f pool0
root@pve1:~# Connection to 192.168.1.2 closed by remote host.
AllKind commented 4 months ago

As it seems it is not easy to limit the memory usage of the pool import, I'd suggest trying it with swap and OOM settings.

If you do not have a dedicated swap partition (disk), you can use a swap file on ZFS. Not ideal, but it should work. 16 GB should be enough.

This guide I think is a good starting point: https://rakeshjain-devops.medium.com/linux-out-of-memory-killer-31e477a45759
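
A minimal sketch of a swap zvol on the root pool, roughly following the OpenZFS FAQ recipe (the 16G size is an assumption, and swap on a zvol can itself stall under heavy memory pressure):

# create a zvol tuned for swap and enable it
zfs create -V 16G -b $(getconf PAGESIZE) \
    -o compression=zle -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none rpool/swap
mkswap -f /dev/zvol/rpool/swap
swapon /dev/zvol/rpool/swap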

wangxinmian commented 4 months ago

Thank you very much for your help.

I am very sorry, but I have network problems and cannot access the link you provided.

I tried adding a swap partition and importing pool0.

The screen does not show any error messages such as the OOM killer terminating processes, but both SSH and the local console are unresponsive.

I'll wait a day or so to see if the import completes successfully.

root@pve1~# swapon /dev/zvol/rpool/swap
root@pve1:~# free-h
-bash: free-h: command not found
root@pve1:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       1.7Gi        14Gi        48Mi       149Mi        13Gi
Swap:           99Gi          0B        99Gi
root@pve1:~# zpool import -f pool0
robn commented 4 months ago

There's a couple of directions I'd like to tackle this from. I need some info to get started.

Could you capture cat /proc/spl/kmem/slab before the import, and at the time of the OOM, and post them? If it takes a while before it blows up, it'd be good to sample this every few seconds as well, but you said "immediately" so maybe not. Afterwards is most interesting though; I want to start by seeing if the problem is that there's nothing reclaimable, or nothing being reclaimed.
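
For example, a simple loop like this (the file name and 0.5 s interval are arbitrary) would capture the slab caches and the general memory state over time:

# sample SPL slab usage and meminfo twice per second until interrupted
while true; do
    date '+time: %F %T.%3N'
    cat /proc/spl/kmem/slab /proc/meminfo
    sleep 0.5
done >> zinfo.log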

Has this problem only started since using 2.2.4? What was the last "good" version that it worked on? What about kernel versions? When you say you tried Ubuntu, can you please confirm that it was with the kernel and ZFS that come with it, that is, not the PVE builds? (I expect so, since you say it was an ISO; I just want to be sure.)

wangxinmian commented 4 months ago

It has been one day and the system is still not responding.

wangxinmian commented 4 months ago

Thank you very much. I am not familiar with linux and zfs, please let me know if you need any more information.

Below is output captured at 0.5 second intervals (/proc/spl/kmem/slab and /proc/meminfo), containing records from before the import until the out-of-memory crash.

zinfo.log

I previously used FreeNAS CORE and FreeNAS SCALE and did not have this problem. But I'm not sure whether that is only because nothing triggered it there.

One earlier thing that I am not sure matters: this pool was moved from FreeNAS SCALE to the PVE system. During the switch I hit the error "DMAR: ERROR: DMA PTE for vPFN 0x8e8fe already set", and I read that this failure can damage the file system. However, after adding the kernel parameter iommu.passthrough=1 as instructed, everything returned to normal, and I ran 'zpool scrub pool0', which completed successfully with no errors detected.

So far I have tested ubuntu-24.04-desktop-amd64.iso (should be zfs 2.2.4), ubuntu-23.10.1-desktop-amd64.iso (zfs-2.1.0-rc3-0ubuntu4), and ubuntu-22.04.4-desktop-amd64.iso (zfs-2.1.5-1ubuntu6-22.04.2, zfs-kmod-2.0-0ubuntu1-23.10); all of them crash after importing the pool. However, since I am using the desktop versions, there is no error message, the graphical interface just hangs.

Also, I am not sure whether deleting the large file under PVE damaged pool0 so that no version of ZFS can import the pool anymore.

robn commented 4 months ago

Thanks for that info. These are the time points where things went from "fine" to "very bad", and the specific memory caches that blew up:

--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----  --- emergency ---
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max  dlock alloc   max

time: 2024-07-08 19:37:47.256
zio_cache                             0x00100         -    184320        -     1280      -     -     -      -   144     -      -     -     -
zio_link_cache                        0x00100         -       336        -       48      -     -     -      -     7     -      -     -     -
zio_buf_comb_512                      0x00102         -    203776        -      512      -     -     -      -   398     -      -     -     -
zio_buf_comb_4096                     0x00102         -     81920        -     4096      -     -     -      -    20     -      -     -     -
abd_t                                 0x00100         -   1164176        -      104      -     -     -      - 11194     -      -     -     -

time: 2024-07-08 19:37:47.756
zio_cache                             0x00100         -    184320        -     1280      -     -     -      -   144     -      -     -     -
zio_link_cache                        0x00100         -       336        -       48      -     -     -      -     7     -      -     -     -
zio_buf_comb_512                      0x00102         -    203776        -      512      -     -     -      -   398     -      -     -     -
zio_buf_comb_4096                     0x00102         -     90112        -     4096      -     -     -      -    22     -      -     -     -
abd_t                                 0x00100         -    898768        -      104      -     -     -      -  8642     -      -     -     -

time: 2024-07-08 19:37:48.256
zio_cache                             0x00100         - 1036832000        -     1280      -     -     -      - 810025     -      -     -     -
zio_link_cache                        0x00100         -  38875344        -       48      -     -     -      - 809903     -      -     -     -
zio_buf_comb_512                      0x00102         -  90944512        -      512      -     -     -      - 177626     -      -     -     -
zio_buf_comb_4096                     0x00102         - 2174652416        -     4096      -     -     -      - 530921     -      -     -     -
abd_t                                 0x00100         -  74552608        -      104      -     -     -      - 716852     -      -     -     -

time: 2024-07-08 19:37:48.757
zio_cache                             0x00100         - 2625370880        -     1280      -     -     -      - 2051071     -      -     -     -
zio_link_cache                        0x00100         -  98445792        -       48      -     -     -      - 2050954     -      -     -     -
zio_buf_comb_512                      0x00102         - 228610048        -      512      -     -     -      - 446504     -      -     -     -
zio_buf_comb_4096                     0x00102         - 5459980288        -     4096      -     -     -      - 1333003     -      -     -     -
abd_t                                 0x00100         - 185772704        -      104      -     -     -      - 1786276     -      -     -     -

It's pretty clear that this is a ton of 512B and 4K IO being issued at high speed, and it doesn't stop. Obviously that shouldn't happen.

ubuntu-22.04.4-desktop-amd64.iso (zfs-2.1.5-1ubuntu6-22.04.2, zfs-kmod-2.0-0ubuntu1-23.10) has a crash problem after importing the pool

If you've got output from this crash, it would be very useful!

If this was the only report, I would guess that there's some damage in the pool that is causing something to run in a tight loop, blasting out IO until all memory is consumed. From the sizes, I'd guess raidz IO.

But! #16325 reports basically the same issue, on the same kernel and OpenZFS versions. That suggests something subtle has changed in an interaction between OpenZFS and the kernel, and OpenZFS is responding incorrectly.

I can't help any more right now; time for bed here. If no one else is able to help out overnight, I'll have a bit more of a think about it tomorrow.

rincebrain commented 4 months ago

Conceivably, you could try setting spl_kmem_cache_slab_limit=0 on the SPL module - the SPL tries to use Linux's own caches for small objects (since Linux has Opinions on how you shouldn't be doing larger allocations in the kernel, so they have a very low ceiling for how large those can be) as an optimization, and if it's only 512b/4k allocations that are doing something wild here, that would be an obvious special case to try turning off.

wangxinmian commented 4 months ago

Conceivably, you could try setting spl_kmem_cache_slab_limit=0 on the SPL module - the SPL tries to use Linux's own caches for small objects (since Linux has Opinions on how you shouldn't be doing larger allocations in the kernel, so they have a very low ceiling for how large those can be) as an optimization, and if it's only 512b/4k allocations that are doing something wild here, that would be an obvious special case to try turning off.

With options spl spl_kmem_cache_slab_limit=0 set, the system will not boot; I had to use the Ubuntu ISO to remove it again.

root@pve1:~# cat /etc/modprobe.d/zfs.conf
# Setting up ZFS ARC size on Ubuntu as per our needs
# Set Max ARC size => 2GB == 2147483648 Bytes
options zfs zfs_arc_max=2147483648

# Set Min ARC size => 1GB == 1073741824
options zfs zfs_arc_min=1073741824

# 8GB 
options zfs zfs_arc_sys_free=8589934592

# https://github.com/openzfs/zfs/issues/16322#issuecomment-2214223207
options spl spl_kmem_cache_slab_limit=0
root@pve1:~# update-initramfs -u
root@pve1:~# boot

[screenshot]

wangxinmian commented 4 months ago

If you've got output from this crash, it would be very useful!

I will try to see if I can capture the crash output, though I suspect it will be a little difficult.

rincebrain commented 4 months ago

I may have hit that, I forget, but I didn't think that would break that horrendously.

You could try setting it below 512, to like 511, but not 0, so it still uses Linux's caches for very small things, maybe it's doing something foolish like trying to build 4k caches for its own metadata for caches out of itself.

wangxinmian commented 4 months ago

You could try setting it below 512, to like 511, but not 0, so it still uses Linux's caches for very small things, maybe it's doing something foolish like trying to build 4k caches for its own metadata for caches out of itself.

Thank you for your help. With 511 it still fails to boot. I'll try setting this value after the system has booted instead.

[screenshot]

wangxinmian commented 4 months ago

I set this value to 511 after the operating system started. When zpool import pool0 is executed, the system still runs out of memory.


root@pve1:~# echo 511 > /sys/module/spl/parameters/spl_kmem_cache_slab_limit
root@pve1:~# cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit
511
root@pve1:~# zpool import -f pool0
root@pve1:~# Connection to 192.168.1.2 closed by remote host.
Dmitrius7 commented 4 months ago

Hi!

I have a similar problem and created a separate issue. My pool mounts successfully read-only.

Important note! If I import the pool as read-only, everything works fine, and the RAM is not consumed: zpool import -o readonly=on -d /dev/disk/by-partlabel/ zp-erp1-hdd

It's a pity that no one answers in my issue: The ARC size (current) is ten times larger than the Max size (zfs_arc_max), it will result in "Out of memory: killed process". #16325

rincebrain commented 4 months ago

I posted in your thread a link to this one, suggesting I thought it was a duplicate; that also creates a link from this thread to yours, if you scroll up. Usually, when a duplicate is suggested, people keep debugging in the more active issue until there is enough evidence to test whether it really is a duplicate, often by fixing the problem and seeing if the other reporters' problems go away.

If you'd like support in a more timely fashion than random volunteers are providing, I believe various companies out there will sell you support on demand, though I have no idea what their rates are.

wangxinmian commented 3 months ago

Important note! If I import the pool as read-only, everything works fine, and the RAM is not consumed.

Thank you very much.

I can also import it read-only here, but the amount of data is too large, and it would not be easy to rebuild the pool after a backup. In addition, I don't know whether there is a problem with my PVE system or something else, but even with a read-only import the PVE system sometimes stalls. I suspect the PVE install may have been corrupted by the repeated crashes and restarts, so I will switch to Ubuntu to see if I can capture a crash dump. I tried under PVE, but did not succeed in getting one.
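
For reference, a minimal kdump setup on a Debian-based system looks roughly like this (the package name, crashkernel reservation, and dump path are assumptions and may need adjusting for PVE):

# install the crash-dump tooling and reserve memory for the capture kernel
apt install kdump-tools
# add crashkernel=384M-:256M to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
update-grub
reboot
# after rebooting, check that kdump is armed; dumps are written under /var/crash
kdump-config show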

Dmitrius7 commented 3 months ago

Did you try importing the pool from something other than PVE? From Ubuntu or a different OS? Maybe it imports fine on a different OS?

wangxinmian commented 3 months ago

Did you try importing the pool from something other than PVE? From Ubuntu or a different OS? Maybe it imports fine on a different OS?

Thank you very much for your reply.

So far, I have tested ubuntu-24.04-desktop-amd64.iso (should be zfs 2.2.4), ubuntu-23.10.1-desktop-amd64.iso (zfs-2.1.0-rc3-0ubuntu4), and ubuntu-22.04.4-desktop-amd64.iso (zfs-2.1.5-1ubuntu6-22.04.2, zfs-kmod-2.0-0ubuntu1-23.10); all of them fail. However, since I was using the desktop versions, there was no error message, just a frozen graphical interface.

I'm not sure whether these versions all have the bug, or whether PVE has damaged the pool so that it cannot be imported under Ubuntu either.

Dmitrius7 commented 2 months ago

I installed FreeBSD 14 and imported this pool. It works fine!