openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

kernel panic on ubuntu 18.04 (kernel 5.4) with zfs 2.2.2 #15790

Open arahnale opened 7 months ago

arahnale commented 7 months ago

System information

| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | 18.04 |
| Kernel Version | 5.4.0-162-generic |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.2.2 zfs-kmod-2.2.2 |

Describe the problem you're observing

A kernel panic occurs randomly, about one to three hours after the system starts working.

Describe how to reproduce the problem

I have not been able to reproduce the problem deliberately.

Include any warning/errors/backtraces from the system logs

Kernel log:

```
Jan 18 00:29:20 vh422 kernel: python3[18546]: segfault at 0 ip 0000000000000000 sp 00007ffd34d1afd8 error 14 in python3.6[400000+3af000]
Jan 18 00:29:20 vh422 kernel: Code: Bad RIP value.
Jan 18 00:29:20 vh422 kernel: nginx[64996]: segfault at 8 ip 000055d3696706ab sp 00007ffcca139e20 error 4 in nginx[55d3694c5000+f94000]
Jan 18 00:29:20 vh422 kernel: Code: 89 8e 00 e9 52 ff ff ff 48 8d 15 a8 89 8e 00 e9 46 ff ff ff e8 b6 24 fc ff 66 0f 1f 44 00 00 48 83 ec 08 4c 8b 16 48 8b 46 08 <48> 8b 77 08 48 8b 3f 45 31 c9 51 52 45 31 c0 44 89 d1 48 89 c2 e8
Jan 18 00:29:20 vh422 kernel: BUG: unable to handle page fault for address: ffff92bc826df000
Jan 18 00:29:20 vh422 kernel: #PF: supervisor write access in kernel mode
Jan 18 00:29:20 vh422 kernel: #PF: error_code(0x0002) - not-present page
Jan 18 00:29:20 vh422 kernel: PGD 3f50801067 P4D 3f50801067 PUD 27d903a063 PMD 27c26de063
Jan 18 00:29:20 vh422 kernel: BAD
Jan 18 00:29:20 vh422 kernel: Oops: 0002 [#1] SMP NOPTI
Jan 18 00:29:20 vh422 kernel: CPU: 41 PID: 22347 Comm: apache2 Tainted: P           OE     5.4.0-162-generic #179~18.04.1-Ubuntu
Jan 18 00:29:20 vh422 kernel: Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 3.8a 10/28/2022
Jan 18 00:29:20 vh422 kernel: RIP: 0010:memset_erms+0x9/0x10
Jan 18 00:29:20 vh422 kernel: Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
Jan 18 00:29:20 vh422 kernel: RSP: 0000:ffffb735d308fb70 EFLAGS: 00010286
Jan 18 00:29:20 vh422 kernel: RAX: 0000000000007000 RBX: 0000000000007000 RCX: fffffffffdc36000
Jan 18 00:29:20 vh422 kernel: RDX: ffffffffffff8000 RSI: 0000000000000000 RDI: ffff92bc826df000
Jan 18 00:29:20 vh422 kernel: RBP: ffffb735d308fbc8 R08: 0000000000000000 R09: ffff92bc8031d000
Jan 18 00:29:20 vh422 kernel: nginx[64997]: segfault at 0 ip 0000000000000000 sp 00007ffcca138c20 error 14 in nginx[55d3694c5000+f94000]
Jan 18 00:29:20 vh422 kernel: R10: 0000000000000000 R11: 00000543e0ff3a7f R12: 0000000000001000
Jan 18 00:29:20 vh422 kernel: Code: Bad RIP value.
Jan 18 00:29:20 vh422 kernel: R13: ffff92bc80316000 R14: 0000000000007000 R15: ffff92bc80316000
Jan 18 00:29:20 vh422 kernel: FS:  00007f3da97dcbc0(0000) GS:ffff92c440b40000(0000) knlGS:0000000000000000
Jan 18 00:29:20 vh422 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 18 00:29:20 vh422 kernel: CR2: ffff92bc826de6f8 CR3: 0000001c366c8006 CR4: 00000000007606e0
Jan 18 00:29:20 vh422 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 18 00:29:20 vh422 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 18 00:29:20 vh422 kernel: PKRU: 55555554
Jan 18 00:29:20 vh422 kernel: Call Trace:
Jan 18 00:29:20 vh422 kernel:  ? show_regs+0x54/0x60
Jan 18 00:29:20 vh422 kernel:  ? __die+0x87/0xd0
Jan 18 00:29:20 vh422 kernel:  ? no_context+0x1b1/0x580
Jan 18 00:29:20 vh422 kernel:  ? spl_kmem_cache_alloc+0xc4/0x5c0 [spl]
Jan 18 00:29:20 vh422 kernel:  ? __bad_area_nosemaphore+0x50/0x1f0
Jan 18 00:29:20 vh422 kernel:  ? _cond_resched+0x19/0x40
Jan 18 00:29:20 vh422 kernel:  ? bad_area_nosemaphore+0x16/0x20
Jan 18 00:29:20 vh422 kernel:  ? do_kern_addr_fault+0x79/0x90
Jan 18 00:29:20 vh422 kernel:  ? __do_page_fault+0x323/0x4d0
Jan 18 00:29:20 vh422 kernel:  ? arc_access+0x280/0x410 [zfs]
Jan 18 00:29:20 vh422 kernel:  ? do_page_fault+0x2c/0xe0
Jan 18 00:29:20 vh422 kernel:  ? page_fault+0x34/0x40
Jan 18 00:29:20 vh422 kernel:  ? memset_erms+0x9/0x10
Jan 18 00:29:20 vh422 kernel:  ? dmu_read_impl+0x70/0x180 [zfs]
Jan 18 00:29:20 vh422 kernel:  dmu_read+0x58/0x90 [zfs]
Jan 18 00:29:20 vh422 kernel:  zfs_fillpage+0x95/0x140 [zfs]
Jan 18 00:29:20 vh422 kernel:  zfs_getpage+0x5c/0x120 [zfs]
Jan 18 00:29:20 vh422 kernel: nginx[64998]: segfault at 0 ip 0000000000000000 sp 00007ffcca139cf0 error 14 in nginx[55d3694c5000+f94000]
Jan 18 00:29:20 vh422 kernel:  zpl_readpage+0x3a/0x60 [zfs]
Jan 18 00:29:20 vh422 kernel: Code: Bad RIP value.
Jan 18 00:29:20 vh422 kernel:  filemap_fault+0xb4e/0xb80
Jan 18 00:29:20 vh422 kernel:  ? page_add_file_rmap+0x13a/0x180
Jan 18 00:29:20 vh422 kernel:  ? alloc_set_pte+0x525/0x5f0
Jan 18 00:29:20 vh422 kernel:  ? filemap_map_pages+0x181/0x3b0
Jan 18 00:29:20 vh422 kernel:  __do_fault+0x57/0x158
Jan 18 00:29:20 vh422 kernel:  __handle_mm_fault+0xdae/0x1240
Jan 18 00:29:20 vh422 kernel:  handle_mm_fault+0xcb/0x210
Jan 18 00:29:20 vh422 kernel:  __do_page_fault+0x2a1/0x4d0
Jan 18 00:29:20 vh422 kernel:  do_page_fault+0x2c/0xe0
Jan 18 00:29:20 vh422 kernel:  page_fault+0x34/0x40
Jan 18 00:29:20 vh422 kernel: RIP: 0033:0x7f3da8a734c1
Jan 18 00:29:20 vh422 kernel: Code: c8 c5 f8 77 c3 0f 1f 40 00 48 f7 d9 48 83 c1 20 48 29 ca 0f 86 c0 01 00 00 48 83 c7 20 48 81 ea 80 00 00 00 0f 86 aa 00 00 00 fd 74 0f c5 fd d7 c1 85 c0 0f 85 af 01 00 00 c5 fd 74 4f 20 c5
Jan 18 00:29:20 vh422 kernel: RSP: 002b:00007ffc6e39a148 EFLAGS: 00010206
Jan 18 00:29:20 vh422 kernel: RAX: 0000000000000000 RBX: 00007f3da70ff120 RCX: 000000000000000c
Jan 18 00:29:20 vh422 kernel: RDX: 000000000000b63a RSI: 000000000000003c RDI: 00007f3c801d1000
Jan 18 00:29:20 vh422 kernel: RBP: 00007f3c801dc6ba R08: 00007f3c801d0ff4 R09: 00007f3da6dfbc4c
Jan 18 00:29:20 vh422 kernel: R10: 0000000000000000 R11: 07ffffff43ffffff R12: 00007ffc6e39a258
Jan 18 00:29:20 vh422 kernel: R13: 00007f3da70ff800 R14: 00007f3c801dc6ba R15: 00007f3c801d0f2d
Jan 18 00:29:20 vh422 kernel: Modules linked in: cls_fw sch_sfq sch_htb tcp_diag udp_diag inet_diag nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6table_raw ip6_tables xt_nat iptable_nat nf_nat xt_u32 xt_limit nf_log_ipv4 nf_log_common xt_LOG ts_kmp ipt_REJECT nf_reject_ipv4 xt_comment xt_helper xt_conntrack xt_hashlimit xt_string xt_owner xt_mark xt_multiport iptable_mangle nf_conntrack_ftp xt_tcpudp xt_CT nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_set iptable_raw iptable_filter bpfilter ipmi_si rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache zfs(POE) intel_rapl_msr intel_rapl_common spl(OE) isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp joydev input_leds kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate ioatdma mei_me lpc_ich mei dca ipmi_ssif ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_set_hash_ip ip_set_hash_net ip_set nfnetlink
Jan 18 00:29:20 vh422 kernel:  8021q garp mrp stp llc sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 ast drm_vram_helper i2c_algo_bit ttm drm_kms_helper syscopyarea hid_generic sysfillrect usbhid i40e sysimgblt nvme hid fb_sys_fops drm nvme_core ahci libahci
Jan 18 00:29:20 vh422 kernel: CR2: ffff92bc826df000
Jan 18 00:29:20 vh422 kernel: ---[ end trace 32cfb72eac38431b ]---
```

Full log in the attachment kernel.log

rincebrain commented 7 months ago

This reminds me of #15745; could you possibly run a memtest for a bit?

arahnale commented 7 months ago

I do not think the problem is hardware: it started reproducing on all servers after the upgrade from zfs 2.1.13, on both kernel 5.4 and 5.15, and the servers use ECC RAM. There was no problem on 2.1.13.

Module options used at load time:

options zfs zfs_autoimport_disable=0
options zfs zfs_nocacheflush=1
options zfs zfs_prefetch_disable=1
options zfs zfs_dmu_offset_next_sync=0
options zfs zfs_arc_max=32212254720
options zfs zfs_arc_meta_limit_percent=100
options zfs zfs_arc_dnode_limit_percent=75
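
A quick way to confirm which of these values the running module actually picked up is to read them back from sysfs. A minimal sketch (the helper name is mine; `/sys/module/zfs/parameters` is the standard location for module parameters):

```shell
# Print a few zfs module parameters as the kernel currently sees them.
# Pass an alternate directory to exercise the function without ZFS loaded.
show_zfs_params() {
  base=${1:-/sys/module/zfs/parameters}
  for p in zfs_arc_max zfs_prefetch_disable zfs_dmu_offset_next_sync; do
    if [ -r "$base/$p" ]; then
      printf '%s=%s\n' "$p" "$(cat "$base/$p")"
    fi
  done
}

show_zfs_params
```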
rincebrain commented 7 months ago

It might not be, but could you run it anyway while I try to reproduce this locally?

arahnale commented 7 months ago

For now, I will try to reduce the load on the server. We noticed that the problem reproduced after the ARC max size was increased:

echo 107374182400 > /sys/module/zfs/parameters/zfs_arc_max

I had an automation script running that increased the size of zfs_arc_max depending on the needs of the file system.

I will try to reproduce this problem.
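
The automation script itself wasn't shared; purely as an illustration, a tuner like it might compute a new cap from total RAM before writing it to sysfs. The function name and the 30% figure below are assumptions, not the reporter's actual script:

```shell
# Compute a zfs_arc_max value as a percentage of total RAM (KiB in, bytes out).
compute_arc_max() {
  mem_total_kib=$1   # e.g. from: awk '/MemTotal/{print $2}' /proc/meminfo
  pct=$2             # percent of RAM to allow the ARC
  echo $(( mem_total_kib * 1024 * pct / 100 ))
}

# Applying it (as root) would then look like:
#   compute_arc_max "$(awk '/MemTotal/{print $2}' /proc/meminfo)" 30 \
#     > /sys/module/zfs/parameters/zfs_arc_max
compute_arc_max 131072000 30   # 125 GiB of RAM at 30% -> prints 40265318400
```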

arahnale commented 7 months ago

I removed zfs_arc_max from the settings, so the ARC defaulted to 50% of the server's total memory. When the server ran out of memory, the kernel panicked again.

kernel.log

I am still working on a reproduction test.

rincebrain commented 7 months ago

Can you tell us anything about the settings on the dataset(s) involved, or the pool configuration?

Because that crash seems to just be in trying to memset a wild pointer, which is not the most informative for why it might have one of those in the first place.

I don't think Ubuntu has prebuilt KASAN kernels, but that is probably what I'd like to test against if you somehow have one lying around...

arahnale commented 7 months ago

I use a mirrored zpool:

zpool status
  pool: zpool
 state: ONLINE
config:

        NAME                                             STATE     READ WRITE CKSUM
        zpool                                            ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            nvme-INTEL_SSDPE2KX080T8_BTLJ122300VB8P0HGN  ONLINE       0     0     0
            nvme-INTEL_SSDPE2KX080T8_BTLJ1223012X8P0HGN  ONLINE       0     0     0

The zpool was created as a mirror with the command:

zpool create -o ashift=12 -o autotrim=off zpool mirror $NVME1 $NVME2 -f

Three datasets were created:

zfs set logbias=throughput zpool
zfs set sync=standard zpool
zfs set atime=off zpool
zfs set primarycache=all zpool
zfs set secondarycache=all zpool
zfs create zpool/home
zfs create zpool/mysql
zfs create zpool/reserved
zfs set recordsize=32K zpool/home
zfs set recordsize=16K zpool/mysql
zfs set reservation=100G zpool/reserved
zfs set sync=standard zpool/home
zfs set sync=standard zpool/mysql
zfs set sync=standard zpool
zfs set primarycache=metadata zpool/mysql
zfs set secondarycache=none zpool/mysql
List of all parameters (`zfs get all zpool`):

```
NAME   PROPERTY               VALUE                  SOURCE
zpool  type                   filesystem             -
zpool  creation               Mon Jul 3 13:54 2023   -
zpool  used                   6.61T                  -
zpool  available              542G                   -
zpool  referenced             104K                   -
zpool  compressratio          1.01x                  -
zpool  mounted                no                     -
zpool  quota                  none                   default
zpool  reservation            none                   default
zpool  recordsize             128K                   default
zpool  mountpoint             none                   local
zpool  sharenfs               off                    default
zpool  checksum               on                     default
zpool  compression            on                     default
zpool  atime                  off                    local
zpool  devices                on                     default
zpool  exec                   on                     default
zpool  setuid                 on                     default
zpool  readonly               off                    default
zpool  zoned                  off                    default
zpool  snapdir                hidden                 default
zpool  aclmode                discard                default
zpool  aclinherit             restricted             default
zpool  createtxg              1                      -
zpool  canmount               on                     default
zpool  xattr                  on                     default
zpool  copies                 1                      default
zpool  version                5                      -
zpool  utf8only               off                    -
zpool  normalization          none                   -
zpool  casesensitivity        sensitive              -
zpool  vscan                  off                    default
zpool  nbmand                 off                    default
zpool  sharesmb               off                    default
zpool  refquota               none                   default
zpool  refreservation         none                   default
zpool  guid                   11658939438252704343   -
zpool  primarycache           all                    local
zpool  secondarycache         all                    local
zpool  usedbysnapshots        0B                     -
zpool  usedbydataset          104K                   -
zpool  usedbychildren         6.61T                  -
zpool  usedbyrefreservation   0B                     -
zpool  logbias                throughput             local
zpool  objsetid               54                     -
zpool  dedup                  off                    default
zpool  mlslabel               none                   default
zpool  sync                   standard               local
zpool  dnodesize              legacy                 default
zpool  refcompressratio       1.00x                  -
zpool  written                104K                   -
zpool  logicalused            6.27T                  -
zpool  logicalreferenced      46K                    -
zpool  volmode                default                default
zpool  filesystem_limit       none                   default
zpool  snapshot_limit         none                   default
zpool  filesystem_count       none                   default
zpool  snapshot_count         none                   default
zpool  snapdev                hidden                 default
zpool  acltype                off                    default
zpool  context                none                   default
zpool  fscontext              none                   default
zpool  defcontext             none                   default
zpool  rootcontext            none                   default
zpool  relatime               on                     default
zpool  redundant_metadata     all                    default
zpool  overlay                on                     default
zpool  encryption             off                    default
zpool  keylocation            none                   default
zpool  keyformat              none                   default
zpool  pbkdf2iters            0                      default
zpool  special_small_blocks   0                      default
```

I haven't been able to reproduce the kernel crash synthetically yet.

arahnale commented 7 months ago

My kernel has CONFIG_KASAN_STACK enabled:

grep CONFIG_KASAN /boot/config-5.4.0-162-generic 
# CONFIG_KASAN is not set
CONFIG_KASAN_STACK=1

Is this suitable for debugging?

rincebrain commented 7 months ago

I think those are heap crashes, so that won't help here.

arahnale commented 7 months ago

I was able to reproduce the kernel crash with zfs 2.2.2. I wrote a program that leaks memory:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char *str;
  unsigned long long int len;
} str_t;

/* Grow str1 by repeatedly appending str2 until at least n bytes are
 * allocated. Nothing is ever freed, so memory use only grows. */
void _mymalloc(str_t *str1, str_t *str2, unsigned long long int n) {
  str1->str = NULL;
  str1->len = 0;

  printf("Start _mymalloc %llu\n", n);

  while (str1->len < n) {
    int first = (str1->str == NULL);
    str1->len += str2->len + 1;
    str1->str = realloc(str1->str, str1->len + str2->len);
    if (str1->str == NULL) {
      perror("realloc");
      exit(EXIT_FAILURE);
    }
    if (first)
      str1->str[0] = '\0'; /* make the fresh buffer a valid empty string */
    strcat(str1->str, str2->str);
  }
}

int main(void) {
  printf("Start\n");

  str_t temp;
  str_t temp1, temp2, temp3, temp4;
  temp.str = "abracadabra";
  temp.len = strlen(temp.str);

  _mymalloc(&temp1, &temp, 1024);
  _mymalloc(&temp2, &temp1, 1024ULL * 1024 * 10);
  _mymalloc(&temp3, &temp2, 1024ULL * 1024 * 1024);
  _mymalloc(&temp4, &temp3, 1024ULL * 1024 * 4024);

  return 0;
}

and started this load on the dataset:

while true; do dd if=/dev/urandom count=10M bs=1 | bzip2 -1 > /home/random.bin ; rm -f /home/random.bin ; done
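
For a bounded dry run of that load, the loop can be wrapped in a function with a fixed iteration count. This is only a sketch: the wrapper, the `bs=1M count=10` batching, and dropping the bzip2 stage are my simplifications (the original command writes the same amount one byte at a time through bzip2):

```shell
# Write-and-delete churn on a target file for a fixed number of iterations.
stress_writes() {
  target=$1
  iters=$2
  i=0
  while [ "$i" -lt "$iters" ]; do
    dd if=/dev/urandom of="$target" bs=1M count=10 2>/dev/null
    rm -f "$target"
    i=$((i + 1))
  done
  echo "completed $iters iterations"
}

stress_writes /tmp/random.bin 2
```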

I am ready to provide a server for testing if that would help.

Dimmiuss commented 2 months ago

This problem still appears on the latest versions of ZFS (2.1.15 and 2.2.4). The only thing that stopped the kernel panics for us was rolling back to 2.1.4. Please look into this problem; the latest versions are NOT STABLE for us. The changes that broke the ZFS code base were made somewhere between releases 2.1.4 and 2.1.15, and the problems carried over into the 2.2 code base. We kindly ask you to review the commits from 2.1.4 to 2.1.15, because this bug will only make things worse in the future.