pkoutoupis / rapiddisk

An Advanced Linux RAM Drive and Caching kernel modules. Dynamically allocate RAM as block devices. Use them as stand alone drives or even map them as caching nodes to slower local disk drives. Access those volumes locally or export them across an NVMe Target network. Manage it all from a web API.
http://www.rapiddisk.org
GNU General Public License v2.0

Crash on ubuntu 22.04 #162

Closed nosammai closed 1 year ago

nosammai commented 1 year ago

Hi, I just compiled the latest master branch, and when I run fio on a RAM disk with a larger file size I get a crash (kernel panic). I can reproduce this pretty consistently.

This is running on a GCP n2-standard-32 machine with a Google Ubuntu 22.04 image. The module was compiled on the same machine that runs the test.

Repro steps:

modprobe rapiddisk
modprobe rapiddisk-cache
rapiddisk -a 32768
mkfs.xfs /dev/rd0
mount /dev/rd0 /data
cd /data
fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=test --rw=write --name=fio-rapiddisk-readnwrite-test --numjobs=8 --group_reporting
[  334.657681] ------------[ cut here ]------------
[  334.657682] kernel BUG at /root/rapiddisk/module/rapiddisk.c:247!
[  334.657695] invalid opcode: 0000 [#1] SMP NOPTI
[  334.657697] CPU: 15 PID: 2139 Comm: fio Tainted: G           OE     5.15.0-1031-gcp #38-Ubuntu
[  334.657700] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
[  334.657701] RIP: 0010:rdsk_submit_bio+0x44c/0x8aa [rapiddisk]
[  334.657706] Code: f8 48 29 fb 49 8d 0c 1c 48 c1 e9 03 f3 48 ab e9 cd fd ff ff 48 3b 70 20 0f 84 3a fd ff ff 0f 0b 4c 39 78 20 0f 84 25 fd ff ff <0f> 0b 48 8b 45 90 4c 3b 78 20 0f 85 fc 03 00 00 48 c7 c7 00 42 60
[  334.657708] RSP: 0018:ffffb892481ef7d8 EFLAGS: 00010293
[  334.657710] RAX: fffff5d2849e5180 RBX: fffff5d2c20626c0 RCX: 0000000000000000
[  334.657711] RDX: ffff9cfa23480d18 RSI: 00000000001f4571 RDI: ffff9ceac8db4fa8
[  334.657713] RBP: ffffb892481ef868 R08: 0000000000000001 R09: ffff9ceac8db4fb0
[  334.657714] R10: 000000108189e000 R11: ffff9cfa09e3adc0 R12: 0000000000001000
[  334.657715] R13: 0000000000001000 R14: ffff9ceac37d0000 R15: 00000000001f4571
[  334.657716] FS:  00007f38498f6a00(0000) GS:ffff9d09bfdc0000(0000) knlGS:0000000000000000
[  334.657718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  334.657719] CR2: 00007f3841282000 CR3: 0000000126fd2004 CR4: 00000000003706e0
[  334.657722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  334.657723] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  334.657724] Call Trace:
[  334.657725]  <TASK>
[  334.657726]  ? ktime_get+0x43/0xc0
[  334.657737]  __submit_bio+0x1a2/0x220
[  334.657743]  __submit_bio_noacct+0x85/0x200
[  334.657746]  submit_bio_noacct+0x4e/0x120
[  334.657748]  submit_bio+0x4a/0x130
[  334.657750]  iomap_dio_submit_bio+0x80/0x90
[  334.657755]  iomap_dio_bio_iter+0x2d8/0x4a0
[  334.657759]  __iomap_dio_rw+0x377/0x690
[  334.657762]  iomap_dio_rw+0xe/0x40
[  334.657764]  xfs_file_dio_write_aligned+0x97/0x130 [xfs]
[  334.657871]  xfs_file_write_iter+0x10c/0x1b0 [xfs]
[  334.657922]  ? security_file_permission+0x2c/0x60
[  334.657926]  aio_write+0x113/0x220
[  334.657931]  ? _copy_to_user+0x20/0x30
[  334.657933]  ? aio_read_events_ring+0x1ec/0x270
[  334.657935]  ? __fget_files+0x86/0xc0
[  334.657938]  __io_submit_one.constprop.0+0x17e/0x1f0
[  334.657940]  ? __io_submit_one.constprop.0+0x17e/0x1f0
[  334.657942]  io_submit_one+0xe3/0x3b0
[  334.657944]  __x64_sys_io_submit+0x8e/0x190
[  334.657947]  ? exit_to_user_mode_prepare+0x37/0xb0
[  334.657950]  do_syscall_64+0x59/0xc0
[  334.657954]  ? exit_to_user_mode_prepare+0x37/0xb0
[  334.657956]  ? syscall_exit_to_user_mode+0x27/0x50
[  334.657958]  ? do_syscall_64+0x69/0xc0
[  334.657960]  ? syscall_exit_to_user_mode+0x27/0x50
[  334.657962]  ? exit_to_user_mode_prepare+0x37/0xb0
[  334.657964]  ? syscall_exit_to_user_mode+0x27/0x50
[  334.657966]  ? do_syscall_64+0x69/0xc0
[  334.657967]  ? do_syscall_64+0x69/0xc0
[  334.657968]  ? do_syscall_64+0x69/0xc0
[  334.657970]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  334.657972] RIP: 0033:0x7f384b35fa3d
[  334.657974] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[  334.657975] RSP: 002b:00007ffe9998bb18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 10 23:54:25 host01 kernel: ------------[ cut here ]------------
Apr 10 23:54:25 host01 kernel: kernel BUG at /root/rapiddisk/module/rapiddisk.c:247!
Apr 10 23:54:25 host01 kernel: ------------[ cut here ]------------
Apr 10 23:54:25 host01 kernel: ------------[ cut here ]------------
Apr 10 23:54:25 host01 kernel: kernel BUG at /root/rapiddisk/module/rapiddisk.c:247!
Apr 10 23:54:25 host01 kernel: invalid opcode: 0000 [#1] SMP NOPTI
Apr 10 23:54:25 host01 kernel: CPU: 15 PID: 2139 Comm: fio Tainted: G           OE     5.15.0-1031-gcp #38-Ubuntu
Apr 10 23:54:25 host01 kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
[  334.657978] RDX: 000056253069bdf8 RSI: 0000000000000001 RDI: 00007f38498cd000
[  334.657979] RBP: 00007f38498cd000 R08: 00005625306cd000 R09: 0000000000000048
[  334.657979] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[  334.657980] R13: 0000000000000000 R14: 000056253069bdf8 R15: 000056253064a400
[  334.657982]  </TASK>
[  334.657983] Modules linked in: xfs rapiddisk_cache(OE) rapiddisk(OE) nvme_fabrics nft_limit nft_counter ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_multiport xt_tcpmss xt_tcpudp xt_state xt_conntrack xt_comment nft_compat nf_tables nls_iso8859_1 virtio_net net_failover failover psmouse input_leds serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4
[  334.658014] ---[ end trace 9d7adf389d3f0886 ]---
[  334.659044] invalid opcode: 0000 [#2] SMP NOPTI
[  334.663762] kernel BUG at /root/rapiddisk/module/rapiddisk.c:247!
[  334.668472] CPU: 12 PID: 2138 Comm: fio Tainted: G      D    OE     5.15.0-1031-gcp #38-Ubuntu
[  334.668474] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
[  334.668475] RIP: 0010:rdsk_submit_bio+0x44c/0x8aa [rapiddisk]
[  334.668479] Code: f8 48 29 fb 49 8d 0c 1c 48 c1 e9 03 f3 48 ab e9 cd fd ff ff 48 3b 70 20 0f 84 3a fd ff ff 0f 0b 4c 39 78 20 0f 84 25 fd ff ff <0f> 0b 48 8b 45 90 4c 3b 78 20 0f 85 fc 03 00 00 48 c7 c7 00 42 60
[  334.668480] RSP: 0018:ffffb892481df838 EFLAGS: 00010293
[  334.751634] RIP: 0010:rdsk_submit_bio+0x44c/0x8aa [rapiddisk]
[  334.756263] RAX: fffff5d2849e3f00 RBX: fffff5d2c204e8c0 RCX: 0000000000000000
[  334.756265] RDX: ffff9cead1876320 RSI: 00000000001f4058 RDI: ffff9ceac8db4fa8
[  334.756266] RBP: ffffb892481df8c8 R08: 0000000000000001 R09: ffff9ceac8db4fb0
[  334.756267] R10: 0000000000000293 R11: 0000000000000a20 R12: 0000000000001000
[  334.756268] R13: 0000000000001000 R14: ffff9cead0108000 R15: 00000000001f4058
[  334.756270] FS:  00007f38498f6a00(0000) GS:ffff9d09bfd00000(0000) knlGS:0000000000000000
[  334.756271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  334.763518] Code: f8 48 29 fb 49 8d 0c 1c 48 c1 e9 03 f3 48 ab e9 cd fd ff ff 48 3b 70 20 0f 84 3a fd ff ff 0f 0b 4c 39 78 20 0f 84 25 fd ff ff <0f> 0b 48 8b 45 90 4c 3b 78 20 0f 85 fc 03 00 00 48 c7 c7 00 42 60
[  334.771694] CR2: 00007f3841269a78 CR3: 00000001102e8003 CR4: 00000000003706e0
[  334.771698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  334.771699] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  334.771700] Call Trace:
[  334.771701]  <TASK>
[  334.771702]  ? ktime_get+0x43/0xc0
[  334.771708]  __submit_bio+0x1a2/0x220
[  334.777560] RSP: 0018:ffffb892481ef7d8 EFLAGS: 00010293
[  334.784784]  __submit_bio_noacct+0x85/0x200
[  334.784787]  submit_bio_noacct+0x4e/0x120
[  334.784790]  submit_bio+0x4a/0x130
[  334.784792]  iomap_dio_submit_bio+0x80/0x90
[  334.784795]  iomap_dio_bio_iter+0x2d8/0x4a0
[  334.784799]  __iomap_dio_rw+0x377/0x690
[  334.792040] 
[  334.799262]  iomap_dio_rw+0xe/0x40
[  334.799265]  xfs_file_dio_write_aligned+0x97/0x130 [xfs]
[  334.801828] RAX: fffff5d2849e5180 RBX: fffff5d2c20626c0 RCX: 0000000000000000
[  334.804021]  xfs_file_write_iter+0x10c/0x1b0 [xfs]
[  334.807529] RDX: ffff9cfa23480d18 RSI: 00000000001f4571 RDI: ffff9ceac8db4fa8
[  334.811277]  ? security_file_permission+0x2c/0x60
[  334.811284]  aio_write+0x113/0x220
[  334.811288]  ? __check_object_size.part.0+0x4a/0x150
[  334.815581] RBP: ffffb892481ef868 R08: 0000000000000001 R09: ffff9ceac8db4fb0
[  334.819678]  ? read_events+0x96/0x1c0
[  334.819681]  ? __fget_files+0x86/0xc0
[  334.819683]  __io_submit_one.constprop.0+0x17e/0x1f0
[  334.819685]  ? __io_submit_one.constprop.0+0x17e/0x1f0
[  334.819687]  io_submit_one+0xe3/0x3b0
[  334.819690]  __x64_sys_io_submit+0x8e/0x190
[  334.823201] R10: 000000108189e000 R11: ffff9cfa09e3adc0 R12: 0000000000001000
[  334.827470]  ? syscall_exit_to_user_mode+0x27/0x50
[  334.827475]  ? do_syscall_64+0x69/0xc0
[  334.827476]  ? do_syscall_64+0x69/0xc0
[  334.827478]  ? exit_to_user_mode_prepare+0x37/0xb0
[  334.831771] R13: 0000000000001000 R14: ffff9ceac37d0000 R15: 00000000001f4571
[  334.835694]  do_syscall_64+0x59/0xc0
[  334.835696]  ? exit_to_user_mode_prepare+0x37/0xb0
[  334.835699]  ? syscall_exit_to_user_mode+0x27/0x50
[  334.835701]  ? do_syscall_64+0x69/0xc0
[  334.835702]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  334.835704] RIP: 0033:0x7f384b35fa3d
[  334.835708] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[  334.839216] FS:  00007f38498f6a00(0000) GS:ffff9d09bfdc0000(0000) knlGS:0000000000000000
[  334.844615] RSP: 002b:00007ffe9998bb18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[  334.844617] RAX: ffffffffffffffda RBX: 00007f38498f4b68 RCX: 00007f384b35fa3d
[  334.844618] RDX: 000056253069bdc0 RSI: 0000000000000001 RDI: 00007f38498ce000
[  334.844619] RBP: 00007f38498ce000 R08: 00005625306e9000 R09: 0000000000000010
[  334.844620] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[  334.844621] R13: 0000000000000000 R14: 000056253069bdc0 R15: 000056253064a400
[  334.844623]  </TASK>
[  334.844624] Modules linked in:
[  334.849522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  334.854315]  xfs rapiddisk_cache(OE) rapiddisk(OE) nvme_fabrics nft_limit nft_counter ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_multiport xt_tcpmss xt_tcpudp xt_state xt_conntrack xt_comment nft_compat nf_tables nls_iso8859_1 virtio_net net_failover
[  334.857837] CR2: 00007f3841282000 CR3: 0000000126fd2004 CR4: 00000000003706e0
[  334.861676]  failover psmouse input_leds serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4
[  334.861710] ---[ end trace 9d7adf389d3f0887 ]---
[  334.861711] invalid opcode: 0000 [#3] SMP NOPTI
[  334.861714] CPU: 14 PID: 2140 Comm: fio Tainted: G      D    OE     5.15.0-1031-gcp #38-Ubuntu
[  334.861716] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
[  334.861717] RIP: 0010:rdsk_submit_bio+0x44c/0x8aa [rapiddisk]
[  334.861722] Code: f8 48 29 fb 49 8d 0c 1c 48 c1 e9 03 f3 48 ab e9 cd fd ff ff 48 3b 70 20 0f 84 3a fd ff ff 0f 0b 4c 39 78 20 0f 84 25 fd ff ff <0f> 0b 48 8b 45 90 4c 3b 78 20 0f 85 fc 03 00 00 48 c7 c7 00 42 60
[  334.861723] RSP: 0018:ffffb892481f7828 EFLAGS: 00010297
[  334.861725] RAX: fffff5d2c11356c0 RBX: fffff5d2c2161cc0 RCX: 0000000000000000
[  334.861726] RDX: ffff9cfa04324fa8 RSI: 00000000001f453a RDI: ffff9ceac8db4fa8
[  334.861727] RBP: ffffb892481f78b8 R08: 0000000000000001 R09: ffff9ceac8db4fb0
[  334.861729] R10: 0000000000000293 R11: 0000000000000a20 R12: 0000000000001000
[  334.861730] R13: 0000000000001000 R14: ffff9ceac3ea9f40 R15: 00000000001f453a
[  334.861731] FS:  00007f38498f6a00(0000) GS:ffff9d09bfd80000(0000) knlGS:0000000000000000
[  334.861732] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  334.861734] CR2: 00007f38412f3040 CR3: 0000000126fe6001 CR4: 00000000003706e0
[  334.861738] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  334.861739] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  334.861740] Call Trace:
[  334.861741]  <TASK>
[  334.861741]  ? ktime_get+0x43/0xc0
[  334.861747]  __submit_bio+0x1a2/0x220
[  334.861751]  __submit_bio_noacct+0x85/0x200
[  334.861754]  submit_bio_noacct+0x4e/0x120
[  334.861756]  submit_bio+0x4a/0x130
[  334.861758]  iomap_dio_submit_bio+0x80/0x90
[  334.861761]  iomap_dio_bio_iter+0x2d8/0x4a0
[  334.861764]  __iomap_dio_rw+0x377/0x690
[  334.861768]  iomap_dio_rw+0xe/0x40
[  334.861770]  xfs_file_dio_write_aligned+0x97/0x130 [xfs]
[  334.861840]  xfs_file_write_iter+0x10c/0x1b0 [xfs]
[  334.861892]  ? security_file_permission+0x2c/0x60
[  334.861895]  aio_write+0x113/0x220
[  334.861898]  ? read_events+0x96/0x1c0
[  334.861899]  ? __fget_files+0x86/0xc0
[  334.861901]  __io_submit_one.constprop.0+0x17e/0x1f0
[  334.861903]  ? __io_submit_one.constprop.0+0x17e/0x1f0
[  334.861905]  io_submit_one+0xe3/0x3b0
[  334.861908]  __x64_sys_io_submit+0x8e/0x190
[  334.861910]  ? __x64_sys_io_submit+0xd8/0x190
[  334.861912]  do_syscall_64+0x59/0xc0
[  334.861914]  ? __x64_sys_io_getevents+0x5f/0xd0
[  334.861916]  ? do_syscall_64+0x69/0xc0
[  334.861917]  ? exit_to_user_mode_prepare+0x37/0xb0
[  334.861920]  ? syscall_exit_to_user_mode+0x27/0x50
[  334.861922]  ? do_syscall_64+0x69/0xc0
[  334.861923]  ? do_syscall_64+0x69/0xc0
[  334.861925]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  334.861927] RIP: 0033:0x7f384b35fa3d
[  334.861929] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[  334.861930] RSP: 002b:00007ffe9998bb18 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[  334.861932] RAX: ffffffffffffffda RBX: 00007f38498f4b68 RCX: 00007f384b35fa3d
[  334.861933] RDX: 000056253069be08 RSI: 0000000000000001 RDI: 00007f38498cc000
[  334.861934] RBP: 00007f38498cc000 R08: 00005625306c5000 R09: 0000000000000058
[  334.861934] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[  334.861935] R13: 0000000000000000 R14: 000056253069be08 R15: 000056253064a400
[  334.861937]  </TASK>
[  334.861937] Modules linked in: xfs rapiddisk_cache(OE) rapiddisk(OE) nvme_fabrics nft_limit nft_counter ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_multiport xt_tcpmss xt_tcpudp xt_state xt_conntrack xt_comment nft_compat nf_tables nls_iso8859_1 virtio_net net_failover failover psmouse input_leds serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4
[  334.861978] ---[ end trace 9d7adf389d3f0888 ]---
[  334.866324] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  334.866326] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  334.958762] RIP: 0010:rdsk_submit_bio+0x44c/0x8aa [rapiddisk]
[  334.964706] Kernel panic - not syncing: Fatal exception
[  334.972394] Code: f8 48 29 fb 49 8d 0c 1c 48 c1 e9 03 f3 48 ab e9 cd fd ff ff 48 3b 70 20 0f 84 3a fd ff ff 0f 0b 4c 39 78 20 0f 84 25 fd ff ff <0f> 0b 48 8b 45 90 4c 3b 78 20 0f 85 fc 03 00 00 48 c7 c7 00 42 60
[  335.062206] Kernel Offset: 0xba00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  336.124990] Rebooting in 10 seconds..

pkoutoupis commented 1 year ago

This is an interesting one because I am unable to reproduce this kernel panic on a stock Ubuntu Server 22.04 running the most current 5.15.0-69 kernel. There must be something unique going on with the 5.15.0-1031-gcp kernel in the Google image. I will need to find access to a Google Cloud environment and rerun the test there.

nosammai commented 1 year ago

I swapped out xfs for ext4 and couldn't repro the crash, so it seems to have something to do with the interaction of the gcp kernel, xfs, and rapiddisk. Let me know if there's anything else that you'd like me to test out!
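
To make the XFS versus ext4 comparison repeatable, the repro steps from the original report can be wrapped in a small helper that takes the filesystem as a parameter. This is only a sketch: the function name `repro_cmds` is hypothetical, while the device `/dev/rd0`, the mount point `/data`, and the fio options are copied from the report. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the repro commands from the report for a
# given filesystem (xfs or ext4). Nothing is executed here.
repro_cmds() {
    fs=${1:-xfs}
    case $fs in
        xfs|ext4) ;;
        *) echo "unsupported filesystem: $fs" >&2; return 1 ;;
    esac
    cat <<EOF
modprobe rapiddisk
modprobe rapiddisk-cache
rapiddisk -a 32768
mkfs.$fs /dev/rd0
mount /dev/rd0 /data
cd /data
fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=test --rw=write --name=fio-rapiddisk-readnwrite-test --numjobs=8 --group_reporting
EOF
}
```

Calling `repro_cmds ext4` prints the same sequence with `mkfs.ext4` in place of `mkfs.xfs`. The printed commands can then be run by hand as root on a freshly created RAM disk.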

pkoutoupis commented 1 year ago

OK. Some initial thoughts but no fix yet.

  1. What I find most interesting is this line of the stack dump: "invalid opcode: 0000 [#3] SMP NOPTI". Taken together with "kernel BUG at /root/rapiddisk/module/rapiddisk.c:247!", it tells us that the process tripped a BUG()/BUG_ON() assertion in the module (on x86, BUG() executes a ud2 instruction, which the CPU reports as an invalid opcode). Why? I am not sure, because this only occurs with XFS, so I cannot yet tell whether the bug is in XFS, RapidDisk, both, or elsewhere (even though the instruction pointer, RIP, points into the rdsk_submit_bio function).
  2. The Google kernel is running without PTI (page table isolation, the Meltdown mitigation) and I wonder if this has some influence. If you recall from 5 or 6 years ago, the PTI code addresses security by limiting access from user space to sensitive kernel memory regions. I wonder what would happen if you modified your grub config file(s) to boot with the kernel argument "pti=on" instead of "nopti". If you simply remove "nopti", I wonder if it will default to PTI being enabled.
  3. I also read online that some people were able to address similar issues by disabling CPU C-states. I don't understand why that is, yet. I know that this is a virtual instance, but I wonder if the hypervisor is having some CPU-related issues. What happens if you limit C-states in the grub boot args (for example with intel_idle.max_cstate=0 or processor.max_cstate=1)?
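
Before editing the GRUB configuration, it may help to confirm what the running kernel was actually booted with. The sketch below only inspects a command-line string. The helper name `report_boot_flags` is made up for illustration, and the parameter names it looks for (`nopti`, `pti=`, `mitigations=`, `intel_idle.max_cstate=`, `processor.max_cstate=`) are standard kernel command-line spellings; whether any of them is set on the GCP image is not known from this thread. Note that "NOPTI" in the oops banner only reports that page table isolation is not active, which can be the case without a `nopti` argument, for instance when the kernel decides the CPU does not need it.

```shell
#!/bin/sh
# Hypothetical helper: report PTI- and C-state-related flags found in a
# kernel command line. Reads /proc/cmdline unless a string is passed in.
report_boot_flags() {
    cmdline=${1-$(cat /proc/cmdline)}
    found=0
    for arg in $cmdline; do
        case $arg in
            nopti|pti=*|mitigations=*|intel_idle.max_cstate=*|processor.max_cstate=*)
                echo "set: $arg"
                found=1
                ;;
        esac
    done
    if [ "$found" -eq 0 ]; then
        echo "no PTI or C-state flags on the command line"
    fi
    return 0
}
```

Running `report_boot_flags` with no argument checks the live system. The kernel's own view of the Meltdown mitigation is also available in `/sys/devices/system/cpu/vulnerabilities/meltdown`.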

pkoutoupis commented 1 year ago

What version of RapidDisk are you using? I would like to try something out with a patch. Does the behavior change when you manually apply the following changes?

@@ -270,10 +274,7 @@ static struct page *rdsk_insert_page(struct rdsk_device *rdsk, sector_t sector)
         * If XIP was reworked to use pfns and kmap throughout, this
         * restriction might be able to be lifted.
         */
-       gfp_flags = GFP_NOIO | __GFP_ZERO;
-#ifndef CONFIG_BLK_DEV_XIP
-       gfp_flags |= __GFP_HIGHMEM;
-#endif
+       gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
        page = alloc_page(gfp_flags);
        if (!page)
                return NULL;
@@ -285,13 +286,12 @@ static struct page *rdsk_insert_page(struct rdsk_device *rdsk, sector_t sector)

        spin_lock(&rdsk->rdsk_lock);
        idx = sector >> PAGE_SECTORS_SHIFT;
+       page->index = idx;
        if (radix_tree_insert(&rdsk->rdsk_pages, idx, page)) {
                __free_page(page);
                page = radix_tree_lookup(&rdsk->rdsk_pages, idx);
                BUG_ON(!page);
                BUG_ON(page->index != idx);
-       } else {
-               page->index = idx;
        }
        spin_unlock(&rdsk->rdsk_lock);

EDIT - If you are uncomfortable applying the changes, you can try to use the kernel module from this branch: https://github.com/pkoutoupis/rapiddisk/tree/feature/bcc-scripts

EDIT 2 - Just make sure you do a git pull in the repo before or after you check out the branch: git checkout feature/bcc-scripts.

EDIT 3 - The code in question is now merged into master and you can either clone the master branch or download the 9.1.0 tag: https://github.com/pkoutoupis/rapiddisk/releases/tag/9.1.0

nosammai commented 1 year ago

Tested with latest master branch and confirmed the crash no longer occurs. Thank you for the quick fix!

pkoutoupis commented 1 year ago

Nice! I will close this issue then. If you stumble on anything, just open up a new issue. Thanks!