Closed Flole998 closed 1 year ago
Thanks for the report. I don't know whether @SergeiShtepa has already received a report of a similar error, or whether he can find the cause without a decoded stacktrace. It would be better to include a decoded stacktrace if possible. From a quick search I found this guide, which seems well explained and also covers the Ubuntu debug kernel: https://gist.github.com/doughgle/735229c34c52f9006ca92a2cf24da990
Thanks. I will think. "better should include decoded stacktrace if possible" - yes.
The warning message is generated here. But I still don't understand what the problem is. Logs of the blksnap module (/var/log/veeam/) would be helpful.
Are you able to access files attached to a veeam support case if I give you the case ID?
If you have a case ID, I'll get it soon. :-)
I have a case ID for a different issue (snapshot overflow, that also appeared after the upgrade, was fine before) and it's for the free version, so it might even get closed when nobody has time to look at it.
The case ID is 05879385. I just checked and noticed that there were warnings before this one as well; maybe that explains what happened:
[ 820.954528] UBSAN: shift-out-of-bounds in /var/lib/dkms/blksnap/6.0.0.1060/build/cbt_map.h:109:11
[ 820.954567] shift exponent 32 is too large for 32-bit type 'int'
[ 820.954589] CPU: 23 PID: 37350 Comm: snapshot operat Tainted: P W OE K 5.15.0-60-generic #66-Ubuntu
[ 820.954591] Hardware name: Dell Inc. PowerEdge R720/0XH7F2, BIOS 2.9.0 12/06/2019
[ 820.954593] Call Trace:
[ 820.954596] <TASK>
[ 820.954599] show_stack+0x52/0x5c
[ 820.954607] dump_stack_lvl+0x4a/0x63
[ 820.954611] dump_stack+0x10/0x16
[ 820.954612] ubsan_epilogue+0x9/0x49
[ 820.954626] __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef
[ 820.954629] ? blkdev_get_by_dev.part.0+0xb5/0x320
[ 820.954636] tracker_collect.cold+0x18/0x21 [blksnap]
[ 820.954643] ? ioctl_setlog+0x180/0x180 [blksnap]
[ 820.954647] ioctl_tracker_collect+0x98/0x280 [blksnap]
[ 820.954650] ? ioctl_setlog+0x180/0x180 [blksnap]
[ 820.954654] ctrl_unlocked_ioctl+0x78/0xd0 [blksnap]
[ 820.954658] __x64_sys_ioctl+0x95/0xd0
[ 820.954664] do_syscall_64+0x5c/0xc0
[ 820.954670] ? do_syscall_64+0x69/0xc0
[ 820.954671] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 820.954675] RIP: 0033:0x7f495e6f2aff
[ 820.954678] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[ 820.954680] RSP: 002b:00007f485f7fd0e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 820.954683] RAX: ffffffffffffffda RBX: 000000000354cbb0 RCX: 00007f495e6f2aff
[ 820.954684] RDX: 00007f485f7fd260 RSI: 0000000040105602 RDI: 0000000000000037
[ 820.954686] RBP: 000000000354cb80 R08: 00007f488010cdc0 R09: 0000000000000000
[ 820.954687] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000354cbd8
[ 820.954688] R13: 00007f485f7fd180 R14: 000000000354cba8 R15: 00007f485f7fd17f
[ 820.954690] </TASK>
[ 820.954699] ================================================================================
[ 821.287745] blksnap-diff-storage: Cannot get empty storage block
[ 821.287800] blksnap-diff-area: Set snapshot device is corrupted for [253:1] with error code 28
[ 821.287857] blksnap-tracker: Failed to copy data to diff storage with error 14
[ 821.291588] blk_update_request: critical space allocation error, dev blksnap-image1, sector 13464903664 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 821.291649] ------------[ cut here ]------------
[ 821.291650] refcount_t: underflow; use-after-free.
[ 821.291660] WARNING: CPU: 26 PID: 37398 at lib/refcount.c:28 refcount_warn_saturate+0xf7/0x150
[ 821.291667] Modules linked in: blksnap(OE) bdevfilter(OEK) nvidia_uvm(POE) nvidia(POE) rpcsec_gss_krb5 mpt3sas raid_class scsi_transport_sas mptctl mptbase dell_rbu nft_chain_nat xt_REDIRECT xt_MASQUERADE xt_owner xt_nat nf_nat nft_counter xt_LOG nf_log_syslog xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc intel_rapl_msr rc_tt_1500 snd_hda_codec_hdmi ts2020 intel_rapl_common snd_hda_intel m88ds3103 sb_edac snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec i2c_mux x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_core kvm_intel dvb_usb_dw2102 dvb_usb dvb_core snd_hwdep snd_pcm joydev mc input_leds snd_timer kvm snd soundcore ipmi_ssif rapl dcdbas intel_cstate mei_me mei mac_hid acpi_power_meter ipmi_si sch_fq_codel ipmi_watchdog ipmi_devintf ipmi_msghandler 8021q garp mrp stp llc nfsd parport_pc ppdev auth_rpcgss nfs_acl lockd lp grace parport sunrpc ramoops efi_pstore reed_solomon pstore_blk pstore_zone
[ 821.291724] ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear bonding tls mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crct10dif_pclmul hid_generic crc32_pclmul ghash_clmulni_intel rc_core usbhid cdc_ether aesni_intel usbnet igb crypto_simd mii ahci hid dca cryptd drm lpc_ich libahci megaraid_sas i2c_algo_bit wmi
[ 821.291757] CPU: 26 PID: 37398 Comm: probe-bcache Tainted: P W OE K 5.15.0-60-generic #66-Ubuntu
[ 821.291759] Hardware name: Dell Inc. PowerEdge R720/0XH7F2, BIOS 2.9.0 12/06/2019
[ 821.291761] RIP: 0010:refcount_warn_saturate+0xf7/0x150
[ 821.291764] Code: eb 9e 0f b6 1d b9 40 ba 01 80 fb 01 0f 87 cd 88 6e 00 83 e3 01 75 89 48 c7 c7 b0 53 23 b7 c6 05 9d 40 ba 01 01 e8 b7 0c 6b 00 <0f> 0b e9 6f ff ff ff 0f b6 1d 88 40 ba 01 80 fb 01 0f 87 8a 88 6e
[ 821.291766] RSP: 0018:ffffbc372138f700 EFLAGS: 00010286
[ 821.291768] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 821.291769] RDX: ffff98086fb60588 RSI: 0000000000000001 RDI: ffff98086fb60580
[ 821.291771] RBP: ffffbc372138f708 R08: 0000000000000003 R09: fffffffffff9adb0
[ 821.291772] R10: 0000000000000028 R11: 0000000000000001 R12: ffff980237f70180
[ 821.291773] R13: ffffdc2afe956c00 R14: 0000000000000000 R15: ffffbc372138f810
[ 821.291775] FS: 00007fad22c4d780(0000) GS:ffff98086fb40000(0000) knlGS:0000000000000000
[ 821.291776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 821.291778] CR2: 00007ffe9b397738 CR3: 00000005e2a52003 CR4: 00000000000606e0
[ 821.291779] Call Trace:
[ 821.291781] <TASK>
[ 821.291784] blk_mq_free_request+0x14f/0x160
[ 821.291789] blk_mq_end_request+0x12c/0x140
[ 821.291791] blk_mq_dispatch_rq_list+0x4c7/0x680
[ 821.291794] ? sbitmap_queue_resize+0x51/0x60
[ 821.291799] __blk_mq_do_dispatch_sched+0xba/0x2e0
[ 821.291802] blk_mq_do_dispatch_sched+0x40/0x70
[ 821.291804] __blk_mq_sched_dispatch_requests+0x105/0x150
[ 821.291806] blk_mq_sched_dispatch_requests+0x35/0x70
[ 821.291808] __blk_mq_run_hw_queue+0x34/0xc0
[ 821.291811] __blk_mq_delay_run_hw_queue+0x16a/0x170
[ 821.291813] blk_mq_run_hw_queue+0x87/0x130
[ 821.291816] blk_mq_sched_insert_requests+0x69/0xf0
[ 821.291818] blk_mq_flush_plug_list+0x103/0x1c0
[ 821.291821] blk_flush_plug_list+0xdd/0x110
[ 821.291825] blk_finish_plug+0x2d/0x50
[ 821.291828] read_pages+0x11d/0x280
[ 821.291831] ? add_to_page_cache_lru+0x78/0xd0
[ 821.291834] page_cache_ra_unbounded+0x15d/0x210
[ 821.291836] force_page_cache_ra+0xe6/0x150
[ 821.291838] page_cache_sync_ra+0x40/0xe0
[ 821.291839] filemap_get_pages+0xda/0x3f0
[ 821.291842] filemap_read+0xbc/0x3e0
[ 821.291845] ? blkdev_read_iter+0x4a/0x60
[ 821.291846] ? rseq_get_rseq_cs.isra.0+0x1b/0x230
[ 821.291848] ? new_sync_read+0x10d/0x190
[ 821.291851] ? rseq_ip_fixup+0x72/0x170
[ 821.291852] generic_file_read_iter+0xe5/0x150
[ 821.291855] blkdev_read_iter+0x4a/0x60
[ 821.291856] new_sync_read+0x10d/0x190
[ 821.291858] vfs_read+0x103/0x1a0
[ 821.291860] ksys_read+0x67/0xf0
[ 821.291862] __x64_sys_read+0x19/0x20
[ 821.291864] do_syscall_64+0x5c/0xc0
[ 821.291868] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 821.291871] RIP: 0033:0x7fad22d64992
[ 821.291875] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 821.291876] RSP: 002b:00007ffe9b39a9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 821.291878] RAX: ffffffffffffffda RBX: 0000557a9f361448 RCX: 00007fad22d64992
[ 821.291879] RDX: 0000000000000100 RSI: 0000557a9f361458 RDI: 0000000000000003
[ 821.291880] RBP: 0000557a9f3612a0 R08: 0000000000000000 R09: 0000557a9f361430
[ 821.291881] R10: 0000557a9f3623f0 R11: 0000000000000246 R12: 00000645243fe000
[ 821.291883] R13: 0000000000000100 R14: 0000557a9f361430 R15: 0000557a9f3612f0
[ 821.291885] </TASK>
[ 821.291885] ---[ end trace c1947abfeca4e049 ]---
@SergeiShtepa I decoded them using the same blksnap version and kernel package on Ubuntu 22.04 (in case you haven't already done so): https://paste.debian.net/hidden/bfb42a5f/ https://paste.debian.net/hidden/0db0a518/ I'll probably write a how-to for the README in this repository to make this simple for users. @Flole998, may I use one of your stacktraces as an example for the how-to?
@Fantu It doesn't seem to contain any personal data, right? Only the machine type is basically "unique", everything else is publicly available anyways, right? If that's the case go ahead and use it.
Hi! I think the fix should help. So, just get latest from branch VAL-6.0. A problem was found in the algorithm for determining the block size for change-tracking. The problem appears for large disks (several terabytes). We are looking forward to your feedback.
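To illustrate the kind of bug the UBSAN message above points at ("shift exponent 32 is too large for 32-bit type 'int'"), here is a minimal C sketch. It is not the blksnap code and not the actual fix: the helper names `block_size_bytes`, `block_count` and `pick_block_shift`, and the idea of capping the number of blocks, are assumptions made only for illustration. It shows why a block-size shift that grows with the disk size has to be applied to a 64-bit operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not blksnap code): size in bytes of a
 * change-tracking block that covers 2^shift bytes. */
static inline uint64_t block_size_bytes(unsigned int shift)
{
    assert(shift < 64);
    /* Writing `1 << shift` would shift a 32-bit int, which is undefined
     * behaviour once shift reaches 32. Widening the left operand to
     * 64 bits first keeps every shift below 64 well defined. */
    return (uint64_t)1 << shift;
}

/* Hypothetical helper: number of blocks needed to cover the device,
 * rounded up without risking overflow in the addition. */
static inline uint64_t block_count(uint64_t device_bytes, unsigned int shift)
{
    uint64_t size = block_size_bytes(shift);

    return device_bytes / size + (device_bytes % size != 0);
}

/* Hypothetical helper: smallest shift >= min_shift for which the block
 * count does not exceed max_blocks. Larger disks push the shift upward,
 * which is where a 32-bit shift can go out of bounds. */
static inline unsigned int pick_block_shift(uint64_t device_bytes,
                                            unsigned int min_shift,
                                            uint64_t max_blocks)
{
    unsigned int shift = min_shift;

    while (shift < 63 && block_count(device_bytes, shift) > max_blocks)
        shift++;
    return shift;
}
```

With these assumed helpers, an 8 TiB device (2^43 bytes), a minimum shift of 16 and a cap of 2^20 blocks give a shift of 23, and `block_size_bytes(32)` returns 4 GiB instead of invoking undefined behaviour.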
@Flole998 if you want to try it but haven't found a way to build and install the package without breaking the veeam package dependencies or causing a conflict (as happens when installing the alternative veeamsnap), you can work around it by forcing the version to 6.0.0.1060. Use https://github.com/veeam/blksnap/tree/VAL-6.0 and build the package with:
cd ./pkg/deb/blksnap-dkms
./build.sh 6.0.0.1060
Then install the generated package manually (from the build directory):
sudo dpkg -i blksnap*.deb
I hope this is useful and clearly explained.
Hi @Flole998 ! As far as I know from the support service, most of the problems were solved. Are there any warning messages left in the kernel log?
It seems that the problem has been solved, since there is no feedback on this issue.
Distribution
Ubuntu 22.04
Architecture
amd64
Kernel version
5.15.0-60-generic from ubuntu
Blksnap version
6.0.0.1060
Bug description
There was a warning logged after a snapshot overflow:
Steps to reproduce
No response
Expected behavior
No response
Additional information
No response