Closed ir1ka closed 2 weeks ago
Addition info: Target: Debian 12.7 LIO enable iser
Based on current testing, it looks like when the target system boots up, the crash appears when connect for the first time, but not when you connect again after the core. This still needs more testing.
Print the *iscsi
, and its error_string
is SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_OPERATION_CODE(0x2000)
.
(gdb) f 8
#8 0x00007efce3ca6171 in iscsi_free (ptr=<optimized out>, iscsi=0x5615c0594ad0) at init.c:108
108 free(ptr);
(gdb) p *iscsi
$2 = {drv = 0x7efce3ccb380 <iscsi_transport_iser>, opaque = 0x5615c0596450, transport = ISER_TRANSPORT,
initiator_name = "iqn.1993-08.org.debian:01:xxxxxxxxx", '\000' <repeats 217 times>, target_name = "iqn.1993-08.org.debian:01:yyyyyyyyy", '\000' <repeats 210 times>,
target_address = '\000' <repeats 255 times>, connected_portal = "192.168.1.2:3260", '\000' <repeats 238 times>, portal = "192.168.1.2:3260", '\000' <repeats 238 times>,
alias = '\000' <repeats 255 times>, bind_interfaces = '\000' <repeats 255 times>, user = '\000' <repeats 255 times>, passwd = '\000' <repeats 255 times>,
chap_c = '\000' <repeats 2048 times>, target_user = '\000' <repeats 255 times>, target_passwd = '\000' <repeats 255 times>, target_chap_i = 0, target_chap_r = '\000' <repeats 15 times>,
error_string = "SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_OPERATION_CODE(0x2000)", '\000' <repeats 191 times>, session_type = ISCSI_SESSION_NORMAL, isid = "\200\217\272\257\000",
rdma_ack_timeout = 0 '\000', itt = 745273325, cmdsn = 631945849, min_cmdsn_waiting = 631944922, expcmdsn = 631945849, maxcmdsn = 631945900, statsn = 0,
want_header_digest = ISCSI_HEADER_DIGEST_NONE, header_digest = ISCSI_HEADER_DIGEST_NONE, want_data_digest = ISCSI_DATA_DIGEST_NONE, data_digest = ISCSI_DATA_DIGEST_NONE, fd = 12,
is_connected = 1, is_corked = 0, tcp_user_timeout = 0, tcp_keepcnt = 3, tcp_keepintvl = 30, tcp_keepidle = 30, tcp_syncnt = 0, tcp_nonblocking = 0, current_phase = 12, next_phase = 3,
secneg_phase = 0, login_attempts = 1, is_loggedin = 1, bind_interfaces_cnt = 0, nops_in_flight = 0, chap_a = 0, chap_i = 0, socket_status_cb = 0x0, connect_data = 0x5615c0594550,
outqueue = 0x0, outqueue_current = 0x0, waitpdu = 0x5615c14cbd50, incoming = 0x0, max_burst_length = 262144, first_burst_length = 65536, initiator_max_recv_data_segment_length = 8192,
target_max_recv_data_segment_length = 8192, want_initial_r2t = ISCSI_INITIAL_R2T_NO, use_initial_r2t = ISCSI_INITIAL_R2T_YES, want_immediate_data = ISCSI_IMMEDIATE_DATA_YES,
use_immediate_data = ISCSI_IMMEDIATE_DATA_YES, lun = 0, no_auto_reconnect = 0, reconnect_deferred = 0, reconnect_max_retries = -1, pending_reconnect = 0, log_level = 0, log_fn = 0x0,
mallocs = 18023, reallocs = 15, frees = 17800, smallocs = 44022, smalloc_ptrs = {0x5615c06018b0, 0x5615c06017a0, 0x5615c0a909b0, 0x5615c0a90620, 0x5615c05fdfc0, 0x5615c0635c80,
0x5615c0ab0af0, 0x5615c0ab09e0, 0x5615c0620eb0, 0x5615c05de690, 0x5615c0a908a0, 0x5615c0a90790, 0x5615c0fef670, 0x5615c0aace00, 0x5615c1769950, 0x5615c1769840, 0x5615c05faf00,
0x5615c0ac5a30, 0x5615c0ab8fd0, 0x5615c0ab8ec0, 0x5615c05d40e0, 0x5615c05d3fd0, 0x5615c05fa470, 0x5615c05fa360, 0x5615c05686a0, 0x5615c0568590, 0x5615c1870670, 0x5615c0ab92c0,
0x5615c05643c0, 0x5615c05642b0, 0x5615c0564710, 0x5615c05fb890, 0x5615c057d430, 0x5615c057d320, 0x5615c0a9c0f0, 0x5615c0ac4700, 0x5615c2138120, 0x5615c2138010, 0x5615c0ee5670,
0x5615c0ee5560, 0x5615c2146f40, 0x5615c2146e30, 0x5615c1769ce0, 0x5615c1769bd0, 0x5615c05fdce0, 0x5615c05fdbd0, 0x5615c2139510, 0x5615c2139400, 0x5615c05f9fa0, 0x5615c05f9e90,
0x5615c05fade0, 0x5615c05facd0, 0x5615c05d4800, 0x5615c05d46f0, 0x5615c057dcc0, 0x5615c057dbb0, 0x5615c0d83e30, 0x5615c0ae4c10, 0x5615c21461c0, 0x5615c21460b0, 0x5615c05650e0,
0x5615c0564fd0, 0x5615c0ee08d0, 0x5615c0a90290, 0x5615c1870430, 0x5615c1870320, 0x5615c0aac290, 0x5615c063c800, 0x5615c0ab2820, 0x5615c0ab2710, 0x5615c05d4580, 0x5615c05d4470,
0x5615c05d41f0, 0x5615c2145490, 0x5615c061c0d0, 0x5615c0616130, 0x5615c0a95a70, 0x5615c063c5c0, 0x5615c0a8ee90, 0x5615c0617d50, 0x5615c14cc1f0, 0x5615c14cc0e0, 0x5615c02832d0,
0x5615c1573b00, 0x5615c061f310, 0x5615c061f200, 0x5615c0ab4d90, 0x5615c0ab4c80, 0x5615c0a914b0, 0x5615c0a913a0, 0x5615c05fbeb0, 0x5615c0abe690, 0x5615c161aaf0, 0x5615c161a9e0,
0x5615c2137d90, 0x5615c2137c80, 0x5615c0ac2c20, 0x5615c0a90510, 0x5615c0ac68a0, 0x5615c0ac6790, 0x5615c0ac6510, 0x5615c0ac6400, 0x5615c0ac2fb0, 0x5615c0ac2ea0, 0x5615c05fd0b0,
0x5615c0a950e0, 0x5615c0638e80, 0x5615c0638d70, 0x5615c0600db0, 0x5615c0600ca0, 0x5615c0abb080, 0x5615c0abaf70, 0x5615c05fd710, 0x5615c05fd600, 0x5615c0ab4780, 0x5615c061d170,
0x5615c0ac77d0, 0x5615c0ac76c0, 0x5615c0a90e50, 0x5615c0a90d40, 0x5615c05e2190, 0x5615c05e2080, 0x5615c0ab2490, 0x5615c0ab2380, 0x5615c0a91120, 0x5615c0a91010, 0x5615c05fd380,
0x5615c05fd270}, smalloc_free = 128, smalloc_size = 256, cache_allocations = 1, next_reconnect = 0, scsi_timeout = 0, old_iscsi = 0x0, retry_cnt = 0, no_ua_on_reconnect = 0,
fd_dup_cb = 0x0, fd_dup_opaque = 0x0}
There are some strange points, the code iser.c:565
points to iscsi_sfree(iscsi, pdu->outdata.data);
,and
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007efce438de9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007efce433efb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007efce4329472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007efce4382430 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7efce449c459 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5 0x00007efce43977ba in malloc_printerr (str=str@entry=0x7efce449f0c8 "free(): invalid next size (fast)") at ./malloc/malloc.c:5660
#6 0x00007efce439962a in _int_free (av=0x7efce44d5c60 <main_arena>, p=0x5615c060d640, have_lock=have_lock@entry=0) at ./malloc/malloc.c:4518
#7 0x00007efce439be9f in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3385
#8 0x00007efce3ca6171 in iscsi_free (ptr=<optimized out>, iscsi=0x5615c0594ad0) at init.c:108
#9 iscsi_free (iscsi=0x5615c0594ad0, ptr=<optimized out>) at init.c:106
#10 0x00007efce3cbaea5 in iscsi_iser_free_pdu (iscsi=0x5615c0594ad0, pdu=0x5615c02b1ae0) at iser.c:565
#11 0x00007efce3caab74 in iscsi_process_pdu (iscsi=iscsi@entry=0x5615c0594ad0, in=in@entry=0x7fff89d3c8d0) at pdu.c:739
#12 0x00007efce3cbb13d in iser_rcv_completion (iser_conn=<optimized out>, rx_desc=<optimized out>) at iser.c:1383
#13 iser_handle_wc (iser_conn=<optimized out>, wc=<optimized out>) at iser.c:1417
#14 cq_event_handler (iser_conn=iser_conn@entry=0x5615c0596450) at iser.c:1459
#15 0x00007efce3cbcb3f in cq_handle (iser_conn=0x5615c0596450) at iser.c:1507
#16 iscsi_iser_service (iscsi=0x5615c0594ad0, revents=<optimized out>) at iser.c:118
#17 0x00007efce3ccfaaf in ?? () from /usr/bin/../lib/x86_64-linux-gnu/qemu/block-iscsi.so
#18 0x00005615b1cb8778 in aio_dispatch_handler (ctx=ctx@entry=0x5615c0331ac0, node=0x5615c05cf810) at ../../util/aio-posix.c:369
#19 0x00005615b1cb9012 in aio_dispatch_handlers (ctx=0x5615c0331ac0) at ../../util/aio-posix.c:412
#20 aio_dispatch (ctx=0x5615c0331ac0) at ../../util/aio-posix.c:422
#21 0x00005615b1cccdde in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../../util/async.c:352
#22 0x00007efce48dd7a9 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00005615b1ccf3d8 in glib_pollfds_poll () at ../../util/main-loop.c:297
#24 os_host_main_loop_wait (timeout=0) at ../../util/main-loop.c:320
#25 main_loop_wait (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:606
#26 0x00005615b192b957 in qemu_main_loop () at ../../softmmu/runstate.c:739
#27 0x00005615b1b4f956 in qemu_default_main () at ../../softmmu/main.c:37
#28 0x00007efce432a24a in __libc_start_call_main (main=main@entry=0x5615b175e570 <main>, argc=argc@entry=112, argv=argv@entry=0x7fff89d3cfc8) at ../sysdeps/nptl/libc_start_call_main.h:58
#29 0x00007efce432a305 in __libc_start_main_impl (main=0x5615b175e570 <main>, argc=112, argv=0x7fff89d3cfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fff89d3cfb8) at ../csu/libc-start.c:360
#30 0x00005615b175fe51 in _start ()
(gdb) f 10
#10 0x00007efce3cbaea5 in iscsi_iser_free_pdu (iscsi=0x5615c0594ad0, pdu=0x5615c02b1ae0) at iser.c:565
565 iscsi_sfree(iscsi, pdu->outdata.data);
(gdb) list
560 iser_tx_desc_free(iscsi, iser_pdu->desc);
561 iser_pdu->desc = NULL;
562 }
563
564 if (pdu->outdata.size <= iscsi->smalloc_size) {
565 iscsi_sfree(iscsi, pdu->outdata.data);
566 } else {
567 iscsi_free(iscsi, pdu->outdata.data);
568 }
569 pdu->outdata.data = NULL;
(gdb) p pdu->outdata.size
$7 = 48
(gdb) p iscsi->smalloc_size
$8 = 256
(gdb) p pdu->outdata.size <= iscsi->smalloc_size
$9 = 1
It satisfies the conditional statement, so it should call iscsi_sfree
, but the stack trace shows that it calls iscsi_free
.
I have checked the code of iscsi_sfree, and it will call iscsi_free if iscsi->smalloc_free == SMALL_ALLOC_MAX_FREE
. And macro SMALL_ALLOC_MAX_FREE
define as (128)
(gdb) p iscsi->cache_allocations
$11 = 1
(gdb) p iscsi->smalloc_free
$12 = 128
Looks like memory corruption. This is tricky and reproducible at random.
Can it trigger under valgrind?
I don't have access to any iSER systems so :-(
Qemu, running under valgrind, seemed to run very slowly, and the guest system could hardly boot.
I will try to use -fsanitize=address to compile libiscsi and qemu to check memory overflow issue.
The following is the log captured using -fsanitize=address
.
==110292==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
=================================================================
==110292==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200032249a at pc 0x7faf98a47681 bp 0x7fad65173060 sp 0x7fad65172810
WRITE of size 48 at 0x60200032249a thread T2
#0 0x7faf98a47680 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
#1 0x7faf96570fb0 in iscsi_szmalloc /work/libiscsi/lib/init.c:133
#2 0x7faf9657f30b in iscsi_allocate_pdu /work/libiscsi/lib/pdu.c:204
#3 0x7faf9658747e in iscsi_scsi_command_async /work/libiscsi/lib/iscsi-command.c:206
#4 0x55a0ef5ef485 in iscsi_aio_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/iscsi.c:1099
#5 0x55a0ef2b93be in bdrv_co_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/io.c:3210
#6 0x55a0ef42936f in raw_co_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/raw-format.c:423
#7 0x55a0ef2b9289 in bdrv_co_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/io.c:3208
#8 0x55a0ef250249 in blk_co_do_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/block-backend.c:1655
#9 0x55a0ef2504d6 in blk_aio_ioctl_entry /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/block-backend.c:1676
#10 0x55a0ef9cfee1 in coroutine_trampoline /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../util/coroutine-ucontext.c:177
#11 0x7faf950709bf (/lib/x86_64-linux-gnu/libc.so.6+0x519bf)
0x60200032249a is located 0 bytes to the right of 10-byte region [0x602000322490,0x60200032249a)
allocated by thread T2 here:
#0 0x7faf98ab89cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x7faf96570b5f in iscsi_malloc /work/libiscsi/lib/init.c:84
#2 0x7faf965c14a4 in iser_prepare_read_cmd /work/libiscsi/lib/iser.c:773
#3 0x7faf965c205d in iser_send_command /work/libiscsi/lib/iser.c:890
#4 0x7faf965c2584 in iscsi_iser_send_pdu /work/libiscsi/lib/iser.c:940
#5 0x7faf965c2c3b in iscsi_iser_queue_pdu /work/libiscsi/lib/iser.c:1001
#6 0x7faf965853d7 in iscsi_queue_pdu /work/libiscsi/lib/pdu.c:943
#7 0x7faf96587b20 in iscsi_scsi_command_async /work/libiscsi/lib/iscsi-command.c:283
#8 0x55a0ef5ef485 in iscsi_aio_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/iscsi.c:1099
#9 0x55a0ef2b93be in bdrv_co_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/io.c:3210
#10 0x55a0ef42936f in raw_co_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/raw-format.c:423
#11 0x55a0ef2b9289 in bdrv_co_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/io.c:3208
#12 0x55a0ef250249 in blk_co_do_ioctl /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/block-backend.c:1655
#13 0x55a0ef2504d6 in blk_aio_ioctl_entry /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../block/block-backend.c:1676
#14 0x55a0ef9cfee1 in coroutine_trampoline /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../util/coroutine-ucontext.c:177
#15 0x7faf950709bf (/lib/x86_64-linux-gnu/libc.so.6+0x519bf)
Thread T2 created by T0 here:
#0 0x7faf98a49726 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55a0ef94820b in qemu_thread_create /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../util/qemu-thread-posix.c:545
#2 0x55a0ef13a2cb in iothread_init /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../iothread.c:215
#3 0x55a0ef032eac in event_loop_base_complete /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../event-loop-base.c:86
#4 0x55a0ef02f2e0 in user_creatable_complete /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../qom/object_interfaces.c:28
#5 0x55a0ef02fd48 in user_creatable_add_type /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../qom/object_interfaces.c:125
#6 0x55a0ef030314 in user_creatable_add_qapi /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../qom/object_interfaces.c:157
#7 0x55a0edd64ecb in object_option_foreach_add /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../softmmu/vl.c:1718
#8 0x55a0edd60be2 in qemu_create_early_backends /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../softmmu/vl.c:1896
#9 0x55a0edd564c5 in qemu_init /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../softmmu/vl.c:3547
#10 0x55a0eefe2d91 in main /home/builder/debian/qemu-7.2+dfsg/bin/debug/native/../../../softmmu/main.c:47
#11 0x7faf95046249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799 in __interceptor_memset
Shadow bytes around the buggy address:
0x0c048005c440: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fa
0x0c048005c450: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c048005c460: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fa
0x0c048005c470: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fd fd
0x0c048005c480: fa fa fd fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c048005c490: fa fa 00[02]fa fa fa fa fa fa fa fa fa fa fa fa
0x0c048005c4a0: fa fa fa fa fa fa fd fa fa fa 00 02 fa fa fd fd
0x0c048005c4b0: fa fa fd fd fa fa fa fa fa fa 00 00 fa fa 00 02
0x0c048005c4c0: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fa
0x0c048005c4d0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fd
0x0c048005c4e0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==110292==ABORTING
It looks like a 10-byte allocation is made with iscsi_malloc
, added to smalloc_ptrs
during free (iscsi_iser_free_pdu
), and then grabbed by some other code that needs more than 10 bytes.
I will try the patch:
diff --git a/lib/iser.c b/lib/iser.c
index 3740564..4e062d8 100644
--- a/lib/iser.c
+++ b/lib/iser.c
@@ -770,7 +770,11 @@ iser_prepare_read_cmd(struct iser_conn *iser_conn,struct iser_pdu *iser_pdu)
if (data_size > 0) {
if (task->iovector_in.iov == NULL) {
- iser_pdu->iscsi_pdu.indata.data = iscsi_malloc(iscsi, data_size);
+ if (data_size <= iscsi->smalloc_size) {
+ iser_pdu->iscsi_pdu.indata.data = iscsi_smalloc(iscsi, data_size);
+ } else {
+ iser_pdu->iscsi_pdu.indata.data = iscsi_malloc(iscsi, data_size);
+ }
if (iser_pdu->iscsi_pdu.indata.data == NULL) {
iscsi_set_error(iscsi, "Failed to aloocate data buffer");
return -1;
The patchsets apply into version 1.19.0 (Debian 12 refer it), and unable to resolve the segv issue, it looks like version 1.19.0 of iser has other memory issues.
This smalloc vs malloc seems to be a very poor interface, - it is just too easy to misuse smalloc/free vs malloc/sfree like this, it really is. Either there should be single entry point for malloc and free, which decides to use small or regular allocator, remembers the allocation size somewhere near the data and the free routine calls the regular or small allocator too. Or maybe not remembers, but free (single point) also requires the initial size - which is a bit more difficult to use but still better than now. Or, like almost all other systems out there does, have just one allocator and use it for all sizes.
This smalloc vs malloc seems to be a very poor interface, - it is just too easy to misuse smalloc/free vs malloc/sfree like this, it really is. Either there should be single entry point for malloc and free, which decides to use small or regular allocator, remembers the allocation size somewhere near the data and the free routine calls the regular or small allocator too. Or maybe not remembers, but free (single point) also requires the initial size - which is a bit more difficult to use but still better than now. Or, like almost all other systems out there does, have just one allocator and use it for all sizes.
It requires a simpler and more efficient memory management interface.
Fixed on master branch. Close the issue.
If want to discuss the details of memory management, open a new ticket.
When using version master (commit: 551d90724d850874c6a094f71f07e7303eb9d590, with patch for compatable 1.19.0, see follow), the linux guest (Alpine 3.20.3), the crash reappeared. But the windwos 10 guest is work fine.
Follow is the crash backtrace:
Patch:
Originally posted by @Ir1Ka in https://github.com/sahlberg/libiscsi/issues/277#issuecomment-2455188740