Open rohr22 opened 2 years ago
I reconfigured my system so that add_library_contents_10() defines only LTO-6 drives, as shown here, and the problem did not occur with the same test:
add_library 10 0 0 0 "IBM" "3584" "2160" "XYZZY_A"
# index channel target LUN S/No Lib# Slot
add_ibm_ultrium_6_drive 11 0 1 0 "XYZZY_A1" 10 1
add_ibm_ultrium_6_drive 12 0 2 0 "XYZZY_A2" 10 2
add_ibm_ultrium_6_drive 13 0 3 0 "XYZZY_A3" 10 3
add_ibm_ultrium_6_drive 14 0 4 0 "XYZZY_A4" 10 4
add_ibm_ultrium_6_drive 15 0 5 0 "XYZZY_A5" 10 5
add_ibm_ultrium_6_drive 16 0 6 0 "XYZZY_A6" 10 6
add_ibm_ultrium_6_drive 17 0 7 0 "XYZZY_A7" 10 7
add_ibm_ultrium_6_drive 18 0 8 0 "XYZZY_A8" 10 8
add_ibm_ultrium_6_drive 19 0 9 0 "XYZZY_A9" 10 9
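In case it helps anyone compare configurations, here is a minimal sketch that lists which LTO generations each library ends up with, based on lines like the ones above. It assumes the drive helpers are always named `add_ibm_ultrium_<N>_drive` and take the arguments in the order given by the comment line (index, channel, target, LUN, serial, library, slot). `drive_generations` is a hypothetical helper written for this thread, not part of mhvtl.

```python
import re
import shlex
from collections import defaultdict

# Assumption: the LTO generation is encoded in the helper name,
# e.g. add_ibm_ultrium_6_drive -> generation 6.
DRIVE_RE = re.compile(r"^add_ibm_ultrium_(\d+)_drive$")


def drive_generations(conf_text):
    """Map library index -> sorted list of LTO generations of its drives."""
    gens = defaultdict(set)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = shlex.split(line)  # handles the quoted serial numbers
        match = DRIVE_RE.match(fields[0])
        if match and len(fields) >= 8:
            # fields: name index channel target LUN serial lib slot
            library = int(fields[6])
            gens[library].add(int(match.group(1)))
    return {lib: sorted(found) for lib, found in gens.items()}


def mixed_libraries(conf_text):
    """Return the library indexes that contain more than one generation."""
    return [lib for lib, found in drive_generations(conf_text).items()
            if len(found) > 1]
```

With the configuration above, `drive_generations` would report only generation 6 for library 10, and `mixed_libraries` would return an empty list.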
Earlier when we had a mix of LTO-6, LTO-7, and LTO-8 drives, we saw:
[root@choctaw ITDT]# ./itdt scan
Scanning SCSI Bus ...
Exit with code: 0
[root@choctaw ITDT]#
I think the root cause may be that LTO-6 tapes cannot be read in an LTO-8 drive.
Even so, LTO-6 media in an LTO-8 drive should not have caused a segfault.
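For reference, the generation rules being discussed can be written down as a small lookup. This is only a sketch of the published LTO compatibility rules for generations 6 to 8 (LTO-8 drives dropped the "read two generations back" rule), not a description of how mhvtl decides anything internally. The table and function names are made up for illustration.

```python
# Published LTO drive/media compatibility for generations 6-8.
# Assumption: this table is illustrative and is not taken from mhvtl.
READ_COMPAT = {6: {4, 5, 6}, 7: {5, 6, 7}, 8: {7, 8}}
WRITE_COMPAT = {6: {5, 6}, 7: {6, 7}, 8: {7, 8}}


def can_read(drive_gen, media_gen):
    """True if a drive of drive_gen can read media of media_gen."""
    return media_gen in READ_COMPAT.get(drive_gen, set())


def can_write(drive_gen, media_gen):
    """True if a drive of drive_gen can write media of media_gen."""
    return media_gen in WRITE_COMPAT.get(drive_gen, set())


def writable_drives(drive_gens, media_gen):
    """Return the drive generations able to write the given media."""
    return sorted(g for g in set(drive_gens) if can_write(g, media_gen))
```

In a library with LTO-6, LTO-7, and LTO-8 drives, `writable_drives([6, 7, 8], 6)` leaves only the LTO-6 and LTO-7 drives as valid targets for LTO-6 media, so a mount into an LTO-8 drive would be expected to fail cleanly rather than crash.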
I'm on leave until next week (left the laptop behind). I'll check it out when I'm back.
Mark, thank you. Enjoy your time off.
The problem occurred both times I ran a test, so it is repeatable. Each time I had to reboot to get the system to see tapes with mhvtl. The lsscsi command was not showing any tape drives and the vtllibrary@10.service had crashed.
Mark, I got a crash when trying to write to an LTO-6 tape. The changer was configured to use LTO-6, LTO-7, and LTO-8 drives. The add_library_contents_10() function in generate_device_conf.in had:
My test should have been trying to write to an LTO-6 tape since we did not have any LTO-7 or LTO-8 tapes available. I saw this logged to /var/log/messages:
Jan 19 17:03:49 choctaw kernel: vtllibrary[899]: segfault at 14 ip 00007f6d61228079 sp 00007ffcdf4172d0 error 4 in libc-2.17.so[7f6d611db000+1c4000]
and:
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data : T3580-TD8 XYZZY_A8
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Debug.... i = 384, len = 48
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Element Address : 14368
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Status : 0x20
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Medium type : 2
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Identification Descriptor
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Code Set : 0x01
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Identifier type : 0x08
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Identifier length : 32
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data :
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data :
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data : ULT3580-TD8 XYZZY_A9
Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: decode_element_status(): Element Status Page
Jan 19 17:03:49 choctaw abrt-hook-ccpp: Process 899 (vtllibrary) of user 0 killed by SIGSEGV - dumping core
Jan 19 17:03:49 choctaw systemd: vtllibrary@10.service: main process exited, code=dumped, status=11/SEGV
Jan 19 17:03:49 choctaw systemd: Unit vtllibrary@10.service entered failed state.
Jan 19 17:03:49 choctaw systemd: vtllibrary@10.service failed.
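Since no core dump survived, the kernel segfault line is the main clue. Here is a rough sketch that pulls the useful numbers out of it: the faulting address, the offset of the instruction pointer inside the mapped object, and the x86 page-fault error bits (bit 0 = protection violation vs. page not present, bit 1 = write vs. read, bit 2 = user mode). `decode_segfault` is a hypothetical helper, and the regex assumes the message layout seen in this log.

```python
import re

# Assumption: the kernel message has the layout seen in /var/log/messages above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

LINE = ("Jan 19 17:03:49 choctaw kernel: vtllibrary[899]: segfault at 14 "
        "ip 00007f6d61228079 sp 00007ffcdf4172d0 error 4 "
        "in libc-2.17.so[7f6d611db000+1c4000]")


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # Offset of the faulting instruction from the object's load address.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 0x1),
        "access": "write" if err & 0x2 else "read",
        "user_mode": bool(err & 0x4),
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["object"], hex(info["offset"]), info["access"],
          hex(info["fault_addr"]))
```

For the line logged here this gives a user-mode read of address 0x14 at offset 0x4d079 into libc-2.17.so. A fault address that low is consistent with a NULL or near-NULL pointer being passed into a libc routine, though that is only a guess without a backtrace. If the matching libc debuginfo is installed, the offset could be tried with a tool such as addr2line to see which function it falls in.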
Attached is the /var/log/messages from when the problem occurred. Unfortunately, a complete crash dump was not generated:
Jan 19 17:03:50 choctaw abrt-server: Traceback (most recent call last):
Jan 19 17:03:50 choctaw abrt-server: File "/usr/sbin/sosreport", line 14, in
Jan 19 17:03:50 choctaw abrt-server: from sos.sosreport import main
Jan 19 17:03:50 choctaw abrt-server: ModuleNotFoundError: No module named 'sos'
Jan 19 17:03:50 choctaw abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2022-01-19-17:03:49-899' exited with 1
Jan 19 17:03:50 choctaw abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2022-01-19-17:03:49-899'
Jan 19 17:03:55 choctaw sm-notify[1523]: Unable to notify rand.clearlake.ibm.com, giving up
Jan 19 17:04:39 choctaw kernel: mhvtl: mhvtl_timer_intr_handler: Unexpected interrupt, indx 0
I thought I would attach the vtl log in case you can figure out what caused the crash.
Thank you, Peter
mhvtl_crash.tar.gz