markh794 / mhvtl

Linux based Virtual Tape Library
http://sites.google.com/site/linuxvtl2/

mhvtl crash when using different tape types on the same changer #87

Open rohr22 opened 2 years ago

rohr22 commented 2 years ago

Mark, I got a crash when trying to write to an LTO-6 tape. The changer was configured to use LTO-6, LTO-7, and LTO-8 drives. The add_library_contents_10() function in generate_device_conf.in had:

    add_library 10 0 0 0 "IBM" "3584"  "2160"  "XYZZY_A"
    #         index channel target LUN S/No Lib# Slot
    add_ibm_ultrium_6_drive 11 0 1 0 "XYZZY_A1" 10 1
    add_ibm_ultrium_6_drive 12 0 2 0 "XYZZY_A2" 10 2
    add_ibm_ultrium_6_drive 13 0 3 0 "XYZZY_A3" 10 3
    add_ibm_ultrium_7_drive 14 0 4 0 "XYZZY_A4" 10 4
    add_ibm_ultrium_7_drive 15 0 5 0 "XYZZY_A5" 10 5
    add_ibm_ultrium_7_drive 16 0 6 0 "XYZZY_A6" 10 6
    add_ibm_ultrium_8_drive 17 0 7 0 "XYZZY_A7" 10 7
    add_ibm_ultrium_8_drive 18 0 8 0 "XYZZY_A8" 10 8
    add_ibm_ultrium_8_drive 19 0 9 0 "XYZZY_A9" 10 9
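
To make the mix of drive generations in a configuration like this easy to spot, here is a minimal Python sketch that parses the `add_ibm_ultrium_N_drive` lines and groups serial numbers by library and generation. The helper name `drive_generations` and the regular expression are illustrative assumptions, not part of mhvtl; the column order follows the comment line in the snippet above.

```python
import re

# Assumed line shape, taken from the snippet above:
#   add_ibm_ultrium_6_drive 11 0 1 0 "XYZZY_A1" 10 1
#   (index channel target LUN S/No Lib# Slot)
DRIVE_LINE = re.compile(
    r'^\s*add_ibm_ultrium_(\d+)_drive\s+(\d+)\s+\d+\s+\d+\s+\d+\s+'
    r'"([^"]+)"\s+(\d+)\s+(\d+)'
)


def drive_generations(config_text):
    """Return {library_id: {generation: [serial, ...]}} for all drive lines."""
    libraries = {}
    for line in config_text.splitlines():
        match = DRIVE_LINE.match(line)
        if match is None:
            continue  # add_library lines, comments and blank lines are skipped
        generation, _index, serial, library_id, _slot = match.groups()
        by_gen = libraries.setdefault(int(library_id), {})
        by_gen.setdefault(int(generation), []).append(serial)
    return libraries


CONFIG = '''
    add_library 10 0 0 0 "IBM" "3584"  "2160"  "XYZZY_A"
    add_ibm_ultrium_6_drive 11 0 1 0 "XYZZY_A1" 10 1
    add_ibm_ultrium_6_drive 12 0 2 0 "XYZZY_A2" 10 2
    add_ibm_ultrium_6_drive 13 0 3 0 "XYZZY_A3" 10 3
    add_ibm_ultrium_7_drive 14 0 4 0 "XYZZY_A4" 10 4
    add_ibm_ultrium_7_drive 15 0 5 0 "XYZZY_A5" 10 5
    add_ibm_ultrium_7_drive 16 0 6 0 "XYZZY_A6" 10 6
    add_ibm_ultrium_8_drive 17 0 7 0 "XYZZY_A7" 10 7
    add_ibm_ultrium_8_drive 18 0 8 0 "XYZZY_A8" 10 8
    add_ibm_ultrium_8_drive 19 0 9 0 "XYZZY_A9" 10 9
'''

result = drive_generations(CONFIG)
# Libraries that contain more than one drive generation
mixed = {lib for lib, gens in result.items() if len(gens) > 1}
print(result)
print("libraries with mixed generations:", mixed)
```

For this configuration, library 10 is reported as mixed, with three drives each of generations 6, 7 and 8.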

My test should have been trying to write to an LTO-6 tape since we did not have any LTO-7 or LTO-8 tapes available. I saw this logged to /var/log/messages:

    Jan 19 17:03:49 choctaw kernel: vtllibrary[899]: segfault at 14 ip 00007f6d61228079 sp 00007ffcdf4172d0 error 4 in libc-2.17.so[7f6d611db000+1c4000]

and:

    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data : T3580-TD8 XYZZY_A8
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Debug.... i = 384, len = 48
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Element Address : 14368
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Status : 0x20
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Medium type : 2
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Identification Descriptor
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Code Set : 0x01
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Identifier type : 0x08
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): Identifier length : 32
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data :
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data :
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: dump_element_desc(): ASCII data : ULT3580-TD8 XYZZY_A9
    Jan 19 17:03:49 choctaw /usr/bin/vtllibrary[899]: decode_element_status(): Element Status Page
    Jan 19 17:03:49 choctaw abrt-hook-ccpp: Process 899 (vtllibrary) of user 0 killed by SIGSEGV - dumping core
    Jan 19 17:03:49 choctaw systemd: vtllibrary@10.service: main process exited, code=dumped, status=11/SEGV
    Jan 19 17:03:49 choctaw systemd: Unit vtllibrary@10.service entered failed state.
    Jan 19 17:03:49 choctaw systemd: vtllibrary@10.service failed.
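
Since no core dump survived, the kernel segfault line is the main clue. The following Python sketch decodes it, assuming the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode). The function name `decode_segfault` is illustrative only.

```python
import re

# Kernel format: "segfault at <addr> ip <ip> sp <sp> error <code> in <obj>[<base>+<size>]"
# All numeric fields are printed in hexadecimal.
SEGFAULT = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the fault address, mapped object and offset from a segfault log line."""
    match = SEGFAULT.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    err = int(match.group("err"), 16)
    return {
        "fault_address": int(match.group("addr"), 16),
        "object": match.group("obj"),
        "offset": ip - base,             # instruction offset inside the mapping
        "page_present": bool(err & 1),   # x86 error-code bit 0
        "write": bool(err & 2),          # x86 error-code bit 1
        "user_mode": bool(err & 4),      # x86 error-code bit 2
    }


LINE = ("Jan 19 17:03:49 choctaw kernel: vtllibrary[899]: segfault at 14 "
        "ip 00007f6d61228079 sp 00007ffcdf4172d0 error 4 in "
        "libc-2.17.so[7f6d611db000+1c4000]")

info = decode_segfault(LINE)
print(info["object"], hex(info["offset"]), hex(info["fault_address"]))
```

Here the faulting instruction sits at offset 0x4d079 inside libc-2.17.so, and error 4 decodes to a user-mode read of a page that is not mapped. A fault address as low as 0x14 is consistent with reading through a NULL pointer plus a small offset, although the log alone does not confirm which vtllibrary call passed it into libc.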

Attached is the /var/log/messages from when the problem occurred. Unfortunately, a complete crash dump was not generated:

    Jan 19 17:03:50 choctaw abrt-server: Traceback (most recent call last):
    Jan 19 17:03:50 choctaw abrt-server: File "/usr/sbin/sosreport", line 14, in
    Jan 19 17:03:50 choctaw abrt-server: from sos.sosreport import main
    Jan 19 17:03:50 choctaw abrt-server: ModuleNotFoundError: No module named 'sos'
    Jan 19 17:03:50 choctaw abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2022-01-19-17:03:49-899' exited with 1
    Jan 19 17:03:50 choctaw abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2022-01-19-17:03:49-899'
    Jan 19 17:03:55 choctaw sm-notify[1523]: Unable to notify rand.clearlake.ibm.com, giving up
    Jan 19 17:04:39 choctaw kernel: mhvtl: mhvtl_timer_intr_handler: Unexpected interrupt, indx 0

I thought I would attach the vtl log in case you can figure out what caused the crash.

Thank you, Peter

mhvtl_crash.tar.gz

rohr22 commented 2 years ago

I reconfigured add_library_contents_10 to use only LTO-6 drives, as shown here, and the problem did not occur with the same test:

    add_library 10 0 0 0 "IBM" "3584"  "2160"  "XYZZY_A"
    #         index channel target LUN S/No Lib# Slot
    add_ibm_ultrium_6_drive 11 0 1 0 "XYZZY_A1" 10 1
    add_ibm_ultrium_6_drive 12 0 2 0 "XYZZY_A2" 10 2
    add_ibm_ultrium_6_drive 13 0 3 0 "XYZZY_A3" 10 3
    add_ibm_ultrium_6_drive 14 0 4 0 "XYZZY_A4" 10 4
    add_ibm_ultrium_6_drive 15 0 5 0 "XYZZY_A5" 10 5
    add_ibm_ultrium_6_drive 16 0 6 0 "XYZZY_A6" 10 6
    add_ibm_ultrium_6_drive 17 0 7 0 "XYZZY_A7" 10 7
    add_ibm_ultrium_6_drive 18 0 8 0 "XYZZY_A8" 10 8
    add_ibm_ultrium_6_drive 19 0 9 0 "XYZZY_A9" 10 9

rohr22 commented 2 years ago

Earlier when we had a mix of LTO-6, LTO-7, and LTO-8 drives, we saw:

    [root@choctaw ITDT]# ./itdt scan
    Scanning SCSI Bus ...
    0 /dev/sg5 - [ULT3580-TD7]-[0106] S/N:XYZZY_A4 H4-B0-T4-L0 (Generic-Device)
    1 /dev/sg6 - [ULT3580-TD8]-[0106] S/N:XYZZY_A9 H4-B0-T9-L0 (Generic-Device)
    2 /dev/sg7 - [ULT3580-TD8]-[0106] S/N:XYZZY_A8 H4-B0-T8-L0 (Generic-Device)
    3 /dev/sg8 - [ULT3580-TD8]-[0106] S/N:XYZZY_A7 H4-B0-T7-L0 (Generic-Device)
    4 /dev/sg9 - [ULT3580-TD6]-[0106] S/N:XYZZY_A2 H4-B0-T2-L0 (Generic-Device)
    5 /dev/sg10 - [ULT3580-TD7]-[0106] S/N:XYZZY_A6 H4-B0-T6-L0 (Generic-Device)
    6 /dev/sg11 - [ULT3580-TD6]-[0106] S/N:XYZZY_A3 H4-B0-T3-L0 (Generic-Device)
    7 /dev/sg12 - [ULT3580-TD7]-[0106] S/N:XYZZY_A5 H4-B0-T5-L0 (Generic-Device)
    8 /dev/sg13 - [ULT3580-TD6]-[0106] S/N:XYZZY_A1 H4-B0-T1-L0 (Generic-Device)
    9 /dev/sg14 - [3584 TS3500]-[0106] S/N:XYZZY_A H4-B0-T0-L0 (Generic-Device)
    Exit with code: 0
    [root@choctaw ITDT]#
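
To cross-check a scan like this against the configured drives, a short Python sketch can parse the device lines and count devices per model. The regular expression is inferred from the output above and `parse_itdt_scan` is a hypothetical helper; other ITDT versions may format their output differently.

```python
import re
from collections import Counter

# Assumed line shape, taken from the scan output above:
#   0 /dev/sg5 - [ULT3580-TD7]-[0106] S/N:XYZZY_A4 H4-B0-T4-L0 (Generic-Device)
SCAN_LINE = re.compile(
    r"^\s*\d+\s+(?P<dev>/dev/\S+)\s+-\s+\[(?P<model>[^\]]+)\]-\[(?P<fw>[^\]]+)\]\s+"
    r"S/N:(?P<serial>\S+)\s+H(?P<host>\d+)-B(?P<bus>\d+)-T(?P<target>\d+)-L(?P<lun>\d+)"
)


def parse_itdt_scan(text):
    """Return one dict per device line; non-matching lines are ignored."""
    devices = []
    for line in text.splitlines():
        match = SCAN_LINE.match(line)
        if match:
            devices.append(match.groupdict())
    return devices


SCAN = """
0 /dev/sg5 - [ULT3580-TD7]-[0106] S/N:XYZZY_A4 H4-B0-T4-L0 (Generic-Device)
1 /dev/sg6 - [ULT3580-TD8]-[0106] S/N:XYZZY_A9 H4-B0-T9-L0 (Generic-Device)
2 /dev/sg7 - [ULT3580-TD8]-[0106] S/N:XYZZY_A8 H4-B0-T8-L0 (Generic-Device)
3 /dev/sg8 - [ULT3580-TD8]-[0106] S/N:XYZZY_A7 H4-B0-T7-L0 (Generic-Device)
4 /dev/sg9 - [ULT3580-TD6]-[0106] S/N:XYZZY_A2 H4-B0-T2-L0 (Generic-Device)
5 /dev/sg10 - [ULT3580-TD7]-[0106] S/N:XYZZY_A6 H4-B0-T6-L0 (Generic-Device)
6 /dev/sg11 - [ULT3580-TD6]-[0106] S/N:XYZZY_A3 H4-B0-T3-L0 (Generic-Device)
7 /dev/sg12 - [ULT3580-TD7]-[0106] S/N:XYZZY_A5 H4-B0-T5-L0 (Generic-Device)
8 /dev/sg13 - [ULT3580-TD6]-[0106] S/N:XYZZY_A1 H4-B0-T1-L0 (Generic-Device)
9 /dev/sg14 - [3584 TS3500]-[0106] S/N:XYZZY_A H4-B0-T0-L0 (Generic-Device)
"""

devices = parse_itdt_scan(SCAN)
models = Counter(d["model"] for d in devices)
serial_to_model = {d["serial"]: d["model"] for d in devices}
print(models)
```

The counts come out as three devices each for ULT3580-TD6, ULT3580-TD7 and ULT3580-TD8 plus one 3584 TS3500 changer, which matches the mixed configuration.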

dabiged commented 2 years ago

I think the root cause of this may be that LTO6 tapes cannot be read in an LTO8 drive.

IBM Tape Compatibility Matrix
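
As a rough illustration of the compatibility point, the Python sketch below encodes the published LTO generation rules for the three drive types in this report. The tables are an assumption based on the general LTO rules (LTO-8 drives handle only LTO-7 and LTO-8 media), not on mhvtl's own density tables, which may differ.

```python
# Assumption: general LTO compatibility rules, not values read from mhvtl.
READABLE = {6: {4, 5, 6}, 7: {5, 6, 7}, 8: {7, 8}}
WRITABLE = {6: {5, 6}, 7: {6, 7}, 8: {7, 8}}


def can_read(drive_gen, media_gen):
    """True if a drive of drive_gen can read media of media_gen."""
    return media_gen in READABLE.get(drive_gen, set())


def can_write(drive_gen, media_gen):
    """True if a drive of drive_gen can write media of media_gen."""
    return media_gen in WRITABLE.get(drive_gen, set())


# Which of the configured drive generations could write an LTO-6 tape?
usable = [gen for gen in (6, 7, 8) if can_write(gen, 6)]
print("drive generations able to write LTO-6 media:", usable)
```

Under these rules only the LTO-6 and LTO-7 drives can write LTO-6 media, so a load into an LTO-8 drive would be expected to fail with an error rather than succeed.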

markh794 commented 2 years ago

LTO6 media in an LTO8 drive should not have caused a segfault.

I'm on leave until next week (left the laptop behind). I'll check it out when I'm back.

rohr22 commented 2 years ago

Mark, thank you. Enjoy your time off.

The problem occurred both times I ran the test, so it is repeatable. Each time, I had to reboot before the system could see the mhvtl tapes again: the lsscsi command showed no tape drives and vtllibrary@10.service had crashed.