markh794 / mhvtl

Linux based Virtual Tape Library
http://sites.google.com/site/linuxvtl2/
Other
139 stars 65 forks source link

The default L80 library works fine but the L700 fails in somewhat random ways. #115

Closed waa closed 1 year ago

waa commented 1 year ago

Hello, and thank you for this project! It is quite a nice way to test code against tape drives and libraries without the need for (slow) and expensive physical hardware. :)

I am currently doing testing on an Oracle Linux 8 system with a default compiled install of mhVTL v1.7, but have seen the same issues on CentOS7 and other distributions.

I can load/unload/transfer tapes with mtx and check the drives's and library's status with mt all day long using the default L80 tape library and everything is fine.

However, with the L700 library, I seem to have completely random results.

For example, right now, after a system reboot, mtx -f /dev/tape/by-id/scsi-SSTK_L700_XYZZY_A load 2 2 reports:

Loading media from Storage Element 2 into drive 2...
mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=yes
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Hardware Error
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Residual = 00 00 00 00
mtx: Request Sense: Additional Sense Code = 04
mtx: Request Sense: Additional Sense Qualifier = 03
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1001 to 502 Failed

There is currently nothing in drive 2 and slot 2 is full:

# mtx -f /dev/tape/by-id/scsi-SSTK_L700_XYZZY_A status
Storage Changer /dev/tape/by-id/scsi-SSTK_L700_XYZZY_A:4 Drives, 43 Slots ( 4 Import/Export )
Data Transfer Element 0:Empty
Data Transfer Element 1:Empty
Data Transfer Element 2:Empty
Data Transfer Element 3:Empty
Storage Element 1:Full :VolumeTag=E01001L8
Storage Element 2:Full :VolumeTag=E01002L8
[...snip...]

Then, I run the same command again and get:

# mtx -f /dev/tape/by-id/scsi-SSTK_L700_XYZZY_A load 2 2
Loading media from Storage Element 2 into drive 2...Destination Element Address 502 is Already Full

Is there anything I can gather to help explain and/or troubleshoot this issue?

Thank you for any help.

markh794 commented 1 year ago

Thanks for the bug report. Is there any chance of obtaining the syslog (typically /var/log/messages) during the time of the error. The contents of /etc/mhvtl/device.conf and /etc/mhvtl/library_contents. (where the '' is the library number) The ASC/ASCQ 04/03 is 'Manual Intervention Required' - The mount of the media failed when trying to move media to a drive, or the unmount of the drive failed when trying to move media back to a storage element. Obtaining information from the drive in question will hopefully explain the cause behind the error(s).

waa commented 1 year ago

mhvtl-L77info.tar.gz

Hello Mark,

Thank you for the very fast reply. I am in no way an expert with mhvtl, and I have grabbed why you asked for. :)

I am not sure why but I had two library files for this library. I had installed and tested some things with mhvtl a long time ago on this VM and recall I had problems then with the L700 library too.

When I found this, I renamed the library_contents.10 to library_contents.10-bad, and the other library_contents.10.orig one to library_contents.10 and rebooted and retried the tests. The same issue exists. I hope you can fins something here. :)

Thank you, Bill

markh794 commented 1 year ago

Hello Bill,

Can I ask what application you are using with the mhVTL ? Do you reproduce the issue simply using 'mtx' or some other robotic control daemon. What are you using to write to the tape ? The log shows the daemons in debug mode (really bad for performance due to the amount of logging) but the drive is occupied. It would appear that the tape has been ejected from the drive (e.g. mt -f /dev/st* offline) but never moved back to the storage slot The log file does not go back in time far enough to see what the tape drive 'life cycle' was. 'VERBOSE = 1' (mhvtl.conf) should be enough to identify the sequence of events.

markh794 commented 1 year ago

Reviewing the stklxx_pm.c source - the L180/L700 are the same, the L20/40/80 contain one additional Vendor Specific mode page / A Vendor-specific page for the StorageTek L20, L40 and L80 libraries / pcode = 0x2d; subpcode = 0; size = 0x26; but the values are blank - as it is not documented in the information I have available.

waa commented 1 year ago

Hello Mark,

OK, I have done a complete removal of mhvtl, including /etc/mhvtl, and the /opt/mhvtl directory with all the tape directories, rebooted and started from scratch (this is on Oracle Linux 8, and I even did a full upgrade including kernel).

Then I did a brand new install of mhvtl 1.7, complete with the default configs, and now when I attempt to load a tape from a slot to a drive it produces the same issue on both changers.

Immediately following a reboot:

[root@ol8 ~]# mtx -f /dev/tape/by-id/scsi-SSTK_L80_XYZZY_B load 1 0
Loading media from Storage Element 1 into drive 0...mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=yes
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Hardware Error
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Residual = 00 00 00 00
mtx: Request Sense: Additional Sense Code = 04
mtx: Request Sense: Additional Sense Qualifier = 03
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1000 to 500 Failed

[root@ol8 ~]# mtx -f /dev/tape/by-id/scsi-SSTK_L80_XYZZY_B load 1 0
Loading media from Storage Element 1 into drive 0...Destination Element Address 500 is Already Full

[root@ol8 ~]# mtx -f /dev/tape/by-id/scsi-SSTK_L80_XYZZY_B unload 1 0
Data Transfer Element 0 is Empty

I have no idea now. :)

Whatever you need, I can get, just let me know. Even a zoom session or remote ssh access to thsi VM is not out of the question at this point. :)

Sorry for the hassle, and thank you for any ideas.

markh794 commented 1 year ago

I'll install OracleLinuxR8U8 and test. I can provide my ssh public key :

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDK9xbUTt5GUO6YBUq97kPVNrVbPLInk/cTpKtCTCHC2N2X9TbDfxnhbvxfybabAVr3OtEsn/l4cIR+ywXWBbr9KKlwNpJdypvNJHVJfBfIuXEDg3EECZsvhw85uby4tK/zH0k8i5SEMYJGZyNCIjLKtl/QXF+omQ64PQN3Fqk+qC9ab5AKWX75TDDTe2LzDMGAJvB8ZWxXvYPAM1t16e+OvOkS8QgE4KmLjbxSFkyaSVpaW96+n9SxKCKFNAAJ1+yPnYmFCRXhZQa/+bDbpY9hmZjTq92HWY+DEpHrpIhXEqysq+REjVgebpnf32SUzPKTSLB63HzT8DX3BKaUW1byBMxauhxBhM4HgV3iZVqddLm7gSxNu5fbfmfYyKaQ69DqRlyC957VfaGrLFJMpFAJ2p5e4iDfMLEbnNvNqKh57CLuyuQ+1NMEtyN8Xyf+5ihsoknGQmvjRUPBP3oaDU6e2vNek/P+2sU3pnRbhNycH+uD6S3DvN/dqLScGAN5/Ok4/4UHWMVakT52/wNKEPN9CMg0kghFP8Zd5nBCqBwB3hqsrb3oxjDbwPcycQpLP9Zp5stB/00qYAGD0inFeQ36U/P7o1tvDbfrrITm4H0tArk/jJqLa4WU13L8SUf524WbIrdobAGIY82bHLO/r4ztNdecx8U11CliNL5MPM9KEQ== mark.harvey@nutanix.com
markh794 commented 1 year ago

First attempt to build kernel module failed

./config.sh
make -C /lib/modules/5.15.0-101.103.2.1.el8uek.x86_64/build M=/tmp/tmp.nAF3bOqyXT modules
make[1]: Entering directory '/usr/src/kernels/5.15.0-101.103.2.1.el8uek.x86_64'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9.1.0.3)
  You are using:           gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18.0.2)
  CC [M]  /tmp/tmp.nAF3bOqyXT/mhvtl.o
gcc: error: unrecognized command line option ‘-ftrivial-auto-var-init=zero’
gcc: error: unrecognized command line option ‘-fzero-call-used-regs=used-gpr’
make[2]: *** [scripts/Makefile.build:288: /tmp/tmp.nAF3bOqyXT/mhvtl.o] Error 1
make[1]: *** [Makefile:1992: /tmp/tmp.nAF3bOqyXT] Error 2
make[1]: Leaving directory '/usr/src/kernels/5.15.0-101.103.2.1.el8uek.x86_64'
make: *** [Makefile:30: default] Error 2

google - scl...

[markh@oraclilinux8 ~]$ scl enable gcc-toolset-12 bash

Kernel module builds fine and installs (note: Using the newly provided 'mhvtl_kernel_mod_build' utility)

[markh@oraclilinux8 ~]$ mhvtl_kernel_mod_build 
./config.sh
make -C /lib/modules/5.15.0-101.103.2.1.el8uek.x86_64/build M=/tmp/tmp.NNzQxQT1yX modules
make[1]: Entering directory '/usr/src/kernels/5.15.0-101.103.2.1.el8uek.x86_64'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9.1.0.3)
  You are using:           gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7)
  CC [M]  /tmp/tmp.NNzQxQT1yX/mhvtl.o
  MODPOST /tmp/tmp.NNzQxQT1yX/Module.symvers
  CC [M]  /tmp/tmp.NNzQxQT1yX/mhvtl.mod.o
  LD [M]  /tmp/tmp.NNzQxQT1yX/mhvtl.ko
  BTF [M] /tmp/tmp.NNzQxQT1yX/mhvtl.ko
Skipping BTF generation for /tmp/tmp.NNzQxQT1yX/mhvtl.ko due to unavailability of vmlinux
make[1]: Leaving directory '/usr/src/kernels/5.15.0-101.103.2.1.el8uek.x86_64'
install -o root -g root -m 644 mhvtl.ko /lib/modules/5.15.0-101.103.2.1.el8uek.x86_64/kernel/drivers/scsi/; \
depmod -a; \
if [ -f "/etc/modprobe.d/unsupported-modules" ]; then \
    sed -i 's/^allow_unsupported_modules 0/allow_unsupported_modules 1/' /etc/modprobe.d/unsupported-modules; \
fi
[markh@oraclilinux8 ~]$ sudo modprobe mhvtl opts=0

sudo systemctl start mhvtl.target

[markh@oraclilinux8 ~]$ lsscsi -g
[0:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0   /dev/sg0 
[2:0:0:0]    disk    NUTANIX  VDISK            0     /dev/sda   /dev/sg1 
[3:0:0:0]    mediumx STK      L700             0107  /dev/sch1  /dev/sg11
[3:0:1:0]    tape    IBM      ULT3580-TD8      0107  /dev/st1   /dev/sg3 
[3:0:2:0]    tape    IBM      ULT3580-TD8      0107  /dev/st0   /dev/sg2 
[3:0:3:0]    tape    IBM      ULT3580-TD6      0107  /dev/st5   /dev/sg8 
[3:0:4:0]    tape    IBM      ULT3580-TD6      0107  /dev/st7   /dev/sg10
[3:0:8:0]    mediumx STK      L80              0107  /dev/sch0  /dev/sg4 
[3:0:9:0]    tape    STK      T10000B          0107  /dev/st4   /dev/sg7 
[3:0:10:0]   tape    STK      T10000B          0107  /dev/st3   /dev/sg6 
[3:0:11:0]   tape    STK      T10000B          0107  /dev/st2   /dev/sg5 
[3:0:12:0]   tape    STK      T10000B          0107  /dev/st6   /dev/sg9 
[markh@oraclilinux8 ~]$ sudo mtx -f /dev/sg11 load 1 0
Loading media from Storage Element 1 into drive 0...done

And place tape back again...

[markh@oraclilinux8 ~]$ sudo mtx -f /dev/sg11 unload 1 0
Unloading drive 0 into Storage Element 1...done

kernel version I'm testing:

[markh@oraclilinux8 ~]$ uname -a
Linux oraclilinux8.localdomain 5.15.0-101.103.2.1.el8uek.x86_64 #2 SMP Mon May 1 20:11:30 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
markh794 commented 1 year ago

and the log trace:

[markh@oraclilinux8 ~]$ sudo egrep "vtltape|vtllibrary" /var/log/messages
<snip>
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (136) (delay 234805): 12 00 00 00 38 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: spc_inquiry(): INQUIRY ** (136)
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (137) (delay 405): 1a 08 1d 00 88 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: spc_mode_sense(): MODE SENSE 6 (137) **
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (138) (delay 405): b8 12 03 e8 00 27 00 00 0f e8 00 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (138) **
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (139) (delay 405): b8 13 00 0a 00 04 00 00 0f e8 00 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (139) **
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (140) (delay 405): b8 14 01 f4 00 04 00 00 0f e8 00 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (140) **
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (141) (delay 405): b8 11 00 01 00 01 00 00 0f e8 00 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (141) **
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (142) (delay 405): a5 00 00 01 03 e8 01 f4 00 00 00 00
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_move_medium(): MOVE MEDIUM (142) **
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_move_medium(): Moving from slot 1000 to slot 500 using transport 1, Invert media: no
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: is_drive_empty(): 10: Sending "mount_state" to snd_id 11
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Received message "mount_state" from snd_id 10
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Replying to snd_id 10 with "Not occupied"
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: is_drive_empty(): 10: Received "Not occupied" from snd_id 11
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: move_slot2drive(): About to send cmd: 'lload E01001L8' to drive 1
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Received message "lload E01001L8" from snd_id 10
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: loadTape(): Media type 'LTO8' loaded with S/No. : E01001L8_1685671154
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: rewind_tape(): Media is writable
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: return_sense(): [Key/ASC/ASCQ] [06 28 00]
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: loadTape(): Media is writable
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: loadTape(): Setting MediumDensityCode to LTO8 (0x5e) Media type: 0x88
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Replying to snd_id 10 with "Loaded OK: E01001L8"
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: check_tape_load(): 10: Received "Loaded OK: E01001L8" from snd_id 11
Jun  1 22:22:00 oraclilinux8 /usr/bin/vtllibrary[70061]: move_cart(): Setting src slot access true
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (143) (delay 190005): 12 00 00 00 38 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: spc_inquiry(): INQUIRY ** (143)
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (144) (delay 805): 1a 08 1d 00 88 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: spc_mode_sense(): MODE SENSE 6 (144) **
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (145) (delay 805): b8 12 03 e8 00 27 00 00 0f e8 00 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (145) **
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (146) (delay 405): b8 13 00 0a 00 04 00 00 0f e8 00 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (146) **
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (147) (delay 405): b8 14 01 f4 00 04 00 00 0f e8 00 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (147) **
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (148) (delay 405): b8 11 00 01 00 01 00 00 0f e8 00 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_read_element_status(): READ ELEMENT STATUS (148) **
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: CDB (149) (delay 805): a5 00 00 01 01 f4 03 e8 00 00 00 00
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_move_medium(): MOVE MEDIUM (149) **
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: smc_move_medium(): Moving from slot 500 to slot 1000 using transport 1, Invert media: no
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: is_drive_empty(): 10: Sending "mount_state" to snd_id 11
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Received message "mount_state" from snd_id 10
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Replying to snd_id 10 with "occupied"
Jun  1 22:22:45 oraclilinux8 /usr/bin/vtllibrary[70061]: is_drive_empty(): 10: Received "occupied" from snd_id 11
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Received message "unload E01001L8        " from snd_id 10
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Replying to snd_id 10 with "Unloaded OK"
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtllibrary[70061]: check_tape_unload(): 10: Received "Unloaded OK" from snd_id 11
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtllibrary[70061]: move_cart(): Setting src slot access true
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Received message "set_empty" from snd_id 10
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtltape[70064]: processMessageQ(): 11: Replying to snd_id 10 with "Unloaded OK"
Jun  1 22:22:46 oraclilinux8 /usr/bin/vtllibrary[70061]: check_tape_unload(): 10: Received "Unloaded OK" from snd_id 11
markh794 commented 1 year ago

Just to confirm the new binaries are not picking up old(er) shared libraries.

[markh@oraclilinux8 ~]$ sudo ldd /usr/bin/vtltape
    linux-vdso.so.1 (0x00007ffc9d667000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f98075a9000)
    libvtlcart.so => /lib/libvtlcart.so (0x00007f980738f000)
    libvtlscsi.so => /lib/libvtlscsi.so (0x00007f980716e000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f9806da9000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9806b89000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f98077c1000)
[markh@oraclilinux8 ~]$ sudo ldd /usr/bin/vtllibrary
    linux-vdso.so.1 (0x00007ffe964cc000)
    libvtlscsi.so => /lib/libvtlscsi.so (0x00007fa775aac000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fa7756e7000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa7754c7000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa775ccd000)
markh794 commented 1 year ago

Version of mhVTL (just a git clone https://github.com/markh794/mhvtl.git)

[markh@oraclilinux8 ~]$ vtlcmd -V
Version: 1.7.1-git-master
waa commented 1 year ago

Hello Mark,

Thank you for taking the time to spin up an Oracle Linux system and test.

I also had no trouble compiling and installing, but my end results were not the same. :)

FYI, I sent an email to your address in your ssh pub key.

Thanks again for the attention you have given to this issue I am having!

markh794 commented 1 year ago

Hello Bill,

Can I ask you to test this patch please.

diff --git a/usr/vtltape.c b/usr/vtltape.c
index 65e0d9b..3749a90 100644
--- a/usr/vtltape.c
+++ b/usr/vtltape.c
@@ -1491,6 +1491,7 @@ static int processMessageQ(struct q_msg *msg, uint8_t *sam_stat)
        char *pcl;
        char s[128];
        char *z;
+       int rc;
        struct lu_phy_attr *lu;
        int pcl_len;

@@ -1520,11 +1521,15 @@ static int processMessageQ(struct q_msg *msg, uint8_t *sam_stat)
                                MHVTL_ERR("Ugghhh... out of memory allocating buffer for barcode: %s", pcl);
                                sprintf(s, "%s: %s", msg_load_failed, pcl);
                        } else {
-                               loadTape(pcl, sam_stat);
-                               sprintf(s, "%s: %s",
-                                       (lu_ssc.load_status == TAPE_UNLOADED) ? msg_load_failed : msg_load_ok,
-                                       pcl);
-                               strncpy(lu_ssc.barcode, pcl, pcl_len);
+                               rc = loadTape(pcl, sam_stat);
+                               sprintf(s, "%s: %s", (lu_ssc.load_status == TAPE_UNLOADED) ? msg_load_failed : msg_load_ok, pcl);
+                               if (rc) {
+                                       free(lu_ssc.barcode);
+                                       lu_ssc.barcode = NULL;
+                               } else {
+                                       strncpy(lu_ssc.barcode, pcl, pcl_len);
+                               }
+
                        }
                }
                SEND_MSG_AND_LOG(s, msg->snd_id);

Thanks Mark

waa commented 1 year ago

Hello Mark,

FYI: I had to do patch -l < mhvtl.patch since Github handed me code space-formatted, and your code is tab-formatted. :)

OK, this patch seems to have fixed the issue I noted in PMs this weekend.

Testing the L700, here's what I did:

- mtx status (to check drives, slots and tapes): Looks normal. 
- mtx load 1 0 (load L8 tape into L8 drive): Looks normal
- mtx unload 1 0 (Unload L8 tape back to its slot): Looks normal
- mtx load 30 0 (Load L6 tape into L8 drive): Fails, as expected.
- mtx status (to check drives, slots and tapes): Looks normal. Nothing in drive 0, and slot 30 still populated.
- mtx load 1 0 (Load L8 tape into L8 drive which just failed to load): WORKS.

Thank you for looking into this, and for the quick patch.

Note: For completeness, we did find a bug, but the initial problem I was having which prompted me to open this issue was that I was loading an L8 tape into an L6 drive and/or vice versa, never realizing that the L700 library contained L8 and L6 drives as well as L6 and L8 drives. :man_facepalming:

Mark, I will let you post your official findings and fix for future readers. :)

Thank you again.

markh794 commented 1 year ago

Thanks for testing and confirming the proof-of-concept fix. I'll tidy up the patch and plan on pushing into github in the next day or so. FWIW: The mixed drive combo is deliberate decision on my side. When I worked with tape drives, customers often upgraded a sub-set of drives in a library, so this was a config seen often enough that I thought a default mixed drive config was a good idea

markh794 commented 1 year ago

Do you have a preferred email for a 'tested by' / 'reported by' in the commit or should I just use this thread ?

waa commented 1 year ago

Thanks for testing and confirming the proof-of-concept fix. I'll tidy up the patch and plan on pushing into github in the next day or so.

Awesome!

FWIW: The mixed drive combo is deliberate decision on my side. When I worked with tape drives, customers often upgraded a sub-set of drives in a library, so this was a config seen often enough that I thought a default mixed drive config was a good idea

It is a good idea, and I see this happen with our customers too. Nothing to change on your side about this. I just completely missed this fact until the other night when I saw the log entry I had quoted to you. :)

waa commented 1 year ago

Do you have a preferred email for a 'tested by' / 'reported by' in the commit or should I just use this thread ?

You may use my name, emai: waa@protonmail.com, or this thread. Up to you. :)

markh794 commented 1 year ago

Many thanks Bill, I've now submitted a patch into github : 74185e3c7a91fc5353cd77359c444a21b1d99466 Thanks for the effort while we tracked this one down.