zmanda / amanda

Amanda Network Backup
https://www.zmanda.com/downloads/

/dev/nst0 assumed to be drive 0 in changer #236

Open DaveAtFraud opened 1 year ago

DaveAtFraud commented 1 year ago

It appears that when amanda works with a tape changer, it assumes that the first drive in the changer is /dev/nst0. On Rocky Linux 9 this is not the case:

[root@rocky9mate ~]# lsscsi -g
[0:0:0:0]    disk    VMware   Virtual disk     2.0   /dev/sda   /dev/sg0
[3:0:0:0]    cd/dvd  NECVMWar VMware SATA CD00 1.00  /dev/sr0   /dev/sg1
[33:0:0:0]   mediumx STK      L700             0107  /dev/sch0  /dev/sg9
[33:0:1:0]   tape    IBM      ULT3580-TD5      0107  /dev/st6   /dev/sg8
[33:0:2:0]   tape    IBM      ULT3580-TD5      0107  /dev/st0   /dev/sg2
[33:0:3:0]   tape    IBM      ULT3580-TD4      0107  /dev/st5   /dev/sg7
[33:0:4:0]   tape    IBM      ULT3580-TD4      0107  /dev/st1   /dev/sg3
[33:0:8:0]   mediumx STK      L80              0107  /dev/sch1  /dev/sg10
[33:0:9:0]   tape    STK      T10000B          0107  /dev/st4   /dev/sg6
[33:0:10:0]  tape    STK      T10000B          0107  /dev/st7   /dev/sg11
[33:0:11:0]  tape    STK      T10000B          0107  /dev/st2   /dev/sg4
[33:0:12:0]  tape    STK      T10000B          0107  /dev/st3   /dev/sg5
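The mismatch is easier to see when the listing is grouped per changer. The following is a minimal Python sketch, not part of amanda, that parses `lsscsi -g` text and lists the non-rewinding device for each drive of each changer. It assumes that a changer's drives are the tape devices that follow it in SCSI target order on the same host and channel, which matches this mhVTL layout but may not hold elsewhere. The names `parse_lsscsi` and `drives_by_changer` are made up for illustration.

```python
# Tape and changer rows copied from the lsscsi -g listing in this report.
SAMPLE = """\
[33:0:0:0] mediumx STK L700 0107 /dev/sch0 /dev/sg9
[33:0:1:0] tape IBM ULT3580-TD5 0107 /dev/st6 /dev/sg8
[33:0:2:0] tape IBM ULT3580-TD5 0107 /dev/st0 /dev/sg2
[33:0:3:0] tape IBM ULT3580-TD4 0107 /dev/st5 /dev/sg7
[33:0:4:0] tape IBM ULT3580-TD4 0107 /dev/st1 /dev/sg3
[33:0:8:0] mediumx STK L80 0107 /dev/sch1 /dev/sg10
[33:0:9:0] tape STK T10000B 0107 /dev/st4 /dev/sg6
[33:0:10:0] tape STK T10000B 0107 /dev/st7 /dev/sg11
[33:0:11:0] tape STK T10000B 0107 /dev/st2 /dev/sg4
[33:0:12:0] tape STK T10000B 0107 /dev/st3 /dev/sg5
"""


def parse_lsscsi(text):
    """Parse `lsscsi -g` text into dicts with hctl, type, dev and sg keys."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("["):
            continue
        try:
            hctl = tuple(int(n) for n in fields[0].strip("[]").split(":"))
        except ValueError:
            continue  # skip rows whose address is not purely numeric
        devices.append(
            {"hctl": hctl, "type": fields[1], "dev": fields[-2], "sg": fields[-1]}
        )
    return devices


def drives_by_changer(devices):
    """Map each changer's sg node to its drives' /dev/nstN names.

    Assumption: drives belong to the changer that precedes them in
    host:channel:target:lun order on the same host and channel.
    """
    mapping = {}
    changer = None
    for dev in sorted(devices, key=lambda d: d["hctl"]):
        if dev["type"] == "mediumx":
            changer = dev
            mapping[dev["sg"]] = []
        elif (
            dev["type"] == "tape"
            and changer is not None
            and dev["hctl"][:2] == changer["hctl"][:2]
        ):
            nst = dev["dev"].replace("/dev/st", "/dev/nst", 1)
            mapping[changer["sg"]].append(nst)
    return mapping


if __name__ == "__main__":
    for sg, drives in drives_by_changer(parse_lsscsi(SAMPLE)).items():
        for index, nst in enumerate(drives):
            print(sg, "drive", index, "->", nst)
```

For the listing above, this puts /dev/nst6 at drive 0 and /dev/nst0 at drive 1 of the changer on /dev/sg9.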

This results in various amanda pieces looking in the wrong drive:

[amandabackup@rocky9mate ~]$ amlabel DailySet1 -f E01001L4 slot 1
Reading label...
Error reading volume label: Tape device /dev/nst0 is not ready or is empty.
Not writing label.
Not writing label.

But if I force the slot 1 tape into what is actually /dev/nst0:

Volume with label 'E01001L4' is active and contains data from this configuration.
Consider using 'amrmtape' to remove volume 'E01001L4' from the catalog.
Writing label 'E01001L4'...
Checking label...
Success!

This was done concurrently with the above to force the tape into /dev/nst0:

[root@rocky9mate ~]# mtx -f /dev/sg9 unload 1 0
Unloading drive 0 into Storage Element 1...done
[root@rocky9mate ~]# mtx -f /dev/sg9 load 1 1
Loading media from Storage Element 1 into drive 1...done

But not everything is happy:

[amandabackup@rocky9mate ~]$ amcheck DailySet1
Amanda Tape Server Host Check

NOTE: Holding disk '/var/lib/amanda/dumps': 32800768 KB disk space available, using 32698368 KB
slot 1:the requested volume is in drive 1, which this changer instance cannot access
the requested volume is in drive 1, which this changer instance cannot access

Insert a new volume in STK-L700 and press enter, or ^D to abort.

The tape changer is actually a VTL created by mhVTL. This configuration (amanda + mhVTL) works correctly on CentOS 7, since /dev/st0 (aka /dev/nst0) is configured as the first drive in the changer. I am not sure what kernel changes resulted in the random ordering of st devices. I will look into ways to force device number assignments as a workaround. I have tried using /dev/nst6 (drive 0 in the changer), which also doesn't work.
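For reference, the amanda-changers(7) man page documents a `tape-device` property for the chg-robot changer that maps changer drive numbers to tape devices. The thread does not show the amanda.conf in use or how /dev/nst6 was tried, so the following is only a sketch under the assumption that chg-robot is the changer driver. It formats the property line for the first changer from the drive order seen in the lsscsi listing. The helper name `tape_device_property` is hypothetical, and whether this setting resolves the issue is untested here.

```python
def tape_device_property(nst_devices):
    """Format a chg-robot 'tape-device' property line.

    nst_devices -- non-rewinding device paths, listed in changer drive order
                   (index 0 is changer drive 0). The order is an assumption
                   taken from the SCSI target order in the lsscsi listing.
    """
    pairs = ['"%d=tape:%s"' % (i, dev) for i, dev in enumerate(nst_devices)]
    return 'property "tape-device" ' + " ".join(pairs)


if __name__ == "__main__":
    # Drive order of the changer on /dev/sg9 as reported in this issue.
    print(tape_device_property(["/dev/nst6", "/dev/nst0", "/dev/nst5", "/dev/nst1"]))
```

The resulting line would go inside the `define changer` block of amanda.conf. Because the /dev/nstN names can change across reboots, such a mapping is only stable when it points at persistent device names.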

DaveAtFraud commented 1 year ago

My workaround is to reconfigure the VTL to have only one changer with only one tape drive. When there is only one, the targets don't move.

Additional info:

1) This problem seems to be fallout from most distro kernels now being compiled with asynchronous device scanning enabled to speed up the boot process. This can result in device names changing across reboots. Paraphrased from: https://www.spinics.net/lists/linux-scsi/msg166873.html

2) What convinced me to go with my workaround is that st0/nst0 ended up on the second changer previously configured for my VTL. I haven't even tried to get the second changer working with amanda.

DaveAtFraud commented 1 year ago

Better workaround: udev rules. I came up with some udev rules that alias the drives in each changer to /dev/st/l[12]st0, /dev/st/l[12]st1, ... and /dev/nst/l[12]nst0, ... and the changers to /dev/tplib1 and /dev/tplib2. The aliases are based on the SCSI ID_SERIAL for both the tape drives and the changers. This keeps them persistent and correct across reboots, regardless of the random device identifier assignment that results from asynchronous device initialization. I'll try to find an appropriate public location to post the files. I would still like to see the issue originally documented in this bug get fixed.
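The actual rule files have not been posted, so the following Python sketch only illustrates what rules of this kind might look like. It generates udev rule lines that match on `ENV{ID_SERIAL}` and add symlinks following the naming scheme described above. The match keys (`KERNEL`, `SUBSYSTEM=="scsi_tape"`, `SUBSYSTEM=="scsi_generic"`), the helper names, and the serial values are assumptions for illustration. The rules would also need to run after the stock persistent tape rules that populate ID_SERIAL.

```python
def tape_rules(library, serials):
    """Return udev rule lines aliasing one library's drives by serial.

    library -- library number used in the alias (1 gives l1st0, l1nst0, ...)
    serials -- drive ID_SERIAL values, listed in changer drive order
    """
    rules = []
    for drive, serial in enumerate(serials):
        for prefix in ("st", "nst"):
            rules.append(
                'KERNEL=="%s*[0-9]", SUBSYSTEM=="scsi_tape", '
                'ENV{ID_SERIAL}=="%s", SYMLINK+="%s/l%d%s%d"'
                % (prefix, serial, prefix, library, prefix, drive)
            )
    return rules


def changer_rule(library, serial):
    """Return a udev rule line aliasing a changer's sg node to /dev/tplibN."""
    return (
        'SUBSYSTEM=="scsi_generic", ENV{ID_SERIAL}=="%s", SYMLINK+="tplib%d"'
        % (serial, library)
    )


if __name__ == "__main__":
    # Placeholder serials; real values can be read with
    # `udevadm info --query=property --name=/dev/nst0`.
    print(changer_rule(1, "CHANGER_SERIAL_1"))
    for line in tape_rules(1, ["DRIVE_SERIAL_A", "DRIVE_SERIAL_B"]):
        print(line)
```

The printed lines could be saved to a file under /etc/udev/rules.d/ with a number higher than the distribution's persistent tape rules, then loaded with `udevadm control --reload` followed by `udevadm trigger`.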