thiell opened this issue 1 year ago
Hi @thiell,
For now, `phobos drive del` only deals with the database. We may add this information to the documentation.
We are currently thinking of adding a `drive_release`-like feature for the 2.1 version, which is planned for June 2024.
@SebaGougeaud We now think that when stopping phobosd, the daemon should release its drives; otherwise there is no way multiple phobos instances can properly recover without sysadmin intervention to release the drives. Imagine a scenario with a first data mover dm01 running phobosd that we stop for maintenance, with tapes mounted in its drives. If the daemon does not release the drives when stopping, the other data movers (for example dm[02-03]) will fail when trying to grab the tapes previously mounted by dm01, and that will mark as failed both those tapes and the drives on the other data movers dm[02-03].
Please let me know if there is a case the daemon should not release its own drives when stopping... thanks!
@thiell What do you mean by the daemon should "release" its drives? Do you mean removing any phobos DSS lock? Or do you mean unmounting and unloading any tape from any of its drives? Or anything else?
@patlucas: Good question indeed. I mean both the phobos DSS lock (the lock remaining in the lock table after phobosd is stopped) and the LTFS device reservation, which can be released with `ltfs -o release_device`. That way, after phobosd has been stopped, the cartridge (still in the drive) can be taken over by another data mover / phobosd instance. Otherwise, this leads to a deadlock situation.
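For reference, a minimal sketch of how the release invocation can be built per drive. The `ltfs -o release_device` option comes from the discussion above; the helper name `ltfs_release_cmd` and the example device path are illustrative, not part of phobos:

```python
# Sketch: build the "ltfs -o release_device" command line for a given
# SCSI generic device node. The helper name and argument handling are
# illustrative; only the ltfs option itself comes from this thread.
import shlex

def ltfs_release_cmd(sg_device):
    """Return the argv used to drop the LTFS device reservation on sg_device."""
    return ["ltfs", "-o", "devname=%s" % sg_device, "-o", "release_device"]

if __name__ == "__main__":
    # Example: release the reservation held on /dev/sg5
    print(shlex.join(ltfs_release_cmd("/dev/sg5")))
    # -> ltfs -o devname=/dev/sg5 -o release_device
```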
I will try to provide relevant logs with the new phobos version (based on current master), but I have some compatibility issues with lhsmtool_phobos / coordinatool right now and can't make it work yet.
As already said, we plan to add an admin command "phobos drive release" to manage the ltfs device reservation. This feature is planned in the phobos 3.0 milestone. We are currently finishing phobos 2.0.
Migration of a drive needs an admin command because drives are currently dedicated to a node, and this is registered in the DSS.
Migration of a drive from one node to another will be redesigned and handled through admin commands in phobos 3.0.
@patlucas ok, no problem for the drives and phobos 3.0, but would you also release the ltfs device reservation when the phobosd daemon stops? For now, we can add an ExecStopPost that always releases the ltfs device reservation (otherwise, the tape in the drive cannot be reclaimed by another phobosd).
@patlucas What about the DSS lock release when phobosd is stopped?
For example, here we stopped phobosd on elm-ent-dm01 (this is with 1.95.1, not master):
```
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.995937000 <ERROR> Media '054840L9' is locked by (hostname: elm-ent-dm01, owner: 3688211): Operation already in progress (114)
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.995954000 <ERROR> Device '/dev/sg5' (S/N '10230057FB') is owned by host elm-ent-dm02 but contains medium '054840L9' which is locked by an other hostname elm-ent-dm01: Operation already in progress (114)
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.995961000 <ERROR> Fail to init device '/dev/sg5', stopping corresponding device thread: Operation already in progress (114)
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.995980000 <ERROR> setting medium '054840L9' to failed
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.998588000 <ERROR> Request failed: PHLK2: Permission denied (13)
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.998594000 <ERROR> Error when releasing medium '054840L9' with current lock (hostname elm-ent-dm01, owner 3688211): Permission denied (13)
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.998597000 <ERROR> Error when releasing medium 054840L9 after setting it to status failed: Permission denied (13)
May 1 11:24:54 elm-ent-dm02 phobosd[5081]: 2024-05-01 11:24:54.998599000 <ERROR> setting device '10230057FB' to failed
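For illustration, the stale-lock situation in these logs can be expressed as a small filter over rows of the DSS lock table. The column layout (id, hostname, owner) is an assumption based on the fields shown in the error messages above, not the actual phobos schema:

```python
# Sketch: given rows from the phobos DSS lock table, list the locks left
# behind by a host whose phobosd was stopped. The (id, hostname, owner)
# layout is assumed from the log messages, not the real phobos schema.
def stale_locks(rows, stopped_host):
    """Return the lock rows still held by stopped_host."""
    return [(lock_id, hostname, owner)
            for (lock_id, hostname, owner) in rows
            if hostname == stopped_host]

rows = [
    ("054840L9", "elm-ent-dm01", 3688211),   # medium lock left by stopped dm01
    ("10230057FB", "elm-ent-dm02", 5081),    # drive lock held by running dm02
]
print(stale_locks(rows, "elm-ent-dm01"))
# -> [('054840L9', 'elm-ent-dm01', 3688211)]
```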
We will indeed try to release the ltfs reservation through a phobos admin command and clean DSS locks.
Awesome, thanks @patlucas, I appreciate your quick answers!
phobosd should not leave DSS locks on the media it uses unless an error occurred. It would be interesting to see the logs of phobosd when it stops: either there is an error message indicating that phobosd did not release the lock, or there is a bug.
There was some refactoring of that part of the code. Master is in a relatively unstable state right now. The patches that should fix the bugs are partially integrated, and the rest will be soon. Hopefully, by the end of the day everything will be pushed to master. (A new health feature, configurable through the max_health parameter, is coming with it.)
We have tested on master and noticed that when phobosd is stopped, all the DSS locks are released. Also, all the mounted tapes are unmounted with `ltfs umount`, which releases the SCSI reservation on the drive. But the tapes are still loaded in the drives.
However, we have seen that when phobosd crashes, all the DSS locks and all the SCSI reservations are still present. We also think that when unloading a drive fails, the same problem could occur.
We can add a phobos admin command which releases the SCSI reservation of a drive. Using this command requires the admin to answer several questions: is the tape still mounted? Are the DSS locks on the tape and drive still present? What is the status of the drive and tape?
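The checklist above can be sketched as plain decision logic. The state flags and action strings here are illustrative only; they do not map to real phobos admin commands:

```python
# Sketch of the post-crash recovery checklist described above.
# Flags and action names are illustrative, not a phobos API.
def recovery_actions(tape_mounted, dss_locks_present, scsi_reserved):
    """Return the ordered cleanup steps for a drive after a phobosd crash."""
    actions = []
    if tape_mounted:
        actions.append("unmount tape (ltfs umount)")
    if scsi_reserved:
        actions.append("release SCSI reservation (ltfs -o release_device)")
    if dss_locks_present:
        actions.append("clean stale DSS locks")
    return actions

# Example: tape already unmounted, but locks and reservation left behind
print(recovery_actions(tape_mounted=False, dss_locks_present=True,
                       scsi_reserved=True))
# -> ['release SCSI reservation (ltfs -o release_device)', 'clean stale DSS locks']
```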
Hi @GauthierEvd,
To mitigate the SCSI reservation issues (when phobosd crashes, for example), we added a script that releases all SCSI reservations when the data mover freshly starts. That way, we can let the sysadmin know that if a data mover is rebooted, phobosd should start without problems. Note that we do not move drives between data movers, so we know a data mover will always only release its own drives. The script might be specific to our configuration, as we use SAS tape drives, but it's just to show you how we mitigated the problem. I also added the script as ExecStopPost for normal phobosd shutdown, just in case (I have also seen cases where ltfs was still mounted after phobosd stopped, or maybe timed out). Not super elegant, but this method has worked okay for us so far when rebooting a data mover or phobosd. I'm sure we can do better though.
```
[root@elm-ent-dm01 ~]# cat /usr/lib/systemd/system/phobosd.service.d/override.conf
[Unit]
After=phobos_release_device_local.service

[Service]
LimitNOFILE=262144
EnvironmentFile=-/etc/sysconfig/phobos_release_device_local
ExecStopPost=/usr/bin/phobos_release_device_local.py
# workaround: increase start timeout due to TLC serializing all requests
TimeoutStartSec=3600
TimeoutStopSec=900

[root@elm-ent-dm01 ~]# cat /etc/systemd/system/phobos_release_device_local.service
[Unit]
Description=Phobos Device Release
After=network-online.target

[Service]
Type=oneshot
EnvironmentFile=-/etc/sysconfig/phobos_release_device_local
ExecStart=/usr/bin/phobos_release_device_local.py

[Install]
WantedBy=multi-user.target

[root@elm-ent-dm01 ~]# cat /etc/sysconfig/phobos_release_device_local
PHOBOS_DB_HOST="10.4.0.132"
PHOBOS_DB_PORT=5432
PHOBOS_DB_NAME="phobos"
PHOBOS_DB_USER="phobos"
PHOBOS_DB_PASS="<redacted>"
```
And the script itself, /usr/bin/phobos_release_device_local.py:

```python
#!/usr/bin/python3
# Stanford Research Computing - Elm storage system
# Written by Stephane Thiell <sthiell@stanford.edu>
#
# Make sure to release local devices before we start Phobos

import argparse
import logging
import os
import os.path
import socket
import sys
from subprocess import Popen, PIPE

import psycopg2
from ClusterShell.Event import EventHandler
from ClusterShell.Task import task_self
from sasutils.sas import SASTapeDevice
from sasutils.sysfs import sysfs

# PostgreSQL Phobos DB
DB_HOST = os.environ["PHOBOS_DB_HOST"]
DB_PORT = os.environ["PHOBOS_DB_PORT"]
DB_NAME = os.environ["PHOBOS_DB_NAME"]
DB_USER = os.environ["PHOBOS_DB_USER"]
DB_PASS = os.environ["PHOBOS_DB_PASS"]
############# End of phobos DB config #############

HOSTNAME = socket.gethostname().split('.')[0]


def db_connect():
    return psycopg2.connect(host=DB_HOST,
                            port=DB_PORT,
                            dbname=DB_NAME,
                            user=DB_USER,
                            password=DB_PASS)


def phobos_drive_list():
    """Return (id, path) of the tape drives assigned to this host."""
    drivelist = None
    conn = db_connect()
    try:
        cur = conn.cursor()
        try:
            cur.execute("select id, path from device "
                        "where family='tape' and host=%s;", (HOSTNAME,))
            drivelist = cur.fetchall()
        except psycopg2.Error as err:
            logging.error(err)
        finally:
            cur.close()
    finally:
        conn.close()
    return drivelist


class LTFSHandler(EventHandler):
    def __init__(self, num_devices):
        EventHandler.__init__(self)
        self.done = 0
        self.num_devices = num_devices
        self._promptfmt = '[%d/%d] '

    @property
    def prompt(self):
        return self._promptfmt % (self.done, self.num_devices)

    def ev_read(self, worker, node, sname, msg):
        print("%s%s: %s" % (self.prompt, node, msg.decode()))

    def ev_hup(self, worker, node, rc):
        self.done += 1
        if rc > 1:
            print("%s%s: returned with error code \033[91m%s\033[0m"
                  % (self.prompt, node, rc))
        else:
            print("%s%s: returned with error code %s"
                  % (self.prompt, node, rc))


def _init_argparser():
    parser = argparse.ArgumentParser()
    return parser.parse_args()


def main():
    """Entry point for the phobos_release_device_local script."""
    pargs = _init_argparser()
    drivelist = phobos_drive_list()
    num_devices = 0
    if drivelist:
        print("Found %d drives:" % len(drivelist))
        for driveid, drivepath in drivelist:
            print(" %10s at %10s [%s]"
                  % (driveid, drivepath,
                     "OK" if os.path.exists(drivepath) else "PATH NOT FOUND"))
            num_devices += 1
        task = task_self()
        # merge stdout and stderr as ltfs outputs to stderr
        task.set_default("stderr", False)
        task.set_default("stdout_msgtree", False)
        task.set_default("stderr_msgtree", False)
        eh = LTFSHandler(num_devices)
        for driveid, drivepath in drivelist:
            if os.path.exists(drivepath):
                realst = os.path.basename(os.path.realpath(drivepath))
                tapedev = SASTapeDevice(sysfs.node('class').node('scsi_tape')
                                        .node(realst).node('device'))
                sg_name = tapedev.scsi_device.scsi_generic.sg_name
                task.shell("ltfs -o devname=/dev/%s -o release_device" % sg_name,
                           key="%s(%s)" % (driveid, sg_name),
                           handler=eh)
        task.run()
    else:
        print("No drives found! Aborting.")


if __name__ == '__main__':
    main()
```
Thanks Stéphane for all these details.
We are adding a `phobos drive release` command to the phobos CLI to execute the `ltfs -o release_device` action. The corresponding patch is currently in review.
When we try to reproduce your problem, we see that when phobosd crashes (without stopping correctly and without unmounting its loaded tapes), we need to manually remove the "ltfs lock" to allow another host to use the drive. But we have not tested whether this ltfs lock also blocks the same phobosd host from restarting and reusing the drive. We will test it. If that is the case, we will see if we need to integrate this "ltfs release lock" into the start of the phobosd daemon (as we already manage existing phobos locks in the DSS when a phobosd starts).
One last detail: in your service script, do not hesitate to use the integrated phobos commands as much as possible instead of querying the phobos DSS directly. For example, to list existing drives, you can use the `phobos drive list` command instead of requesting the DSS.
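Following that suggestion, the SQL query in the script could be replaced by a call to the phobos CLI. The exact output format of `phobos drive list` is an assumption here (one drive identifier per line); adjust the parsing to the columns your phobos version actually prints:

```python
# Sketch: list local drives through the phobos CLI instead of querying the
# DSS directly. The "one identifier per line" output format is an assumption.
import subprocess

def parse_drive_list(output):
    """Parse one drive identifier per non-empty line of CLI output."""
    return [line.strip() for line in output.splitlines() if line.strip()]

def phobos_drive_list():
    out = subprocess.run(["phobos", "drive", "list"],
                         capture_output=True, text=True, check=True).stdout
    return parse_drive_list(out)

# Offline example with sample output:
print(parse_drive_list("10230057FB\n10230057FC\n"))
# -> ['10230057FB', '10230057FC']
```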
Really minor, but reporting just so we don't forget: our LTO-9 drives are accessible from multiple hosts, and when deleting a drive with `phobos drive del ...` from one host and adding it to another with `phobos drive add ...`, the drive won't work and LTFS complains about an existing SCSI reservation. When phobosd tries to use the drive from the other server, we can see errors like that:
Especially this one I guess:
A solution is to release the SCSI reservation on the original server with the following command:
After that, the drive can be used from the other server by phobos.
Perhaps `phobos drive del` could do that automatically? Or a note in the documentation about that would be less confusing.