This is the last PR of the rewrite. It adds two tasks which are performed only when a node is idle (has no I/O tasks in the queue) during an update loop:
On nearline nodes, check the HSM state of files and update the database. Because the HSM state can be changed outside of alpenhorn, this is necessary to keep the DB in line with what's actually going on.
On any node that enables it, automatically re-verify files to ensure they aren't corrupt.
The QueryWalker
This PR introduces a QueryWalker in querywalker.py which facilitates both of these tasks: it takes a table name (model) and, optionally, a set of condition expressions, and walks through the matched rows every time get() is called. The query starts at a random place in the results, so that short runs of alpenhorn won't always end up running over the same set of records.
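To illustrate the walking behaviour described above, here is a minimal sketch that uses an in-memory list instead of a database query. It is not the code in querywalker.py: the class name `ListWalker`, the `rng` argument, the `n` argument to `get()`, and the wrap-around at the end of the results are all assumptions made for the example.

```python
import random


class ListWalker:
    """Toy stand-in for the QueryWalker idea (hypothetical, not alpenhorn code).

    Walks cyclically over a fixed list of rows, starting at a random offset.
    """

    def __init__(self, rows, rng=None):
        if not rows:
            raise ValueError("no rows matched")
        self._rows = list(rows)
        rng = rng or random.Random()
        # Random starting point, so short runs don't always see the same rows
        self._pos = rng.randrange(len(self._rows))

    def get(self, n=1):
        """Return the next n rows, wrapping around at the end (assumed)."""
        out = []
        for _ in range(n):
            out.append(self._rows[self._pos])
            self._pos = (self._pos + 1) % len(self._rows)
        return out
```

Each call to `get()` picks up where the previous one stopped, so repeated idle loops eventually cover every matched row regardless of where the walk started.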
Config
For the HSM state check, the number of items to do at a time is given in the nearline I/O config.
For the re-verify check, this PR adds a new field to the StorageNode model: StorageNode.auto_verify, which is the number of files to auto-verify per idle update loop. If this field is zero (the default), then auto-verification is not done.
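The way auto_verify gates the idle-time check can be sketched roughly as follows. This is only an illustration of the rule stated above, not the code in this PR: the function `run_idle_auto_verify` and its `queue_size`, `next_files` and `verify` parameters are hypothetical.

```python
def run_idle_auto_verify(queue_size, auto_verify, next_files, verify):
    """Hypothetical idle-time hook; returns the number of files verified.

    queue_size:  number of I/O tasks currently queued for the node
    auto_verify: value of StorageNode.auto_verify for the node
    next_files:  callable returning up to n files to check (assumed)
    verify:      callable that re-verifies one file (assumed)
    """
    # Only run when the node is idle (no I/O tasks in the queue)
    if queue_size > 0:
        return 0
    # Zero (the default) disables auto-verification
    if auto_verify <= 0:
        return 0
    files = next_files(auto_verify)
    for f in files:
        verify(f)
    return len(files)
```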
Note
Originally, this was just the HSM update thing, but then I realised the query-walker could also be used to implement this auto-re-verification, which I've been thinking about doing for a while, even though it's not really relevant to the rest of this rewrite.
Issues closed by the rewrite as a whole
Closes #75 Closes #93 Closes #136 Closes #140