Closed ketiltrout closed 1 year ago
I made `ArchiveFile.archive_count` and `StorageNode.under_min` into properties. I've added verbs to the other parameter-free methods, but I'm willing to revert their names and make them properties, too, if that seems better.
Also, I'm now realising the contents of this PR would probably have been better introduced when they were used in the I/O rewrite, but unlikely to be worth changing now.
I pushed the timestamp update into the schema. Downside is it only works in MySQL. Upside is we don't have to re-implement `ArchiveFileCopy.update`, I guess.
This PR updates the peewee table models for the data index (`StorageGroup`, `StorageNode`, `ArchiveAcq`, `ArchiveFile`, `ArchiveFileCopy`, `ArchiveFileCopyRequest`). It does not deal with `AcqType` or `FileType`, which will be handled in a later PR.

This PR updates the field lists for these tables (adding new fields and removing unused fields). All field updates are backwards compatible with alpenhorn-1. It also adds a few convenience methods for various common database queries related to the tables. The methods are mostly database queries that were performed explicitly within the I/O code. Moving these queries to methods here reduces the complexity of the I/O code itself.
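
To illustrate the kind of convenience method meant here, the following is a minimal standalone sketch of the `under_min` and `over_max` checks as they are described in the StorageNode part of this description. It is not the code in this PR: `NodeSketch` is a hypothetical plain dataclass rather than the peewee model, and the `file_sizes_b` list is an assumed stand-in for the database query over `ArchiveFile.size_b`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeSketch:
    """Hypothetical stand-in for StorageNode; not the real peewee model."""

    avail_gb: Optional[float] = None
    min_avail_gb: float = 0.0
    max_total_gb: Optional[float] = None
    # Assumed stand-in for the DB query: sizes (bytes) of files on this node.
    file_sizes_b: List[int] = field(default_factory=list)

    @property
    def under_min(self) -> bool:
        # Unknown free space is never reported as "under the minimum".
        if self.avail_gb is None:
            return False
        return self.min_avail_gb > self.avail_gb

    def total_gb(self) -> float:
        # Sum of file sizes, converted from bytes to GiB.
        return sum(self.file_sizes_b) / 2**30

    def over_max(self) -> bool:
        # None or a non-positive value (e.g. the legacy -1) means "no limit known".
        if self.max_total_gb is None or self.max_total_gb <= 0:
            return False
        return self.total_gb() > self.max_total_gb
```

With checks like these on the model, I/O code can ask `node.under_min` or `node.over_max()` instead of repeating the comparison and the null handling at each call site.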
`StorageGroup` and `StorageNode` will have a subsequent update to add the new I/O framework, but the other table models (`ArchiveAcq`, `ArchiveFile`, `ArchiveFileCopy`, `ArchiveFileCopyRequest`) have no further changes pending.

StorageGroup
Fairly light changes:
- Marked the `name` field as unique.
- Added a `copy_state` method which takes an `ArchiveFile` and returns the value of `has_file` (Y/M/X/N) for copy/copies of the `ArchiveFile` in this group.

StorageNode
A more extensive update.
StorageNode fields
- `name` is marked unique.
- `suspect` is dropped. It was never used or implemented properly.
- No change to `storage_type`, but I have updated its description in the docstring to better indicate what 'F' means (i.e. it's anything that's not an archive and not transiting storage). This closes #49.
- `max_total_gb` allows null/None (for nodes which can't determine their maximum size). The unusual default of -1 is dropped. The code retains support for using -1 to indicate an unknown `max_total_gb`, but will never set the value to -1 (it will use null/None instead).
- `min_avail_gb` default explicitly set to zero.
- `min_delete_age_days` is dropped. It has never been used.

StorageNode methods and properties
- `local`: a boolean property which is True when this StorageNode is local to the running alpenhorn (whether or not it is active).
- `archive`: a boolean property which is True when the StorageNode is an archive node (when `storage_type=='A'`).
- `under_min`: returns False if `avail_gb` is None, otherwise returns `min_avail_gb > avail_gb`.
- `over_max`: returns False if `max_total_gb` is None or non-positive; otherwise returns whether the sum of all `ArchiveFile.size_b` on this node (see `total_gb` below) is greater than `max_total_gb`.
- `named_copy_present`: given an acq name and a file name, returns True if a copy of the file in the acq exists on this node.
- `total_gb`: sums up `ArchiveFile.size_b` for all files on this node and returns the result in GiB.
- `all_files`: returns a list of paths (relative to `root`) for all files existing on this node.

ArchiveAcq
Only change:
- `name` is marked unique.

ArchiveFile
Field changes:
- The pair (`name`, `acq`) is unique. This constraint is already present in the CHIMEDB production database. We may have added it by hand there.

Properties and methods:
- `path`: the path to this file, i.e. the path concatenation of `acq.name` and `name`.
- `archive_count`: returns the number of archive nodes containing a copy of this file.

ArchiveFileCopy
Fields:
- Added `ready`, a boolean which might indicate that a file is ready for I/O. This field is used to communicate between hosts which files are available for pulls, but, as the docstring indicates, shouldn't ever be consulted directly, because different I/O types can have different interpretations of the meaning of this field. (In James's work, we called this "prepared".)
- Allowed `size_b` to be null/None (for instances where the size-on-node can't be determined).
- Added `last_update`, a timestamp that is automatically updated whenever the `ArchiveFileCopy` record is changed.

Properties and Methods:
- For `last_update`, the `save` and `update` methods are re-implemented to add the new `last_update` value before handing things off to peewee. This could also be done at the DB level: MySQL allows an `ON UPDATE CURRENT_TIMESTAMP` clause in the table schema, but that isn't supported by peewee.
- `path`: a property returning an absolute `pathlib.Path` to the file copy.

ArchiveFileCopyRequest
Field changes:
- `nice` is dropped. Has never been used.
- `completed` has a default value of False added, which I think is the default of a non-null boolean field if none is given, but it's better to be explicit.
- `timestamp` is given the current time as a creation-time default.

Other changes
Because it's used in the `StorageNode.local` property, this is a convenient place to update the hostname-finding logic: the function `alpenhorn.util.get_short_hostname` is renamed to `alpenhorn.util.get_hostname` and now supports specifying the hostname explicitly in the `alpenhorn` YAML config, giving us the flexibility to put logical values in `StorageNode.host` (like, say, "scinet"), instead of being forced to use the actual name of the host we're running on (like, say, "nia-dm1").

The unit tests for (at least the updated parts of) `alpenhorn.acquisition`, `alpenhorn.archive`, and `alpenhorn.storage` have been restored/updated after being disabled in #144. (They were also renamed to drop `_model` from the name of the test files, because I like testing `alpenhorn/<name>.py` in `tests/test_<name>.py`.)

A large set of DB-data-producing fixtures has been added to `conftest.py`. These will be used extensively by the I/O framework unit tests.
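
For reference, the hostname lookup described earlier in this section could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: the config is passed in as an already-parsed dict, the `base` / `hostname` key names are assumed, and the fallback of truncating the system hostname at the first dot is a guess based on the old `get_short_hostname` name.

```python
import socket
from typing import Optional


def get_hostname(config: Optional[dict] = None) -> str:
    """Return the configured hostname, else the short name of this host.

    Sketch only: the {"base": {"hostname": ...}} layout is an assumption.
    """
    # An explicit value in the config wins, so StorageNode.host can hold a
    # logical name (e.g. "scinet") rather than the machine's own name.
    configured = ((config or {}).get("base") or {}).get("hostname")
    if configured:
        return str(configured)

    # Assumed fallback: the system hostname up to the first dot.
    return socket.gethostname().split(".")[0]
```

A node would then count as local when its `host` value equals the returned name, which is what lets a logical name like "scinet" match on a machine actually called "nia-dm1".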