holgerBerger / hpc-workspace

Automatically exported from code.google.com/p/hpc-workspace
GNU General Public License v3.0

Feature request: Workspace archiving command / ws_archive #71

Open cniethammer opened 3 years ago

cniethammer commented 3 years ago

I'd like to see a ws_archive command that provides an easy and efficient way to archive a workspace to a long-term storage system, e.g. HPSS.

Data movement could be implemented as an archiving daemon that schedules the transfers according to the current load of the workspace file system and of the target storage system.
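A minimal sketch of how such a daemon could gate transfers on load. It assumes the daemon can obtain a normalized load figure (0 to 1) for the workspace file system and for each target. How those figures are measured, the `Transfer`/`LoadAwareScheduler` names and the 0.8 thresholds are all hypothetical:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Transfer:
    workspace: str
    target: str
    size_bytes: int


@dataclass
class LoadAwareScheduler:
    """Hypothetical archiving-daemon scheduler: starts queued transfers only
    while both the workspace file system and the target are below a load limit."""
    max_fs_load: float = 0.8      # assumed normalized load threshold (0..1)
    max_target_load: float = 0.8
    queue: deque = field(default_factory=deque)

    def submit(self, transfer: Transfer) -> None:
        self.queue.append(transfer)

    def next_batch(self, fs_load: float, target_loads: dict[str, float], slots: int) -> list[Transfer]:
        """Return up to `slots` transfers that may start now; the rest stay queued in order."""
        if fs_load >= self.max_fs_load:
            return []  # source file system busy: start nothing this round
        started, deferred = [], deque()
        while self.queue and len(started) < slots:
            t = self.queue.popleft()
            # an unknown target is treated as fully loaded (1.0) and deferred
            if target_loads.get(t.target, 1.0) < self.max_target_load:
                started.append(t)
            else:
                deferred.append(t)  # target busy: retry in a later round
        # put deferred transfers back at the front, preserving submission order
        self.queue.extendleft(reversed(deferred))
        return started
```

Deferred transfers keep their submission order, so a busy target only delays its own jobs while transfers to idle targets proceed.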

Ideas for command usage/options

ws_archive targets
# list available archiving targets
# targets are configured by the administrator and
# availability/limits for users/groups may be configured with individual rules

ws_archive push <target:path> <wsname>
# store the given workspace on the target under a provided path
# This command should require data annotation (e.g. entering a category,
# keywords and a description) for the data in the workspace, to prevent
# creating useless "dark data".
# The workspace is made unavailable to the user as soon as the archiving
# command is issued, to prevent modifications.
# After a successful transfer to the target, the workspace is released (the
# data could be removed immediately or via the ws cleaner).

ws_archive pull <target:path> <wsname>
# create a new workspace and populate it with the requested data from the target

ws_archive status
# shows progress information for issued push and pull operations
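The command set above could map onto subcommands roughly as in this Python `argparse` sketch. Only the subcommand names and the `<target:path> <wsname>` arguments come from the proposal. The `--category`, `--keywords` and `--description` options are hypothetical stand-ins for the mandatory annotation:

```python
import argparse


def parse_target(spec: str) -> tuple[str, str]:
    """Split '<target:path>' into (target, path); the path itself may contain ':'."""
    target, sep, path = spec.partition(":")
    if not sep or not target or not path:
        raise argparse.ArgumentTypeError(f"expected <target:path>, got {spec!r}")
    return target, path


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="ws_archive")
    sub = p.add_subparsers(dest="command", required=True)

    sub.add_parser("targets", help="list available archiving targets")

    push = sub.add_parser("push", help="archive a workspace to a target")
    push.add_argument("target", type=parse_target, metavar="target:path")
    push.add_argument("wsname")
    # hypothetical mandatory annotation, to avoid creating undocumented "dark data"
    push.add_argument("--category", required=True)
    push.add_argument("--keywords", required=True, help="comma-separated keywords")
    push.add_argument("--description", required=True)

    pull = sub.add_parser("pull", help="restore archived data into a new workspace")
    pull.add_argument("target", type=parse_target, metavar="target:path")
    pull.add_argument("wsname")

    sub.add_parser("status", help="show progress of issued push/pull operations")
    return p
```

For example, `ws_archive push hpss:/archive/proj1 myws --category simulation --keywords cfd,run42 --description "final runs"` would parse to `target=("hpss", "/archive/proj1")` and `wsname="myws"`, while a push without the annotation options is rejected by the parser.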
holgerBerger commented 3 years ago

Yes, I'm also thinking about that.

There are many questions: would this be a tree or an archive? Should there be some bookkeeping, e.g. should the workspace DB entry still exist with a pointer to the data and be recoverable (and, of course, not expire)?
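One possible answer to the bookkeeping question, sketched in Python: keep the DB entry, add a pointer to the archived copy and stop its expiry clock. The field names and the use of `None` for "never expires" are assumptions for illustration, not the existing hpc-workspace DB format:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkspaceEntry:
    """Hypothetical workspace DB entry extended for archiving."""
    name: str
    owner: str
    expiration: Optional[int]               # Unix time; None once archived (never expires)
    archive_location: Optional[str] = None  # e.g. "hpss:/archive/proj1"; None while live

    @property
    def is_archived(self) -> bool:
        return self.archive_location is not None

    def mark_archived(self, location: str) -> None:
        """Record where the data went and stop the expiry clock."""
        self.archive_location = location
        self.expiration = None

    def is_expired(self, now: int) -> bool:
        return self.expiration is not None and now >= self.expiration
```

A `ws_archive pull` could then look up `archive_location` to restore the data, at the cost of the DB having to carry these entries indefinitely.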

cniethammer commented 3 years ago

I think the archiving format is likely storage-target specific. For HPSS, I assume one would want to use htar or another natively supported method?
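Since the format depends on the target, one option is a per-target dispatch table. In this sketch the `posix` kind writes a `.tar.gz` with Python's `tarfile`, while the `hpss` kind only builds an `htar -cvf` command line (it is not executed) that should be checked against the local HPSS documentation. The target kinds and function names are hypothetical:

```python
import shlex
import tarfile
from pathlib import Path
from typing import Callable


def archive_with_tarfile(ws_dir: Path, dest: Path) -> str:
    """Generic POSIX target: write a gzip-compressed tar file with Python's tarfile."""
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(str(ws_dir), arcname=ws_dir.name)
    return str(dest)


def htar_command(ws_dir: Path, dest: Path) -> str:
    """HPSS target: build (but do not run) an htar invocation; verify flags locally."""
    return shlex.join(["htar", "-cvf", str(dest), str(ws_dir)])


# hypothetical mapping from configured target kind to archiving strategy
ARCHIVERS: dict[str, Callable[[Path, Path], str]] = {
    "posix": archive_with_tarfile,
    "hpss": htar_command,
}


def archive(target_kind: str, ws_dir: Path, dest: Path) -> str:
    """Dispatch to the archiver configured for this kind of target."""
    try:
        archiver = ARCHIVERS[target_kind]
    except KeyError:
        raise ValueError(f"no archiver configured for target kind {target_kind!r}") from None
    return archiver(ws_dir, dest)
```

New target types then only need a new entry in the table, and the administrator-configured target list from `ws_archive targets` could carry the kind.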

I don't think the workspace tools should keep bookkeeping of the archived data in their DB once it has been successfully archived, even though it looks tempting. It would raise more questions and complications: e.g., a ws storage system may get upgraded, and in that case you do not want to carry all this old DB content along. Or, if one allows restoring a workspace on another system from a common archive storage, would the DBs then have to be synced?

However, one could think about a query method that lets the ws tools retrieve information about archived workspaces from the archive storage, but this might already go in the direction of data management services like EUDAT, etc.