holgerBerger / hpc-workspace

Automatically exported from code.google.com/p/hpc-workspace
GNU General Public License v3.0

ws_restore: Restore WS between filesystems #73

Open URZ-HD opened 3 years ago

URZ-HD commented 3 years ago

Hi, we are migrating users from one filesystem (work) to another (gpfs) and have set the old filesystem to

allocatable no
extendable no

and made the new filesystem the default one.

But some users have expired workspaces that they want to restore and migrate to the new filesystem. It looks like "ws_restore" is not able to do this, because in every combination we tried the workspace could not be found:

> ws_list
id: perftest-beegfs
[...]
     filesystem name      : work
     available extensions : 10
id: acltest
[...]
     filesystem name      : gpfs
     available extensions : 10

> ws_restore -l
work:
hd_qq150-io500-1612449887
        unavailable since Thu Feb  4 15:44:47 2021
gpfs:

> ws_restore hd_qq150-io500-1612449887 acltest
you are human
Error: workspace does not exist.

> ws_restore -F work hd_qq150-io500-1612449887 acltest
you are human
Error: workspace does not exist.

> ws_restore -F gpfs hd_qq150-io500-1612449887 acltest
you are human
Error: workspace does not exist.

Restoring to the existing workspace "perftest-beegfs", which is on the same filesystem, works as expected.

I'm not sure if this is a bug or a feature, because restoring between filesystems would be more than just a fast "mv". But either way it is a real problem for users.

The only way to solve this was to enable the allocation of "work" temporarily.

holgerBerger commented 3 years ago

You guessed right: it is a mv, and it should be fast. So the two workspaces (the expired one and the new one) have to be in the same filesystem. The thinking behind it, if I remember right, was that anything other than a mv could take a long time, and a user interrupting it with Ctrl-C could leave some pretty strange states. Supporting that would probably require some kind of multi-stage implementation, which could be pretty complex.
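To illustrate why the restore is tied to a single filesystem, here is a minimal Python sketch (not the actual ws_restore code). It assumes the restore boils down to one rename call, which is atomic within a filesystem and fails with EXDEV across filesystems. The names `same_filesystem`, `fast_move` and `CrossFilesystemError` are made up for this example.

```python
import errno
import os


class CrossFilesystemError(Exception):
    """Hypothetical error type: source and target are on different filesystems."""


def same_filesystem(path_a, path_b):
    # Heuristic only: equal device IDs usually mean the same mounted filesystem.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def fast_move(src, dst_dir):
    # Move src into dst_dir with a single rename, so Ctrl-C cannot leave
    # a half-copied directory behind.
    target = os.path.join(dst_dir, os.path.basename(os.path.normpath(src)))
    try:
        os.rename(src, target)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            # The kernel refuses to rename across filesystems.
            raise CrossFilesystemError(
                f"{src} and {dst_dir} are on different filesystems"
            ) from exc
        raise
    return target
```

The rename itself is the authoritative test; the `st_dev` comparison is only a cheap pre-check that could be used to print a clearer message before attempting the move.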

URZ-HD commented 3 years ago

OK, preventing user interruption is of course a crucial part of the restore process. But maybe the error message could be clearer when a user tries to restore between filesystems?

Additionally, the admin can configure the "restorable" and "allocatable" parameters separately. But if allocation is not allowed, a user has no way to restore a workspace successfully (unless a previous workspace still exists on the filesystem).

So maybe some additional checks are useful for the ws_restore command, e.g:

I think these additional checks would be very helpful if you have more than one or two filesystems in your cluster.
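As a rough illustration of the points raised in this thread (a clearer message for cross-filesystem restores, and the interaction of "restorable" with "allocatable"), such pre-checks might look like the Python sketch below. This is not how ws_restore is implemented; the `Filesystem` class, the `check_restore` function and the message texts are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Filesystem:
    # Hypothetical model of one filesystem's settings.
    name: str
    allocatable: bool
    restorable: bool


def check_restore(source: Filesystem, target: Optional[Filesystem]) -> List[str]:
    """Return human-readable problems; an empty list means the restore can proceed.

    source: filesystem holding the expired workspace.
    target: filesystem holding the target workspace, or None if it does not exist.
    """
    problems = []
    if not source.restorable:
        problems.append(f"restore is disabled on filesystem '{source.name}'")
    if target is None:
        problems.append("target workspace does not exist")
    elif target.name != source.name:
        problems.append(
            f"target workspace is on '{target.name}', but the expired workspace "
            f"is on '{source.name}'; restore only works within one filesystem"
        )
    # Without a usable target on the source filesystem, the user would need
    # to allocate one there, which fails if allocation is switched off.
    if (target is None or target.name != source.name) and not source.allocatable:
        problems.append(
            f"no new workspace can be allocated on '{source.name}'; "
            "please contact an administrator"
        )
    return problems
```

For the situation described above (expired workspace on "work" with allocation disabled, target workspace on "gpfs"), this would report both the filesystem mismatch and the fact that no new target can be allocated on "work".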