whole-tale / terraform_deployment

Terraform deployment setup for WT prod
BSD 3-Clause "New" or "Revised" License

Keep wt_data_manager private storage in a volume #37

Closed Xarthisius closed 5 years ago

Xarthisius commented 5 years ago

WT Data Manager plugin keeps transferred files in a set of folders:

I think it would be better to keep them persistent at least between container restarts (-> volume). @hategan do they need to be backed up in order to keep consistency with girder's db?

hategan commented 5 years ago

PS (private storage): it depends on how DMS is used. If we lock files when we publish a tale and keep them locked, that's a guarantee that the file is in the PS. In that case, we should persist PS. Btw, '/tmp/ps' is, in a sense, a bad default. It suggests that the directory is temporary, which is not what's intended. It's the default only because '/tmp' being writable is a safe assumption.

Globus Root Path: This is where the WT user endpoint directories are. In practice, it's only used as a staging area when copying files from external Globus resources. Once the files are copied, they get moved from Globus Root Path to PS. So this is truly a temporary directory. That said, the code may assume that once a user endpoint is created, its root dir will be in Globus Root Path, so things may break with the current code if that's not true. That's easily fixable though.

Globus Connect Directory: you need an unpacked Globus Connect Personal Server here. This is not a data directory, but a "library" dependency.

hategan commented 5 years ago

Maybe I should mention here that there's a performance benefit to having Globus Root Path on the same FS as PS. That's because of the move operation from the former to the latter, which, should the two be on different FSes, would involve actual data movement.
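A quick way to check the same-FS condition on a host is to compare device IDs of the two directories. This is only a sketch: the function name `same_filesystem` and the example paths are made up for illustration and are not part of the plugin.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both existing paths report the same device ID.

    When the device IDs match, a move between the two directories can be
    a plain rename. When they differ, the data has to be copied.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical locations; substitute the configured PS and
    # Globus Root Path of the deployment being checked.
    ps_dir = "/tmp/ps"
    globus_root = "/tmp/globus_root"
    if os.path.isdir(ps_dir) and os.path.isdir(globus_root):
        print(same_filesystem(ps_dir, globus_root))
```

Note that with containers, two bind mounts or volumes can end up on different devices inside the container even when they look adjacent on the host, so the check is best run where the plugin actually runs.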

Xarthisius commented 5 years ago

@hategan thanks, we're going to make sure that's the case on dev and prod.

On a side note: if PS storage is wiped out by accident and is no longer in sync with the database, is it sufficient to close all sessions, find all items that contain a dm entry, and drop that entry to "repair" everything?

hategan commented 5 years ago

Yes, that should work. All the info WRT caching is in .dm. We should probably have that as a manual operation since it shouldn't occur in normal use and it is time consuming. If you think an API call is beneficial, I can probably write that. That said, I imagine doing it directly on the DB to be a reasonable option given that, again, this shouldn't occur with normal use and it's sufficiently simple to do. But then it should be documented somewhere.
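For the record, the direct-on-DB variant could look roughly like the sketch below. It assumes the caching info sits in a top-level `dm` field of each item document and that `items` is a pymongo-style collection exposing `update_many`; the function name `drop_dm_entries` is hypothetical, and the exact field layout should be verified against the real documents before running anything like this. Closing all sessions first is a separate step that is not covered here.

```python
from typing import Any


def drop_dm_entries(items: Any) -> Any:
    """Remove the 'dm' field from every item document that has one.

    `items` is expected to behave like a pymongo Collection, i.e. to
    provide update_many(filter, update). The field name 'dm' is an
    assumption taken from this discussion.
    """
    has_dm = {"dm": {"$exists": True}}   # match only items carrying a dm entry
    unset_dm = {"$unset": {"dm": ""}}    # drop the field, keep the rest of the item
    return items.update_many(has_dm, unset_dm)
```

With pymongo the call would be something like `drop_dm_entries(client[db_name]["item"])`, where the database and collection names are again assumptions that depend on the deployment.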