sina-mansour / UKB-connectomics

This repository will host scripts used to map structural and functional brain connectivity matrices for the UK biobank dataset.
https://www.biorxiv.org/content/10.1101/2023.03.10.532036v1

Remote storage options #36

Closed sina-mansour closed 1 year ago

sina-mansour commented 2 years ago

Thus far, more than 10,000 structural connectomes have been mapped (in addition to the complete set of atlases and functional connectomes). Our project space on Spartan is getting close to 10TB, and there is some resistance from HPC admin to increasing our storage quota beyond that amount. I'm still negotiating with HPC admin to see whether an increased quota is possible (even just for the short term).

However, I'm also considering other options for keeping the final compressed outputs remotely. Basically, if we're unable to keep the computed data on Spartan, we need to store it in a remote location and load it whenever we require further computation/analyses on the data. I've considered the following remote storage options thus far:

1- The NetApp storage at MNC: I've discussed this thoroughly with Chester. We have around 16TB currently available on NetApp which we could use as remote storage. This storage sits within the university's services and is reachable over ssh. According to Chester, this would be the faster option.

2- MediaFlux storage: I've applied for 20TB of MediaFlux storage and this has already been approved. I think MediaFlux will have a slower transfer speed, but it could serve as a longer-term solution until we figure out how to transfer computed files back to the UKB data storage system.


I will need to add appropriate scripts to transfer compressed files from/to the NetApp storage before/after the main pipeline.
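As a rough idea of what those transfer scripts could look like, here is a minimal sketch that assumes the transfers are done with rsync over ssh. The helper names, the host `user@netapp.example` and all paths are hypothetical placeholders, not the actual NetApp setup; the files are assumed to be compressed already, so no rsync compression is requested.

```python
import shlex
import subprocess


def build_rsync_command(source, destination, dry_run=False):
    """Return an rsync-over-ssh command (as a list) copying source to destination."""
    # --archive keeps permissions and timestamps, --partial lets an interrupted
    # transfer of a large file resume, --rsh=ssh forces ssh as the transport.
    command = ["rsync", "--archive", "--partial", "--rsh=ssh"]
    if dry_run:
        # --dry-run only lists what would be transferred.
        command.append("--dry-run")
    command.extend([source, destination])
    return command


def push_command(local_dir, remote_host, remote_dir, dry_run=False):
    """Command to copy the contents of local_dir to the remote (after the pipeline)."""
    # The trailing slash on the source means "copy the contents of this directory".
    source = local_dir.rstrip("/") + "/"
    destination = f"{remote_host}:{remote_dir.rstrip('/')}/"
    return build_rsync_command(source, destination, dry_run)


def pull_command(remote_host, remote_dir, local_dir, dry_run=False):
    """Command to copy the contents of remote_dir back locally (before the pipeline)."""
    source = f"{remote_host}:{remote_dir.rstrip('/')}/"
    destination = local_dir.rstrip("/") + "/"
    return build_rsync_command(source, destination, dry_run)


def run(command):
    """Execute a transfer command and raise if rsync exits with an error."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical host and paths, printed rather than executed.
    command = push_command(
        "/data/ukb/compressed", "user@netapp.example", "/storage/ukb", dry_run=True
    )
    print(shlex.join(command))
```

Building the command separately from running it makes it easy to inspect a dry run first and only call `run(...)` once the listed files look right.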

sina-mansour commented 2 years ago

For now, I was able to extend the Spartan storage quota to 13TB. Hopefully, this gives us enough space to store everything. I'll wait, run the rest of the pipeline, and transfer all files at once after quality checks.