Open kousu opened 3 years ago
I talked with Pierre Bellec yesterday, we might have additional options for temporary hosting:
I am unsure how the tape storage works, but their docs at https://docs.computecanada.ca/wiki/Using_nearline_storage explain that all their servers have a mountpoint /nearline
which is a large disk that's backed by nightly archives to tape. I'd have to get in and see how it actually looks, but hopefully it is relatively simple to use.
They want us to store large files there, which means we need to put whatever we download into a .tar file, or multiple .tar files, before writing to that disk. So it might be a little complicated.
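A rough sketch of the bundling step, in case it helps: it walks a download directory and packs the files into a series of uncompressed .tar files, each kept under a size cap. The function names, the `ukb` prefix, and the paths in the usage note are all made up for illustration, and the size cap is a parameter because I haven't checked what file sizes /nearline actually expects.

```python
import tarfile
from pathlib import Path


def plan_batches(files, max_bytes):
    """Group (path, size) pairs into batches whose total size stays within max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path, size in files:
        # Start a new batch if adding this file would exceed the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def archive_directory(src_dir, dest_dir, max_bytes, prefix="ukb"):
    """Pack every file under src_dir into numbered .tar files in dest_dir.

    dest_dir should live outside src_dir so the archives are not re-archived.
    Returns the list of .tar paths written.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    sized = [(p, p.stat().st_size) for p in files]
    written = []
    for i, batch in enumerate(plan_batches(sized, max_bytes)):
        tar_path = dest_dir / f"{prefix}-{i:04d}.tar"
        # Mode "w" writes a plain, uncompressed tar.
        with tarfile.open(tar_path, "w") as tar:
            for p in batch:
                # Store paths relative to src_dir so the archive unpacks cleanly.
                tar.add(p, arcname=str(p.relative_to(src_dir)))
        written.append(tar_path)
    return written
```

Usage would look something like `archive_directory("/scratch/ukb-download", "/nearline/some-project/ukb", max_bytes=50 * 1024**3)`, where both paths and the 50 GiB cap are placeholders, not values taken from the docs.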
@kousu did you manage to check the downloaded files on CC ?
over here: https://github.com/neuropoly/data-management/pull/105#issuecomment-898637991
Our download access to https://biobank.ctsu.ox.ac.uk/ is ending on 2021-08-18. We need to archive as much as possible to our internal servers before that date.
Their download docs are https://biobank.ctsu.ox.ac.uk/~bbdatan/Accessing_UKB_data_v2.3.pdf. We have a license keyfile on
smb://duke/<TODO>
They have three programs (because they invented their own API, what I want to avoid for #77) to do the download:
- ukblink, source code -- but this isn't really the source code, they distribute a pre-compiled static .a file and the "source" here is just a wrapper to make the linker happy.
- ukbfetch, source code -- ditto
- gfetch, source code -- ditto

We don't need the entire dataset, but a subset of images, metadata fields, and subjects.
The dataset is estimated to be 38TB, so we need more storage space.
data.neuro.polymtl.ca only has 1TB.