UKB storage to upload processed bulk data

sina-mansour commented 2 years ago

Following up on the suggestion by @Lestropie (this commit):

RS: As per discussion, need to find out how much data can be uploaded per subject to UKB (and indeed what volume of data could potentially be hosted elsewhere). Any temporaries that are not to be later hosted anywhere are better off being stored on a RAM file system. My typical approach here is to load all input data into a scratch directory that I can force to be in /tmp/, store all intermediate files and final outputs there, and only upon script completion do I then write the desired derivatives to the location requested by the user. I then only retain the scratch directory if the user explicitly requests that it be retained. Your structure here checks for the pre-existence of calculated files, which is useful when you are testing perturbations to the script, but for final deployment this ability is not as high a priority.
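The scratch-directory workflow described in the comment above could be sketched roughly as follows. This is only a minimal illustration, not the project's actual script: `run_with_scratch`, the `process` callable, and the `ukb_scratch_` prefix are hypothetical names, and whether `/tmp` is actually RAM-backed depends on the system.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(inputs, output_dir, process, keep_scratch=False, scratch_root="/tmp"):
    """Stage inputs in a scratch directory, process there, then export derivatives.

    `process` is a hypothetical callable: it receives the scratch Path, writes
    any intermediates it likes there, and returns the list of derivative Paths
    that should be exported to `output_dir`.
    """
    # Create a unique scratch directory under the requested root (e.g. /tmp).
    scratch = Path(tempfile.mkdtemp(prefix="ukb_scratch_", dir=scratch_root))
    try:
        # Load all input data into the scratch directory.
        for src in inputs:
            shutil.copy2(src, scratch / Path(src).name)

        # All intermediates and final outputs are written inside scratch.
        derivatives = process(scratch)

        # Only after processing completes are the desired derivatives written
        # to the location requested by the user.
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        exported = [Path(shutil.copy2(d, output_dir / Path(d).name)) for d in derivatives]
    finally:
        # Retain the scratch directory only if explicitly requested.
        if not keep_scratch:
            shutil.rmtree(scratch, ignore_errors=True)
    return exported, (scratch if keep_scratch else None)
```

With this layout, temporaries never reach the output location, so only the files returned by `process` count towards the per-subject upload volume.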

sina-mansour commented 2 years ago

We're currently storing all intermediate files on the scratch file system. The following processed data are stored, or will be stored, for public release:

Native atlases:
- Surface atlases registered to native space are available as NIfTI volumetric hard parcellations and will be released as such
- These atlases include 20 from Schaefer et al. as well as the HCP MMP1.0 atlas
Functional time-series:
- We have provided the resting-state time-series for all of the atlases (we decided to provide the time-series rather than a correlation-based connectivity measure, as this increases the possible use cases)
- We have also provided a global signal time-series for studies aiming to apply global signal regression
Structural connectivity:
- We will provide high-resolution endpoints in native/MNI for all mapped streamlines (only the streamline endpoints rather than the full tractograms, to reduce size)
- We will also provide a wide range of connectivity measures (streamline count, FBC, density, length, etc.) mapped to different atlases.

We'll need to ensure that we can somehow upload three sets of bulk data for every individual back to the UKB storage:

- atlases
- functional time-series
- structural connectivity measures

@caioseguin would you be able to ask UK Biobank whether they will accept this, and whether there are any limits that we need to adhere to?

caioseguin commented 2 years ago

I will ask them and get back to you.

sina-mansour commented 1 year ago

This issue has been left dormant for a while. The latest update is that we were able to return the results to UK Biobank over a secure SFTP connection (using MediaFlux).

UKB has informed us that the resource should be made available in a new release (planned for November 2023).

sina-mansour / UKB-connectomics

UKB storage to upload processed bulk data #11