sina-mansour closed this issue 1 year ago.
We're currently storing all intermediate files on the scratch file system. The following processed data are (or will be) stored so that they can be shared with the public:
We'll need to make sure we can upload three sets of bulk data for every individual back to UKB storage:
@caioseguin, could you ask UK Biobank whether they will accept this, and whether there are any limits we need to adhere to?
I will ask them and get back to you.
This issue has been dormant for a while. The latest update is that we were able to return the results to UK Biobank over a secure SFTP connection (using MediaFlux).
UKB has informed us that the resource should be made available in a new release (planned for November 2023).
Following up on the suggestion by @Lestropie (this commit):
RS: As per discussion, we need to find out how much data can be uploaded per subject to UKB (and indeed what volume of data could potentially be hosted elsewhere). Any temporary files that will not later be hosted anywhere are better stored on a RAM file system. My typical approach is to load all input data into a scratch directory that I can force to be in /tmp/, store all intermediate files and final outputs there, and only upon script completion write the desired derivatives to the location requested by the user. I retain the scratch directory only if the user explicitly requests it. Your structure here checks for the pre-existence of calculated files, which is useful when you are testing perturbations to the script, but for final deployment this capability is a lower priority.
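For illustration, here is a minimal Python sketch of the scratch-directory pattern described in the comment above. It is not the project's actual pipeline: the helper `run_with_scratch`, its parameters, and the `process` callback are hypothetical names. It assumes the processing step can be expressed as a function that reads inputs from, and writes outputs to, a given directory. With `scratch_parent=None`, Python's `tempfile` uses the system temporary directory (usually `/tmp` on Linux unless `TMPDIR` is set). Note that `/tmp` is only RAM-backed on systems that mount it as `tmpfs`.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def run_with_scratch(
    inputs: Iterable[str | Path],
    outputs: Iterable[str],
    process: Callable[[Path], None],
    output_dir: str | Path,
    keep_scratch: bool = False,
    scratch_parent: Optional[str] = None,
) -> Path:
    """Hypothetical helper: run `process` inside a private scratch directory.

    Only the files named in `outputs` are copied to `output_dir`, and only
    after `process` returns successfully.
    """
    # Create a private scratch directory (system temp dir if scratch_parent is None).
    scratch = Path(tempfile.mkdtemp(prefix="scratch-", dir=scratch_parent))
    try:
        # 1. Load all input data into the scratch directory.
        for src in inputs:
            src = Path(src)
            shutil.copy2(src, scratch / src.name)
        # 2. All intermediate files and final outputs are written inside scratch.
        process(scratch)
        # 3. Only after processing succeeds, export the requested derivatives.
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        for name in outputs:
            shutil.copy2(scratch / name, out / name)
        return out
    finally:
        # 4. Discard scratch (and every intermediate) unless explicitly kept.
        if keep_scratch:
            print(f"Scratch directory retained: {scratch}")
        else:
            shutil.rmtree(scratch, ignore_errors=True)
```

Because nothing is written to `output_dir` until `process` returns, a run that fails during processing leaves no partial derivatives at the user-requested location. Intermediates that will never be shared are never written to the shared or scratch file system outside the temporary directory.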