I guess I don't have permission to add labels to issues. If someone could add a UoL / Leicester label, I'd appreciate that.
For large input datasets we have so far relied on being able to download them from the internet; see for example https://github.com/ukri-excalibur/excalibur-tests/blob/d9dd093296aa7c4a6fd96ec152cb2edba7ffa264/apps/openmm/openmm_rfm.py#L20-L34 from the recent PR #115, or https://github.com/ukri-excalibur/excalibur-tests/blob/1d45e360e15e46b09f24a02e011835fa00cda8a5/apps/wrf/wrf.py#L100-L113 in the WRF benchmark.
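To give an idea of the pattern (a minimal sketch only, not the actual code behind those links; the test name, URL and filenames are placeholders):

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class DownloadInputExample(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    # Placeholder URL: stands in for wherever the dataset is hosted.
    input_url = 'https://example.org/data/benchmark_input.tar.gz'
    # Fetch and unpack the inputs before the job script runs the benchmark.
    # This only works if the file is reachable without authentication.
    prerun_cmds = [
        f'curl -LO {input_url}',
        'tar xzf benchmark_input.tar.gz',
    ]
    executable = 'echo'
    executable_opts = ['inputs ready']

    @sanity_function
    def assert_ready(self):
        return sn.assert_found(r'inputs ready', self.stdout)
```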
The issue is that the current location (University of Edinburgh's DataSync service) requires signing in with a password. I was wondering whether ExCALIBUR could host a central server (exposed to the internet) where we could place data like this. I am not sure of a sensible place to put the input data so that it is publicly available to everyone.
I will have a chat with the team here in Leicester and see if there's anywhere we can put it for now that doesn't require password access.
I think Zenodo could be an option that doesn't require us to host any infrastructure, at least up to 50 GiB per dataset.
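If we went that way, a benchmark could fetch and verify a Zenodo-hosted file with something like this (a rough sketch; the record URL and checksum below are placeholders, and Zenodo lists an MD5 checksum for each file on the record page):

```python
import hashlib
import urllib.request

# Placeholders, not a real record: Zenodo serves files at stable URLs of
# this general form once a record is published.
URL = 'https://zenodo.org/records/1234567/files/ramses_input.tar.gz'
EXPECTED_MD5 = '0123456789abcdef0123456789abcdef'

filename = 'ramses_input.tar.gz'
urllib.request.urlretrieve(URL, filename)

# Verify the download against the published checksum before using it.
md5 = hashlib.md5()
with open(filename, 'rb') as fh:
    for chunk in iter(lambda: fh.read(1 << 20), b''):
        md5.update(chunk)
assert md5.hexdigest() == EXPECTED_MD5, f'checksum mismatch for {filename}'
```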
Input and output data storage is not within the scope of the current project. It's definitely something ExCALIBUR should deal with at some point, but not yet. As Mose said, we assume the benchmark code providers would also provide the data in some downloadable location. The solutions you propose make sense. Why would the Ramses input data be behind a password anyway?
It doesn't need to be, but for some reason (before I joined) it was placed on the University of Edinburgh's DataSync service. I want to move it off that system because the data doesn't need password protection, but I have nowhere else to put it.
I will look into using Zenodo. That will do for now; if we get an ExCALIBUR server in the future, we can move the data there.
Thanks both
Does anyone already have a plan for centralized input data? I had a look at open/closed issues and couldn't see anything.
The Ramses code requires ~4 GB of inputs.
As I see it we have a couple of options, e.g. `wget`-ing the files from a central server.

For now I am managing with a manual download of the input data, but that is not ideal going forward. Is Ramses the only code with this issue?