Open OH-AU opened 1 month ago
Hi, just chiming in that I would love this support as well! The compute nodes in our HPC environment are similarly locked down, with no internet access.
Hi,
Thank you for bringing up this issue. We will consider this in our planning for future updates. We will reach out once we have developed the offline mode.
Pooja
If the compute nodes are firewalled from the Internet, you also can't use HTTP(S) URLs as a data source, and you can't use SRA accessions or SRA queries for reads. If that's OK, we can move all data downloads to the main node, before the cluster execution starts. If the main node is also isolated, that is harder for us to work around.
The above would work for our institute. Many sites have a node dedicated to moving/downloading data, but that node shouldn't be used for any compute. Running a multi-step process would work for me as well, e.g. 1) data prep/download, 2) analysis, etc.
Yes, I'd be totally happy running a two-step process!
Generally I would have the read files locally, and I would be happy to run something that checks and updates the databases as an initial step, and then run the compute-heavy workflow afterwards.
Thank you for considering this. It's currently a blocker for implementing this in my workflow too!
Although I can download most files in advance, it appears that some downloads are hardcoded into the source? Specifically:
Would it be possible to set it up so that downloads can be run independently in advance, with a check of a local directory to see whether the files already exist before a download is attempted? We run on an HPC system where the compute nodes themselves are firewalled. Thanks.
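To make the request concrete, here is a minimal sketch of the "check the local directory first, download only if missing" behaviour discussed above. It is not taken from the tool's source. The names `ensure_local`, `OfflineError`, `cache_dir` and the `offline` flag are all hypothetical, and the example URL in the usage note is a placeholder.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


class OfflineError(RuntimeError):
    """Raised when a file is missing and downloads are disabled (hypothetical)."""


def default_download(url: str, dest: Path) -> None:
    # Standard-library download; meant to run only on a node with Internet access.
    urlretrieve(url, str(dest))


def ensure_local(
    url: str,
    cache_dir: Path,
    offline: bool = False,
    download: Callable[[str, Path], None] = default_download,
) -> Path:
    """Return the local copy of `url`, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / url.rsplit("/", 1)[-1]

    # Local directory check: skip the network entirely if the file exists.
    if dest.exists():
        return dest

    # On a firewalled compute node, fail early with a clear message.
    if offline:
        raise OfflineError(f"{dest} not found and downloads are disabled")

    # Download to a temporary name first so a partial file is never
    # mistaken for a complete one on a later run.
    tmp = dest.with_name(dest.name + ".part")
    download(url, tmp)
    tmp.replace(dest)
    return dest
```

With something like this, step 1 (data prep) would call `ensure_local("https://example.org/db/ref.fa", Path("cache"))` on the data-transfer node, and step 2 (analysis) would make the same call with `offline=True` on the compute nodes, so a missing file produces an immediate error instead of a hanging connection attempt.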