icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction

Can run on sensitive data cluster without internet access? #37

Closed richelbilderbeek closed 1 year ago

richelbilderbeek commented 1 year ago

Dear nextNEOpi maintainer(s),

Thanks for nextNEOpi, the help (e.g. #36 ) and the documentation!

Regarding the documentation: the usage section mentions an HPC cluster profile. However, as far as I can see, such a profile will download the Singularity containers and/or conda packages at run time. This may be a problem for HPC clusters that have no internet access (due to sensitive data and data protection laws) except for a file transfer folder.

Can nextNEOpi run on an HPC cluster that has no internet access? If not out of the box, how would you recommend getting it to do so?

Thanks and cheers, Richel

(note to self: nf-core pipelines do have this feature, see https://nf-co.re/docs/usage/offline#pipeline-code)

(another note to self, here a reply from Maxime Garcia:

I'm not familiar with said pipeline, and it's DSL1, so you'd have to use an older Nextflow version. We do have some docs on how to run nf-core pipelines offline, and I did run sarek quite often on bianca myself, so it should be possible. But as this is not an nf-core pipeline, and still in DSL1, you'll need to get the containers yourself beforehand and transfer them to bianca via wharf

)

riederd commented 1 year ago

In principle it should be possible to run the pipeline offline. One would need to download the Singularity images manually and put them in the $NXF_SINGULARITY_CACHEDIR directory on the HPC cluster, then make sure that the $NXF_SINGULARITY_CACHEDIR environment variable is set. You'd also need to manually install the VEP databases, IEDB, mixcr (if needed), GATK3 (if needed), mixmhc2pred and IGS.
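As a rough illustration of the cache setup described above, here is a small pre-flight check one could run on the cluster before launching the pipeline. It is only a sketch: the function name check_singularity_cache is hypothetical and not part of nextNEOpi or Nextflow, and it assumes the images were already fetched on an internet-connected machine and copied over as .sif or .img files. It does not verify that the file names match what Nextflow expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nextNEOpi or Nextflow).
# Succeeds only if NXF_SINGULARITY_CACHEDIR is set, is a directory,
# and holds at least one regular file ending in .sif or .img.
check_singularity_cache() {
    local dir="${NXF_SINGULARITY_CACHEDIR:-}"
    if [ -z "$dir" ]; then
        echo "NXF_SINGULARITY_CACHEDIR is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "Not a directory: $dir" >&2
        return 1
    fi
    # Count image files directly inside the cache directory (no recursion).
    local count
    count=$(find "$dir" -maxdepth 1 -type f \( -name '*.sif' -o -name '*.img' \) | wc -l)
    if [ "$count" -eq 0 ]; then
        echo "No .sif or .img files found in $dir" >&2
        return 1
    fi
    echo "Found $count image file(s) in $dir"
}
```

On the cluster this might be used as: export NXF_SINGULARITY_CACHEDIR=/path/to/images (a placeholder path), then call check_singularity_cache and only start the run if it succeeds.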

mehc555 commented 1 year ago

I'm also trying to run this pipeline on a cluster that has worker nodes which do not have internet access. During the pvacseq step, it tries to connect to a webserver:

Running NetMHCStabPan

Command error: During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='services.healthtech.dtu.dk', port=443): Max retries exceeded with url: /cgi-bin/webface2.cgi (Caused by ConnectTimeoutError(<urllib3.connection.HTTPS

Any ideas for ways around this?

Thank you

riederd commented 1 year ago

The README documents an option to turn NetMHCstab on or off:

--use_NetMHCstab    Use NetMHCstab to predict the stability of peptide binding to MHC molecules. Default: true

So --use_NetMHCstab false will turn it off, which avoids the call to the external web server.
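For illustration, the option could be added to the launch command roughly as follows. This is a dry-run sketch that only prints the command: the function name build_offline_cmd is hypothetical, and the script name, the --batchFile argument and the profile are placeholders that should be taken from the README for your setup. Only --use_NetMHCstab false comes from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a launch command with NetMHCstab disabled.
# The script name, --batchFile and -profile values are placeholders (assumptions);
# --use_NetMHCstab false is the option quoted from the README above.
build_offline_cmd() {
    local batch_file="$1"
    printf '%s\n' "nextflow run nextNEOpi.nf --batchFile $batch_file -profile singularity --use_NetMHCstab false"
}
```

Calling build_offline_cmd samples.csv prints the full command line so it can be reviewed before being run on a node without internet access.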

HTH