seqeralabs / nf-tower

Nextflow Tower system
https://tower.nf
Mozilla Public License 2.0

Feature Request: Support Remote Agent #331

Open apeltzer opened 2 years ago

apeltzer commented 2 years ago

Hi!

A potentially interesting feature would be a "local executor": a plain machine with the required tools and software installed, on which Tower could run the workflow. In some setups there are isolated workstations with, e.g., proprietary tools that are not easy to change or modify.

It would, however, be great to use these from Tower without having to set up, for example, a "single machine SLURM".

SSH credentials are already in Tower; dependency handling could be done the same way as with other tools (e.g. -profile docker).

The executor could simply be a "Remote SSH Executor" that runs a job on the machine of choice. Setup of that could also be done in Tower. It could also be that this is more of a core Nextflow thing, in which case a "Nextflow SSH Remote Executor" would be the best way forward 👍🏻

pditommaso commented 2 years ago

The need for a single-node executor is a recurring request. I'm not sure we will allow this via SSH; however, we are discussing a similar capability for the new agent tool we are developing.

apeltzer commented 2 years ago

Thank you Paolo - would be great to see something like this available 👍🏻

apeltzer commented 2 years ago

Is there any news on the agent tool you mentioned, Paolo? I have some test cases at hand and would be happy to give this a go 💯

ewels commented 2 years ago

Tower Agent is up and running, available here: https://github.com/seqeralabs/tower-agent

jordeu commented 2 years ago

@apeltzer note that, even though Tower Agent could be used as a connection gateway in the future, right now on the Tower side you can only use it to submit jobs to an HPC scheduler (Slurm, LSF, ...).

As a temporary workaround, if you want to use it on a workstation, you can fake some Slurm commands:

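sinfo

#!/bin/bash
# Print a fixed Slurm version string.
echo "slurm 16.05.3"

scancel

#!/bin/bash
# The "job ID" is the PID printed by the fake sbatch, so cancelling is a plain kill.
kill "$1"
echo "done"

squeue

#!/bin/bash
# List every running launcher script as "<PID> R" (R = running).
ps -Af | grep ".launcher.sh" | grep -v grep | awk '{print($2" R")}'

sbatch

#!/bin/bash
# Run the submitted script in the background and report its PID as the job ID.
bash "$1" &>/dev/null &
echo "Submitted batch job $!"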
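
The four fake commands above could be installed in one go with something like the following. This is only a minimal sketch, assuming a Bash shell on the workstation: FAKE_SLURM_DIR and write_cmd are names made up for this example and are not part of Tower or Tower Agent, the default directory is a scratch location you would replace with something persistent, and the scripts are the ones above with their arguments quoted. It also assumes that the agent resolves the Slurm commands through PATH.

```shell
#!/bin/bash
# Hypothetical install location (an assumption of this sketch, not a Tower setting).
# The default is a scratch directory; override it for real use.
FAKE_SLURM_DIR="${FAKE_SLURM_DIR:-${TMPDIR:-/tmp}/fake-slurm/bin}"
mkdir -p "$FAKE_SLURM_DIR"

# write_cmd NAME: store stdin as an executable script called NAME (hypothetical helper).
write_cmd() {
  cat > "$FAKE_SLURM_DIR/$1"
  chmod +x "$FAKE_SLURM_DIR/$1"
}

# sinfo: print a fixed Slurm version string.
write_cmd sinfo <<'EOF'
#!/bin/bash
echo "slurm 16.05.3"
EOF

# scancel: the job ID is a PID, so cancelling is a plain kill.
write_cmd scancel <<'EOF'
#!/bin/bash
kill "$1"
echo "done"
EOF

# squeue: list running launcher scripts as "<PID> R".
write_cmd squeue <<'EOF'
#!/bin/bash
ps -Af | grep ".launcher.sh" | grep -v grep | awk '{print($2" R")}'
EOF

# sbatch: run the script in the background and report its PID as the job ID.
write_cmd sbatch <<'EOF'
#!/bin/bash
bash "$1" &>/dev/null &
echo "Submitted batch job $!"
EOF

# Put the fake commands ahead of any real Slurm installation.
export PATH="$FAKE_SLURM_DIR:$PATH"

# Quick check that the fake command is picked up.
sinfo   # prints: slurm 16.05.3
```

For this to have any effect, the process that launches jobs would need to be started from a shell where PATH includes FAKE_SLURM_DIR. The heredocs use a quoted 'EOF' delimiter so that $1 and $! are written into the scripts literally rather than expanded at install time.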

apeltzer commented 2 years ago

Unfortunately, I cannot really set up the SLURM execution environment this way :-( I always get connection issues, though the key is definitely there.

pditommaso commented 2 years ago

We could try to investigate the problem with the agent. If you have an error message or log, please report it here.

apeltzer commented 2 years ago

So I need to have a Tower Version that can work with the Tower Agent, the "fake Slurm" part above on the machine running the jobs and then should be able to run things, correct?

Might also mean that I need IT to update Tower first to get this running.

jordeu commented 2 years ago

> So I need to have a Tower Version that can work with the Tower Agent, the "fake Slurm" part above on the machine running the jobs and then should be able to run things, correct?

Yes. Meanwhile maybe you can test the setup using https://tower.nf.