rnajena / poseidon

PoSeiDon: positive selection detection and recombination analysis pipeline
MIT License

Installing and running through singularity #35

Open rresendepinto opened 2 years ago

rresendepinto commented 2 years ago

I understand how to install and run this program with Docker, but how can I install and run it without Docker on an HPC cluster that uses Singularity?

hoelzer commented 2 years ago

Hey @rresendepinto thx for your interest! This should be fairly simple: add -profile slurm,singularity to your nextflow call in case your HPC runs SLURM.

Please also check the README! When running on an HPC, I also recommend using --cachedir to store the Singularity images and --workdir or -w to store the work directories. It can be dangerous on an HPC to store such stuff in your home directory, for example, where space is limited.
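The advice above could translate into a launch command along these lines. This is a minimal sketch: the scratch location, the directory names, and input.fasta are placeholders, and only the flags themselves (-profile, --cachedir, -w) come from the comment above. The script only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Placeholder scratch location (assumption): point this at a filesystem
# with enough space, not your home directory.
SCRATCH_DIR="${SCRATCH_DIR:-/scratch/${USER:-user}}"
CACHE_DIR="$SCRATCH_DIR/singularity_cache"   # Singularity images go here
WORK_DIR="$SCRATCH_DIR/poseidon_work"        # Nextflow work directories go here

# input.fasta is a placeholder for your own FASTA file.
CMD="nextflow run hoelzer/poseidon --fasta input.fasta -profile slurm,singularity --cachedir $CACHE_DIR -w $WORK_DIR"

# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```

Setting SCRATCH_DIR in the environment before calling the script changes both directories at once.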

rresendepinto commented 2 years ago

Thank you for the quick answer! What if the HPC doesn't run SLURM, but SGE instead?

hoelzer commented 2 years ago

Uh, I think SGE is not implemented yet, only SLURM and LSF. However, SGE is supported by nextflow: https://www.nextflow.io/docs/latest/executor.html

We can provide a hotfix and then you can give it a try. Unfortunately, we cannot test this because we have no access to an SGE cluster.

@fischer-hub can you add a -profile sge to the nextflow.config please?

rresendepinto commented 2 years ago

If you could do that, it would help me a lot. Thank you!

fischer-hub commented 2 years ago

Hey @rresendepinto! I just added the SGE profile to the workflow on a new branch, sge_profile! If you could check out the branch and tell me whether everything is running as you'd expect, that would be great, since we don't have access to any machines running SGE, as @hoelzer said.

You can activate the SGE profile by adding -profile sge,singularity, -profile sge,conda or -profile sge,docker to your nextflow call depending on which container engine you would like to use!

hoelzer commented 2 years ago

Thx @fischer-hub for the fast fix!

@rresendepinto you can also pull and run the new code on the branch easily via

nextflow pull hoelzer/poseidon
nextflow run hoelzer/poseidon -r sge_profile ...

rresendepinto commented 2 years ago

Hi! I tried the new version but I think there was a conflict with SGE.

Error executing process > 'check_fasta_format (1)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  qsub -terse .command.run

Command exit status:
  1

Command output:
  Unable to run job: denied: host "compute-0-12.local" is not a submit host
  Exiting.

The system is CentOS 7 and the SGE version is 8.1.9.

fischer-hub commented 2 years ago

Hi @rresendepinto, sadly I cannot test this, but from your command output I'm wondering whether you are executing the pipeline from a compute node:

Unable to run job: denied: host "compute-0-12.local" is not a submit host

In that case, could you try running the pipeline from a node where the qsub command is available (e.g. the cluster's head node)? The nextflow documentation mentions this for SGE:

Nextflow manages each process as a separate grid job that is submitted to the cluster by using the qsub command.

Being so, the pipeline must be launched from a node where the qsub command is available, that is, in a common usage scenario, the cluster head node.

Nextflow will then submit the individual processes to the compute nodes itself :)
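Before launching, it may help to check whether the current host can submit jobs at all. The sketch below assumes a standard Grid Engine installation where qconf -ss prints the list of submit hosts. The status labels are made up for this example, and the host name comparison is exact, so a short name versus a fully qualified name can produce a false "not-listed".

```shell
#!/bin/sh
# Report whether this host looks able to submit SGE jobs.
HOST_NAME=$(uname -n)

if ! command -v qsub >/dev/null 2>&1; then
    STATUS="no-qsub"          # qsub is not on PATH here
elif command -v qconf >/dev/null 2>&1 && qconf -ss 2>/dev/null | grep -Fxq "$HOST_NAME"; then
    STATUS="submit-host"      # host is listed by 'qconf -ss'
else
    STATUS="not-listed"       # qsub exists, but the host was not found in the list
fi

echo "$HOST_NAME: $STATUS"
```

A result of "no-qsub" or "not-listed" on a compute node would be consistent with the "is not a submit host" error quoted above.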

rresendepinto commented 2 years ago

It just hangs indefinitely on the cluster's head node. I can't figure out why.

fischer-hub commented 2 years ago

Hi! Could you post the command you used and maybe the .nextflow.log file? At what point does the pipeline stop running?

rresendepinto commented 2 years ago

Command: NXF_JAVA_HOME=/home/rpinto/Documents/bin/jre1.8.0_321 NXF_VER=20.09.0-edge nextflow run hoelzer/poseidon -r sge_profile --fasta poseidon/test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity >> qsub2.out &

Nextflow log file nextflow_0703.log

This is the output:

Tree root species: NA
Reference species: NA
Use KH-insignificant breakpoints: no

[- ] process > check_fasta_format -
[- ] process > translatorx -
[- ] process > check_aln -
[- ] process > remove_gaps -
[- ] process > raxml_nt -
[- ] process > raxml_aa -
[- ] process > raxml2drawing -
[- ] process > nw_display -
[- ] process > barefoot -
[- ] process > model_selection -
[- ] process > gard_detect -
____
Execution status: failed
Results are reported here: results//html/full_aln/index.html

No results folder is created

fischer-hub commented 2 years ago

Thanks @rresendepinto! Your command seems fine to me and works as expected (with SLURM). However, I noticed you are running Nextflow version 20.09.0-edge; you can update your Nextflow installation to version 21.10.6 with nextflow self-update. Maybe that will already fix the issue. Otherwise I'm not quite sure what's going on. Have you successfully run other nextflow pipelines in the past? Also, is there anything interesting in the qsub.out log files? Btw, do you see any of the Singularity images being downloaded?

rresendepinto commented 2 years ago

This is the first nextflow pipeline I am trying to run on this cluster. The qsub file only shows what I reported before as output. The same thing happens with version 21.10.6. The Singularity images are being downloaded.

fischer-hub commented 2 years ago

Alright, you could try to run the nextflow test pipeline, that way we can see if this issue is restricted to poseidon or an issue with the HPC and the nextflow execution:

nextflow run hello

You can expect an output similar to this if everything is working correctly:

N E X T F L O W  ~  version 21.10.6
Launching `nextflow-io/hello` [nice_payne] - revision: ec11eb0ec7 [master]
executor >  local (4)
[fa/20fe7a] process > sayHello (1) [100%] 4 of 4 ✔
Ciao world!

Hello world!

Hola world!

Bonjour world!

Also I found this issue with nextflow on SGE HPCs which kind of sounds like your issue. The problem here was that SGE defaulted to a shell that was not bash, resulting in nextflow crashing. I added their recommended fix to the SGE profile in poseidon, so you could also try pulling the last commit again and running your command. Do you know what kind of shell your HPC is running as default?

rresendepinto commented 2 years ago

The hello program works. The HPC runs bash as default.

The pipeline doesn't hang anymore but throws a different error related to the job scheduler:

qsub4.out.txt

fischer-hub commented 2 years ago

Okay, great! That means nextflow is running now. From the qsub log file it seems your cluster requires the definition of the parallel environment to use. I added smp as the standard parallel environment to use with the SGE profile here, however it is possible that your cluster is using a different parallel environment. If you are still getting an error similar to

Command output:
  Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly
  Exiting.

that refers to the parallel environment, you might have to change the penv variable here (in your local installation) from smp to the environment your cluster is using, e.g. something like mpi. But you can just try to run the new commit first; maybe your cluster is using smp!
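One way to try a different parallel environment without editing the pipeline files might be a small extra config passed with Nextflow's -c option. This is an untested sketch: the file name sge_custom.config and the value mpi are assumptions, input.fasta is a placeholder, and whether the override wins over the value set inside the sge profile depends on how that profile declares it. On the cluster, qconf -spl should list the parallel environments that actually exist.

```shell
#!/bin/sh
# Parallel environment to request; 'mpi' is only an example value.
PENV_NAME="${PENV_NAME:-mpi}"
CONFIG_FILE="${CONFIG_FILE:-sge_custom.config}"   # hypothetical file name

# Write a small Nextflow config that sets the SGE parallel environment.
cat > "$CONFIG_FILE" <<EOF
process {
    executor = 'sge'
    penv = '$PENV_NAME'
}
EOF

# Print the launch command for inspection; input.fasta is a placeholder.
LAUNCH="nextflow -c $CONFIG_FILE run hoelzer/poseidon -r sge_profile --fasta input.fasta -profile sge,singularity"
echo "$LAUNCH"
```

Because the pipeline copy itself stays unmodified, this approach would not leave local edits behind in the Nextflow assets directory.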

rresendepinto commented 2 years ago

I think it uses mpi. Where is the pipeline code stored?

rresendepinto commented 2 years ago

I changed to mpi but it won't run because it has uncommitted changes.

fischer-hub commented 2 years ago

I changed to mpi but it won't run because it has uncommitted changes.

Maybe you are trying to run the pipeline from this branch here. Can you try to change to your local directory where your poseidon installation is (I think in your log file it was /home/rpinto/.nextflow/assets/hoelzer/poseidon/) and then start the pipeline with:

nextflow run poseidon.nf --fasta poseidon/test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity

nextflow run hoelzer/poseidon -r sge_profile ... will try to run the sge_profile branch, but you actually want to run with your local changes.

hoelzer commented 2 years ago

Yes exactly, or first clone your own local copy of the PoSeiDon code, make the changes, and then run like described above via nextflow run poseidon.nf ...

git clone https://github.com/hoelzer/poseidon.git
cd poseidon
# switch to the code branch w/ the changes David introduced
git checkout sge_profile
# just in case, make sure you really have the latest changes in this branch
git pull origin sge_profile
# now modify your local copy according to your needs, then run from inside the clone
nextflow run poseidon.nf --fasta test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity

rresendepinto commented 2 years ago

I ran the pipeline from the local directory and it seems to have solved the issues with the job scheduler. However, it threw an error on the "gard_detect" step. The output is in the following file. I am running the pipeline on the test_data/bats_mx1_small.fasta file.

qsub7.out.txt

fischer-hub commented 2 years ago

Glad you got it to run! Yes, there are a few known issues regarding the gard_detect process. There is already a PR addressing them, but it does not contain the SGE profile changes. Could you provide the .command.log and gard.log files from the working directory so we can figure out whether it's a known issue? (/home/rpinto/.nextflow/assets/hoelzer/poseidon/work/84/428977a49ec560ffb8bdddfe7eb1b1)

If it's the same issue I can just rebase the sge_profile branch so the fix is on there too.
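To collect the requested logs, a quick look into the task's working directory could be sketched as follows. The file names are the hidden files Nextflow writes into each task directory; the directory is passed as the first argument and defaults to the current directory, which is an assumption of this example. gard.log is specific to the pipeline and would sit in the same directory according to the comment above.

```shell
#!/bin/sh
# Task work directory to inspect; pass the hash directory reported by
# Nextflow as the first argument (defaults to the current directory).
TASK_DIR="${1:-.}"
FOUND=0

# Show the tail of each non-empty Nextflow task file.
for f in .command.sh .command.log .command.err .exitcode; do
    if [ -s "$TASK_DIR/$f" ]; then
        echo "== $f =="
        tail -n 20 "$TASK_DIR/$f"
        FOUND=$((FOUND + 1))
    else
        echo "== $f: missing or empty =="
    fi
done

echo "$FOUND non-empty file(s) in $TASK_DIR"
```

An empty .command.log together with a non-empty .command.err or .exitcode can still narrow down where the task failed.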

rresendepinto commented 2 years ago

.command.log is empty

gard.log

fischer-hub commented 2 years ago

.command.log is empty

gard.log

Yep, that's the issue targeted in the gard PR. I rebased the sge_profile branch, so the change should be available there too now. Just pull and try again! And don't forget to set your penv in case it got overwritten because of the pull :)

rresendepinto commented 2 years ago

I cannot pull the pipeline again.

hoelzer/poseidon contains uncommitted changes -- cannot pull from repository

I tried changing the penv variable back to 'smp' but it didn't work.

fischer-hub commented 2 years ago

Can you try:

git clone https://github.com/hoelzer/poseidon.git
cd poseidon
git checkout sge_profile
# now you are on the code branch w/ the changes David introduced
git pull origin sge_profile
# just in case, check that you really have the changes in this branch
# now modify your local copy according to your needs
nextflow run poseidon.nf --fasta test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity

that way you pull the changes directly via Git.
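The "contains uncommitted changes" message appears when the copy that Nextflow keeps under its assets directory has been edited. A small check like the one below can confirm that. It is a sketch that assumes the default assets location under $HOME/.nextflow/assets, and the state labels are made up for this example.

```shell
#!/bin/sh
# Default location where 'nextflow pull' keeps its copy of the pipeline.
ASSET_DIR="${ASSET_DIR:-$HOME/.nextflow/assets/hoelzer/poseidon}"

if [ ! -d "$ASSET_DIR/.git" ]; then
    STATE="missing"       # nothing pulled yet, or a different location
elif [ -n "$(git -C "$ASSET_DIR" status --porcelain)" ]; then
    STATE="modified"      # local edits are present in the asset copy
    git -C "$ASSET_DIR" status --short
else
    STATE="clean"
fi

echo "$ASSET_DIR: $STATE"
```

If the state is "modified", working in a separate clone as shown above keeps your edits out of the copy that nextflow pull manages.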

rresendepinto commented 2 years ago

It finally worked! I can't wait to use this pipeline on my data.

Thank you so much for your help! :)

fischer-hub commented 2 years ago

Great! I'm glad to hear that, thanks for also testing the SGE profile!

hoelzer commented 2 years ago

Awesome! Thanks for your patience @rresendepinto and thanks for the support @fischer-hub !