replikation / What_the_Phage

WtP: Phage identification via nextflow and docker or singularity
https://mult1fractal.github.io/wtp-documentation/
GNU General Public License v3.0

Run @LSF cluster: PPRMeta #16

Closed: hoelzer closed this issue 4 years ago

hoelzer commented 4 years ago

Execution via the Nextflow pipeline throws:

Error executing process > 'pprmeta_wf:pprmeta (1)'

Caused by:
  Process `pprmeta_wf:pprmeta (1)` terminated with an error exit status (127)

Command executed:

  cp PPR-Meta/* .
  ./PPR_Meta T7_draft.fa T7_draft.csv

Command exit status:
  127

Command output:
  (empty)

Command error:
  ./PPR_Meta: error while loading shared libraries: libmwlaunchermain.so: cannot open shared object file: No such file or directory

The git repository/database is cloned correctly.

hoelzer commented 4 years ago

Direct execution via Singularity works:

singularity exec docker://multifractal/ppr-meta:0.1 ./PPR_Meta T7_draft.fa T7_draft.csv
Docker image path: index.docker.io/multifractal/ppr-meta:0.1
Cache folder set to /hps/nobackup2/singularity/mhoelzer/docker
Creating container runtime...
Exploding layer: sha256:35c102085707f703de2d9eaad8752d6fe1b8f02b5d2149f1d8357c9cc7fb7d0a.tar.gz
Exploding layer: sha256:251f5509d51d9e4119d4ffb70d4820f8e2d7dc72ad15df3ebd7cd755539e40fd.tar.gz
Exploding layer: sha256:8e829fe70a46e3ac4334823560e98b257234c23629f19f05460e21a453091e6d.tar.gz
Exploding layer: sha256:6001e1789921cf851f6fb2e5fe05be70f482fe9c2286f66892fe5a3bc404569c.tar.gz
Exploding layer: sha256:d99ed20073595350306b99135321f842d1622b67ee8cdd2cd4eaa413b53eb050.tar.gz
Exploding layer: sha256:524ba74a8b5367065753266de0a665030c36808c345473feaab1ead2e6bb46b2.tar.gz
Exploding layer: sha256:2cc57967173f14678c6ada7b5436464029b75d7348b242dbaeeb71c29505051f.tar.gz
Exploding layer: sha256:3d898dcec34fa2ce3d7c1004335ef14a6610c9e5c13de4f204ae313a6c1ced91.tar.gz

Using TensorFlow backend.
...
finished
hoelzer commented 4 years ago

However, Nextflow generates this command (in .command.run) and then executes it:

singularity exec /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"

which runs into:

./PPR_Meta: error while loading shared libraries: libmwlaunchermain.so: cannot open shared object file: No such file or directory
hoelzer commented 4 years ago

Modifying this command to

singularity exec docker://multifractal/ppr-meta:0.1 /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"

works. I changed

/hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img

to

docker://multifractal/ppr-meta:0.1

So the container runtime is freshly created (I think).

hoelzer commented 4 years ago

Ok, executing like this:

singularity exec docker://multifractal/ppr-meta:0.1 /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"

does not write a

/hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img

file. It seems Nextflow somehow forces the creation of such a Singularity image during runtime. Then this image is used and does not work (it throws the lib error above).
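
One way to sanity-check the cached image (a sketch; it assumes the image path from above and that an intact image should contain the MATLAB runtime library named in the error):

# look for the missing library inside the converted image; no output suggests the image is incomplete
singularity exec /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img \
  find / -name 'libmwlaunchermain.so*' 2>/dev/null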

replikation commented 4 years ago

so is Nextflow building a broken Singularity container, or what is the issue?

hoelzer commented 4 years ago

I am trying to figure it out. But yes, it seems that Nextflow builds a Singularity image from the Docker layers that is somehow defective.

When I just execute the following code on the LSF cluster:

singularity exec docker://multifractal/ppr-meta:0.1 /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"
Docker image path: index.docker.io/multifractal/ppr-meta:0.1
Cache folder set to /hps/nobackup2/singularity/mhoelzer/docker
[7/7] |===================================| 100.0%
Creating container runtime...
Exploding layer: sha256:35c102085707f703de2d9eaad8752d6fe1b8f02b5d2149f1d8357c9cc7fb7d0a.tar.gz
Exploding layer: sha256:251f5509d51d9e4119d4ffb70d4820f8e2d7dc72ad15df3ebd7cd755539e40fd.tar.gz
Exploding layer: sha256:8e829fe70a46e3ac4334823560e98b257234c23629f19f05460e21a453091e6d.tar.gz
Exploding layer: sha256:6001e1789921cf851f6fb2e5fe05be70f482fe9c2286f66892fe5a3bc404569c.tar.gz
Exploding layer: sha256:d99ed20073595350306b99135321f842d1622b67ee8cdd2cd4eaa413b53eb050.tar.gz
Exploding layer: sha256:524ba74a8b5367065753266de0a665030c36808c345473feaab1ead2e6bb46b2.tar.gz
Exploding layer: sha256:2cc57967173f14678c6ada7b5436464029b75d7348b242dbaeeb71c29505051f.tar.gz
Exploding layer: sha256:3d898dcec34fa2ce3d7c1004335ef14a6610c9e5c13de4f204ae313a6c1ced91.tar.gz

7 files are downloaded, the container runtime is built correctly, and everything runs smoothly.

No file

multifractal-ppr-meta-0.1.img

is created or used. When executing via Nextflow, however, such a file is generated and used, and that, it seems, is what throws the lib error.

replikation commented 4 years ago

okay, the question is whether you can tell Nextflow to change the execution type. Sadly, not much is written about this.

did you try this option? singularity.autoMounts = true

also check this:

replikation commented 4 years ago

could be this issue, or difference:

Unlike Docker, Nextflow does not automatically mount host paths in the container when using Singularity. It expects they are configured and mounted system wide by the Singularity runtime. If your Singularity installation allows user defined bind points, read the Singularity configuration section to learn how to enable Nextflow auto mounts.
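
For reference, what these auto mounts do can be reproduced manually with Singularity's -B/--bind flag; a sketch, reusing the work directory and image path from the commands above as placeholders:

# manually bind the Nextflow work directory into the container instead of relying on auto mounts
singularity exec -B /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer \
  /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img \
  /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"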

replikation commented 4 years ago

could be the reason why it's not able to open the lib file

hoelzer commented 4 years ago

Thx, yeah, I was just reading the same web page, hoping to find out how Nextflow actually converts an image from DockerHub into this *.img Singularity file that is then used. But yeah, there is not much information there.

This is what happens when I delete the *.img file; Nextflow automatically pulls it from the DockerHub location:

[06/75ad8d] process > ppr_dependecies:ppr_download_dependencies [100%] 1 of 1 ✔
[90/c1cc9d] process > pprmeta_wf:input_suffix_check (1)         [100%] 1 of 1 ✔
[-        ] process > pprmeta_wf:pprmeta                        -
[-        ] process > pprmeta_wf:filter_PPRmeta                 -
Pulling Singularity image docker://multifractal/ppr-meta:0.1 [cache /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img]

and then it throws the lib error. I just tried singularity.autoMounts = true -- it did not change anything.

hoelzer commented 4 years ago

What kind of strange library is this anyway? You only ever find it in connection with MATLAB:

https://www.mathworks.com/matlabcentral/answers/267562-how-can-i-resolve-this-mcc-runtime-error-cannot-open-shared-object-library#answer_211306

hoelzer commented 4 years ago

What I also just noticed: both Docker images that cause trouble are larger than 5 GB... it would be strange, but maybe that is also an issue. @replikation I remember the problems with the MITOS Docker image; that one was also huge, and I think it was only ~3 GB when you built it again.

replikation commented 4 years ago

the large Docker issue is currently limited to Google Genomics because it only has a 10 GB boot disk, and Docker containers are pulled there. Nextflow is working on "bootdisk" adjustments :) MITOS was 3 GB "packed", so it was filling up the boot disk.

I was able to replicate that issue with another Docker image, metawrap, which is also quite huge.

replikation commented 4 years ago

so in the end it might be better to build the Singularity containers first?

hoelzer commented 4 years ago

SOLVED

# point all Singularity cache/tmp/pull/scratch locations to a directory with enough space
DIR=/hps/nobackup2/singularity/mhoelzer/build
mkdir -p $DIR/.singularity/tmp
mkdir -p $DIR/.singularity/pull
mkdir -p $DIR/.singularity/scratch
export SINGULARITY_CACHEDIR=$DIR/.singularity
export SINGULARITY_TMPDIR=$DIR/.singularity/tmp
export SINGULARITY_LOCALCACHEDIR=$DIR/.singularity/tmp
export SINGULARITY_PULLFOLDER=$DIR/.singularity/pull
export SINGULARITY_BINDPATH=$DIR/.singularity/scratch

singularity build /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img docker://multifractal/ppr-meta:0.1

Now using this img works:

singularity exec /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"

even with nextflow:

nextflow run phage.nf --fasta test-data/T7_draft.fa -profile lsf

[28/74d211] process > ppr_dependecies:ppr_download_dependencies [100%] 1 of 1 ✔
[80/adf1af] process > pprmeta_wf:input_suffix_check (1)         [100%] 1 of 1 ✔
[07/715b05] process > pprmeta_wf:pprmeta (1)                    [100%] 1 of 1 ✔
[54/f2290c] process > pprmeta_wf:filter_PPRmeta (1)             [100%] 1 of 1 ✔
Completed at: 05-Nov-2019 13:10:45
Duration    : 1m 6s
CPU hours   : 0.1
Succeeded   : 4

Conclusion

Singularity has problems building an intact image from the DockerHub pull in the LSF cluster environment here. This post was the game changer to get it working: https://github.com/poldracklab/fmriprep/issues/1392

By default, Singularity uses /tmp during the conversion of the Docker image to a Singularity image. The /tmp folder here on the EBI LSF infrastructure seems to be restricted in size, so large images are not built completely - without any error message.
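
A quick way to confirm this space limitation on a build node is to compare the free space under /tmp with the directory used above:

# check available space: the small /tmp vs. the large shared build directory
df -h /tmp /hps/nobackup2/singularity/mhoelzer/build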

replikation commented 4 years ago

so TL;DR: you set up your build space somewhere else and it worked?

you should add this as a comment to the Singularity config :) nice job

replikation commented 4 years ago

so now you can take a look at the heatmap.. lol

hoelzer commented 4 years ago

It was a long and annoying journey ^^

But exactly, the best approach seems to be to point some Singularity env variables to a directory that has enough space and is readable/writable for all nodes of the cluster. I will now check whether this also solves the MARVEL #17 issue, and if so I will add the LSF config to the master branch.

replikation commented 4 years ago

perfect. but now nz has full singularity expertise :)

hoelzer commented 4 years ago

Following up on this issue: to get working Singularity images on an LSF cluster (or at least the EBI one), it seems sufficient to connect to a node with enough RAM (not only 2 GB) and set

export SINGULARITY_TMPDIR=$DIR/.singularity/tmp

to a folder that is readable/writable.

Then executing

singularity build /path/to/singularity/image/multifractal-deepvirfinder-0.1.img docker://multifractal/deepvirfinder:0.1

produces a correct Singularity image file.

I now have a small Ruby script that takes a Nextflow config file as input, extracts all containers, checks whether Singularity images are already available, and, if not, builds them as described above. By doing so, I don't have to rely on Nextflow taking care of the Docker-to-Singularity conversion. Hopefully, this will now solve all my further issues with Nextflow/Docker executions on the EBI/LSF cluster :D
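
The Ruby script itself is not part of this thread; a rough shell sketch of the same idea (assuming containers are declared as container = '...' lines in the config, and that the image file names follow the pattern seen in the Nextflow log above) could look like this:

CONFIG=nextflow.config                      # hypothetical input config
CACHE=/hps/nobackup2/singularity/mhoelzer   # shared cache dir, readable/writable by all nodes

# collect every container declared in the config, e.g. multifractal/ppr-meta:0.1
grep -oP "container\s*=\s*'\K[^']+" "$CONFIG" | sort -u | while read -r img; do
  name=$(echo "$img" | tr '/:' '--').img    # e.g. multifractal-ppr-meta-0.1.img
  if [ ! -f "$CACHE/$name" ]; then
    singularity build "$CACHE/$name" "docker://$img"
  fi
done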

replikation commented 4 years ago

@hoelzer

this sounds a bit like you should ask someone to increase that space :)

Additionally, Nextflow allows you to set the "tmp" dir to another place. Did you try this? Please try it, as you can add this to the Singularity config file - similar to how we handle the bucket dir for cloud profiles.

more details here:

NXF_TEMP: Directory where temporary files are stored.
NXF_SINGULARITY_CACHEDIR: Directory where remote Singularity images are stored. When using a computing cluster it must be a shared folder accessible from all computing nodes.
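
These are environment variables, so one way to use them (a sketch; the paths are placeholders) is to export them in the shell before launching the pipeline:

export NXF_TEMP=/hps/nobackup2/singularity/mhoelzer/tmp
export NXF_SINGULARITY_CACHEDIR=/hps/nobackup2/singularity/mhoelzer
nextflow run phage.nf --fasta test-data/T7_draft.fa -profile lsf
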
hoelzer commented 4 years ago

@replikation how do I add this to the configuration file? simply

NXF_SINGULARITY_CACHEDIR='/this/is/a/path'

?

hoelzer commented 4 years ago

ah, or maybe I already did this by

singularity { 
      ...
    cacheDir = params.cachedir
}
hoelzer commented 4 years ago

@replikation ok, with the lsf.config file that I added now, which holds:

process.executor = 'lsf'
singularity {
    enabled = true
    autoMounts = true
    cacheDir = params.cachedir
}

I can execute the Nextflow pipeline and let it pull and translate the images from DockerHub to Singularity. This is working now.
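
For completeness, since the config references params.cachedir, one way to supply it at runtime (a sketch; the path is a placeholder) is on the command line:

nextflow run phage.nf --fasta test-data/T7_draft.fa -profile lsf --cachedir /hps/nobackup2/singularity/mhoelzer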

It seems that the only problem was that no

cacheDir

was properly set (I dunno how the sysadmins here configured this).

hoelzer commented 4 years ago

Another problem seems to be: when Nextflow tries to pull multiple images and translate them into Singularity images and somehow breaks, not only the image that caused the break gets corrupted, but possibly also some others that are left incomplete. It is annoying to figure out which ones are corrupted afterwards.

Anyway, adding cacheDir to the config file seems to help, although I had to build the image for deepvirfinder manually again after deleting all images for testing. Still weird.

replikation commented 4 years ago

it's now understandable why Google Genomics runs on Docker and not Singularity :D