hoelzer closed this issue 4 years ago
Direct execution via Singularity works:
singularity exec docker://multifractal/ppr-meta:0.1 ./PPR_Meta T7_draft.fa T7_draft.csv
Docker image path: index.docker.io/multifractal/ppr-meta:0.1
Cache folder set to /hps/nobackup2/singularity/mhoelzer/docker
Creating container runtime...
Exploding layer: sha256:35c102085707f703de2d9eaad8752d6fe1b8f02b5d2149f1d8357c9cc7fb7d0a.tar.gz
Exploding layer: sha256:251f5509d51d9e4119d4ffb70d4820f8e2d7dc72ad15df3ebd7cd755539e40fd.tar.gz
Exploding layer: sha256:8e829fe70a46e3ac4334823560e98b257234c23629f19f05460e21a453091e6d.tar.gz
Exploding layer: sha256:6001e1789921cf851f6fb2e5fe05be70f482fe9c2286f66892fe5a3bc404569c.tar.gz
Exploding layer: sha256:d99ed20073595350306b99135321f842d1622b67ee8cdd2cd4eaa413b53eb050.tar.gz
Exploding layer: sha256:524ba74a8b5367065753266de0a665030c36808c345473feaab1ead2e6bb46b2.tar.gz
Exploding layer: sha256:2cc57967173f14678c6ada7b5436464029b75d7348b242dbaeeb71c29505051f.tar.gz
Exploding layer: sha256:3d898dcec34fa2ce3d7c1004335ef14a6610c9e5c13de4f204ae313a6c1ced91.tar.gz
Using TensorFlow backend.
...
finished
However, this command is generated by Nextflow (in .command.run) and then executed:
singularity exec /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"
which runs into
./PPR_Meta: error while loading shared libraries: libmwlaunchermain.so: cannot open shared object file: No such file or directory
Modifying this command to
singularity exec docker://multifractal/ppr-meta:0.1 /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"
works. I changed
/hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img
to
docker://multifractal/ppr-meta:0.1
So the container runtime is freshly created (I think).
Ok, executing like this:
singularity exec docker://multifractal/ppr-meta:0.1 /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"
does not write a
/hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img
file. It seems Nextflow somehow forces writing of such a Singularity image during runtime. Then this image is used and does not work (throws the lib error).
So is Nextflow building a broken Singularity container, or what is the issue?
I am trying to figure it out. But yes, it seems that Nextflow builds a Singularity image from the Docker layers that is somehow defective.
When I just execute the following code on the LSF cluster:
singularity exec docker://multifractal/ppr-meta:0.1 /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"
Docker image path: index.docker.io/multifractal/ppr-meta:0.1
Cache folder set to /hps/nobackup2/singularity/mhoelzer/docker
[7/7] |===================================| 100.0%
Creating container runtime...
Exploding layer: sha256:35c102085707f703de2d9eaad8752d6fe1b8f02b5d2149f1d8357c9cc7fb7d0a.tar.gz
Exploding layer: sha256:251f5509d51d9e4119d4ffb70d4820f8e2d7dc72ad15df3ebd7cd755539e40fd.tar.gz
Exploding layer: sha256:8e829fe70a46e3ac4334823560e98b257234c23629f19f05460e21a453091e6d.tar.gz
Exploding layer: sha256:6001e1789921cf851f6fb2e5fe05be70f482fe9c2286f66892fe5a3bc404569c.tar.gz
Exploding layer: sha256:d99ed20073595350306b99135321f842d1622b67ee8cdd2cd4eaa413b53eb050.tar.gz
Exploding layer: sha256:524ba74a8b5367065753266de0a665030c36808c345473feaab1ead2e6bb46b2.tar.gz
Exploding layer: sha256:2cc57967173f14678c6ada7b5436464029b75d7348b242dbaeeb71c29505051f.tar.gz
Exploding layer: sha256:3d898dcec34fa2ce3d7c1004335ef14a6610c9e5c13de4f204ae313a6c1ced91.tar.gz
7 files are downloaded, the container runtime is built correctly, and everything runs smoothly.
No file
multifractal-ppr-meta-0.1.img
is created or used. When executing via Nextflow, such a file is generated and used, and then it throws the lib error, it seems.
Okay, the question is whether you can tell Nextflow to change the execution type. Sadly, not much is written here.
Did you try this option?
singularity.autoMounts = true
Also check this; it could be the issue, or the difference:
Unlike Docker, Nextflow does not automatically mount host paths in the container when using Singularity. It expects that they are configured and mounted system-wide by the Singularity runtime. If your Singularity installation allows user-defined bind points, read the Singularity configuration section to learn how to enable Nextflow auto mounts.
This could be the reason why it's not able to open the lib file.
Thx, yeah, I was just reading the same web page hoping to find out how Nextflow actually converts an image from Docker Hub to this *.img Singularity file that will then be used. But yeah, not much information there.
This is what happens when I delete the *.img file; Nextflow automatically pulls it from the Docker Hub location:
[06/75ad8d] process > ppr_dependecies:ppr_download_dependencies [100%] 1 of 1 ✔
[90/c1cc9d] process > pprmeta_wf:input_suffix_check (1) [100%] 1 of 1 ✔
[- ] process > pprmeta_wf:pprmeta -
[- ] process > pprmeta_wf:filter_PPRmeta -
Pulling Singularity image docker://multifractal/ppr-meta:0.1 [cache /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img]
and then throws the lib error. I just tried singularity.autoMounts = true -- it did not change anything.
What kind of strange library is that, anyway? You only ever find it in connection with MATLAB.
What I also just noticed: both Docker images that cause trouble are larger than 5 GB... it would be strange, but maybe that is also an issue. @replikation I remember the problems with the MITOS Docker image; that one was also huge, and I think only ~3 GB when you build it again.
The large-Docker issue is currently limited to Google Genomics because it only has a 10 GB boot disk, and Docker containers are pulled there. Nextflow is working on "bootdisk" adjustments :) MITOS was 3 GB "packed", so it was filling up the boot disk.
I was able to replicate that issue with another Docker image, for metaWRAP; also quite huge.
So in the end it might be better to build the Singularity container first?
SOLVED
DIR=/hps/nobackup2/singularity/mhoelzer/build
mkdir $DIR/.singularity
mkdir $DIR/.singularity/tmp
mkdir $DIR/.singularity/pull
mkdir $DIR/.singularity/scratch
export SINGULARITY_CACHEDIR=$DIR/.singularity
export SINGULARITY_TMPDIR=$DIR/.singularity/tmp
export SINGULARITY_LOCALCACHEDIR=$DIR/.singularity/tmp
export SINGULARITY_PULLFOLDER=$DIR/.singularity/pull
export SINGULARITY_BINDPATH=$DIR/.singularity/scratch
singularity build /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img docker://multifractal/ppr-meta:0.1
Now using this img works:
singularity exec /hps/nobackup2/singularity/mhoelzer/multifractal-ppr-meta-0.1.img /bin/bash -c "/bin/bash -ue /hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/90/d1a1ab1e56cb7fb1c28ba4862513fd/.command.sh"
even with nextflow:
nextflow run phage.nf --fasta test-data/T7_draft.fa -profile lsf
[28/74d211] process > ppr_dependecies:ppr_download_dependencies [100%] 1 of 1 ✔
[80/adf1af] process > pprmeta_wf:input_suffix_check (1) [100%] 1 of 1 ✔
[07/715b05] process > pprmeta_wf:pprmeta (1) [100%] 1 of 1 ✔
[54/f2290c] process > pprmeta_wf:filter_PPRmeta (1) [100%] 1 of 1 ✔
Completed at: 05-Nov-2019 13:10:45
Duration : 1m 6s
CPU hours : 0.1
Succeeded : 4
Conclusion
Singularity has problems building an intact image from the Docker Hub pull in the LSF cluster environment here. This post was the game changer to get it working: https://github.com/poldracklab/fmriprep/issues/1392
By default, Singularity uses /tmp during the conversion of the Docker image to a Singularity image. The /tmp folder here on the EBI LSF infrastructure seems to be restricted in size, and thus large images are not completely built, without any error message.
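Given that diagnosis, a quick pre-flight check of the free space in the build tmp dir can catch the problem before a silent truncation. A minimal sketch (the 5 GB threshold is an arbitrary example, not a documented Singularity limit):

```shell
# Warn if the directory used for the Docker -> Singularity conversion
# (SINGULARITY_TMPDIR, falling back to /tmp) has less free space than
# an arbitrary example threshold of 5 GB.
TMP="${SINGULARITY_TMPDIR:-/tmp}"
MIN_KB=$((5 * 1024 * 1024))

# 'df -Pk' gives POSIX-portable output; field 4 of line 2 is available KB.
avail_kb=$(df -Pk "$TMP" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt "$MIN_KB" ]; then
    echo "WARNING: only ${avail_kb} KB free in $TMP; large image builds may be truncated silently" >&2
fi
```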
So, TL;DR: you set up your build space somewhere else and it worked?
you should add this as a comment to the singularity config :) nice job
so now you can take a look at the heatmap.. lol
It was a long and annoying journey ^^
But exactly, it seems the best way is to point some Singularity env variables to a directory that has enough space and is readable/writable from all nodes of the cluster. I will now check whether this also solves the MARVEL #17 issue, and if so I will add the LSF config to the master branch.
perfect. but now nz has full singularity expertise :)
Following up on this issue: to get working Singularity images on an LSF cluster (or at least the EBI one), it seems to be sufficient to connect to a node with enough RAM (not only 2 GB) and set
export SINGULARITY_TMPDIR=$DIR/.singularity/tmp
to a folder that is readable/writable.
Then executing
singularity build /path/to/singularity/image/multifractal-deepvirfinder-0.1.img docker://multifractal/deepvirfinder:0.1
produces a correct Singularity image file.
I now have a small Ruby script that takes a Nextflow config file as input, extracts all containers, checks whether Singularity images are available, and if not, builds them as described above. By doing so, I don't have to rely on Nextflow taking care of the Docker-to-Singularity conversion. Hopefully, this will now solve all my further issues with Nextflow/Docker executions on the EBI/LSF cluster :D
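For reference, the core logic of such a helper could look like the following. This is a hypothetical shell sketch (the original is a Ruby script, and the actual `singularity build` call is left as a comment); the image filename follows Nextflow's naming scheme seen above, where '/' and ':' in the Docker URI become '-':

```shell
# list_missing_images CONFIG CACHEDIR
# Prints the cache path of every container referenced in a Nextflow
# config that does not yet exist as an image in CACHEDIR. Assumes
# single-quoted entries like: container = 'multifractal/ppr-meta:0.1'
list_missing_images() {
    config=$1; cachedir=$2
    grep -o "container *= *'[^']*'" "$config" |
    sed "s/.*'\(.*\)'/\1/" |
    sort -u |
    while read -r uri; do
        # multifractal/ppr-meta:0.1 -> multifractal-ppr-meta-0.1.img
        img="$cachedir/$(printf '%s' "$uri" | tr '/:' '--').img"
        # Only report (or build) images that are not already cached.
        [ -f "$img" ] || echo "$img"
        # A real helper would then run:
        # singularity build "$img" "docker://$uri"
    done
}
```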
@hoelzer
this sounds a bit like you should ask someone to increase that space :)
Additionally, Nextflow allows you to set the "tmp" dir to another place. Did you try this? Please try it, as you can add this to the singularity config file, similar to how we handle the bucket dir for cloud profiles:
Name | Description
---|---
NXF_TEMP | Directory where temporary files are stored
NXF_SINGULARITY_CACHEDIR | Directory where remote Singularity images are stored. When using a computing cluster it must be a shared folder accessible from all computing nodes.
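Note that these two are environment variables, not nextflow.config keys, so they would be exported in the shell (or LSF job script) before launching Nextflow. The cache path below is the one used in this thread; the tmp path is a hypothetical example:

```shell
# Point Nextflow's Singularity image cache at a shared folder visible
# from all cluster nodes (env-var equivalent of 'singularity.cacheDir').
export NXF_SINGULARITY_CACHEDIR=/hps/nobackup2/singularity/mhoelzer
# Optional: move Nextflow's own temporary files off the small /tmp
# (hypothetical path, adjust to your cluster).
export NXF_TEMP=/hps/nobackup2/singularity/mhoelzer/tmp

# Then launch as usual, e.g.:
# nextflow run phage.nf --fasta test-data/T7_draft.fa -profile lsf
```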
@replikation how do I add this to the configuration file? simply
NXF_SINGULARITY_CACHEDIR='/this/is/a/path'
?
ah, or maybe I already did this by
singularity {
...
cacheDir = params.cachedir
}
@replikation OK, with the lsf.config file I now added, which holds:
process.executor = 'lsf'
singularity {
enabled = true
autoMounts = true
cacheDir = params.cachedir
}
I can execute the Nextflow pipeline and let it pull and translate the images from Docker Hub to Singularity. This is working now.
It seems that the only problem was that no cacheDir was properly set (I don't know how the sysadmins here configured this).
Another problem seems to be: when Nextflow tries to pull multiple images, translates them into Singularity images, and somehow breaks, not only the image that caused the break is corrupted but possibly also some others that are then left incomplete. It is annoying to figure out which ones are corrupted.
Anyway, adding cacheDir =
to the config file seems to help, although I had to build the image for deepvirfinder manually again after deleting all images for testing. Still weird.
It's now understandable why Google Genomics runs on Docker and not Singularity :D
Execution from the nextflow pipeline throws:
The git/database is cloned correctly.