Closed hmedwards closed 2 years ago
Hi @hmedwards I'm not sure here unfortunately.
Looking at the Imperial profile, it seems you are using Singularity.
The error here:
FATAL: container creation failed: mount /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4->/.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4 error: while mounting /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4: destination /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4 doesn't exist in container
Suggests to me that Singularity is having a problem mounting your filesystem into the container, so the software itself can't access the directories... but this isn't related to eager nor Nextflow (as far as I can tell).
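One way to narrow this down is to reproduce the failing bind mount by hand, outside of Nextflow. This is only a sketch: the work directory path is a placeholder for the one from your failing task, and the guard makes it degrade gracefully on machines without Singularity.

```shell
# Sketch: manually reproduce the bind mount that Nextflow asks Singularity for.
# WORKDIR is a placeholder; substitute the work directory from your failing task.
WORKDIR=/path/to/nf-core/work
if command -v singularity >/dev/null 2>&1; then
  if singularity exec -B "$WORKDIR" docker://nfcore/eager:2.4.4 ls "$WORKDIR"; then
    RESULT=mount-ok
  else
    RESULT=mount-failed   # same class of failure as the pipeline error above
  fi
else
  RESULT=singularity-missing
fi
echo "$RESULT"
```

If this fails with the same "destination doesn't exist in container" message, the problem is purely between Singularity and the filesystem, not in the pipeline.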
I see that @combiz wrote the Imperial nf-core profile; maybe he has some suggestions?
In addition
Caused by: java.io.IOException: Stream closed
Maybe the node the job was running on had an interruption to the shared filesystem or something?
I say that because I see some of the steps of the pipeline finished correctly, and get_software_versions is an extremely small step (so it's not running out of memory or anything).
Hi @jfy133 Thanks so much for your quick reply. I wondered whether it may be a system issue as opposed to the pipeline itself. Hopefully @combiz will have some insight but I'll also try contacting someone at Imperial to see if they can help at all.
Interestingly, we had this problem as well. The temporary fix was to pull the image manually once, so that it was saved in the local singularity cache. Then it used the cached version and the error no longer occurred. The downside is that you have to do it each time the version is updated.
This does look familiar though I haven't encountered it in a while. We're currently using Singularity on the HPC with the Imperial config and it's working ok. Roman's suggestion to pull the image into a cache before starting sounds good. I'd also be sure to use the latest version of NF (rather than the module load version) for which you'll also need the latest java sdk (module load java/jdk-16).
Thanks @brisk022 and @combiz
I am using Nextflow version 22.04.3, so I don't think it is a version issue?
This is new to me I'm afraid, could you explain how I pull an image into a cache please? Thanks!
Unless you explicitly disable caching, singularity should do it automatically. You can pull the image and then delete it.
singularity pull --name eager.tmp.sif docker://nfcore/eager:2.4.4
rm eager.tmp.sif
After that, when you list the cached items with singularity cache list -v, there should be some items with recent timestamps.
Another way to check is to pull the image again. This time singularity should inform you that it is using the cached image.
$ singularity pull --name eager.tmp.sif docker://nfcore/eager:2.4.4
INFO: Using cached SIF image
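A related precaution (a sketch; the directory is illustrative) is to point both Nextflow and Singularity at one explicit cache directory on a shared filesystem, so that compute nodes reuse the image pulled on the login node. NXF_SINGULARITY_CACHEDIR is the variable Nextflow consults; SINGULARITY_CACHEDIR is the one singularity pull consults.

```shell
# Sketch: use one explicit, shared cache directory for pre-pulled images.
# The path is illustrative; on an HPC it should live on a shared filesystem.
CACHE=/tmp/shared_singularity_cache
mkdir -p "$CACHE"
export NXF_SINGULARITY_CACHEDIR="$CACHE"   # read by Nextflow when resolving containers
export SINGULARITY_CACHEDIR="$CACHE"       # read by "singularity pull"
echo "cache at: $NXF_SINGULARITY_CACHEDIR"
```

Exporting these in your job submission script (or ~/.bashrc) keeps the cache location stable across runs.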
Unfortunately, I cannot say anything about versions. FWIW, we are using nextflow v22.04.0 from the bioconda channel.
Thanks. I tried re-running with the same conda version as you (22.04), but the same error occurred.
Then I tried pulling and deleting the image as suggested, but it seems it was already using a cached image? In any case, after running this I got the same container creation error.
(nextflow22.04) [he11@login-a nf-core]$ singularity pull --name eager.tmp.sif docker://nfcore/eager:2.4.4
INFO: Using cached SIF image
(nextflow22.04) [he11@login-a nf-core]$ rm eager.tmp.sif
(nextflow22.04) [he11@login-a nf-core]$ singularity cache list -v
NAME DATE CREATED SIZE TYPE
006c60b566117b45543ffe 2022-06-07 12:00:11 1.32 GiB blob
11a4244dfa1c973d17ae6e 2022-06-07 11:59:36 0.74 KiB blob
2ecb54bbaab44a001995fc 2022-06-07 11:59:34 0.09 KiB blob
5225e31eacb3728e531e0f 2022-06-07 12:00:13 4.98 KiB blob
5849476b4bf8a724cc5539 2022-06-07 11:59:33 547.29 KiB blob
679c171d6942954a759f2d 2022-06-07 11:59:33 50.53 MiB blob
7882648efbd1a387aeafae 2022-06-07 11:59:35 0.09 KiB blob
852e50cd189dfeb54d9768 2022-06-07 11:59:28 25.85 MiB blob
a6236801494d5ca9acfae6 2022-06-07 11:59:31 76.65 MiB blob
cdc1d48f72cc1668a39fcd 2022-06-07 12:00:13 1.56 KiB blob
d425ff08d54d93f02c9f95 2022-06-07 12:00:12 4.15 KiB blob
198b710c2118afced6cbb2 2022-06-07 12:42:55 1.43 GiB oci-tmp
There are 1 container file(s) using 1.43 GiB and 11 oci blob file(s) using 1.47 GiB of space
Total space used: 2.90 GiB
I've asked ICT as it may be due to changes in binding paths for containers in Singularity 3.8 vs 3.7.
e.g. https://github.com/apptainer/singularity/issues/6181#issuecomment-937315849
Hi @hmedwards @brisk022 @combiz I'm going to close this now as it's not an eager-specific error by the sounds of it, but feel free to keep communicating here if you wish!
I'm running into this error too.
Although the crash is coming from Singularity, I think the error is in Nextflow or in this pipeline. The documentation says "Beware that the mount points must exist in the built image". As such, the referenced apptainer ticket (apptainer/singularity#6181) is closed as intended behavior. Ideally, these mounts should exist in the container, nfcore/eager:2.4.6.
I'm willing to debug this myself, but I am confused about when Nextflow decides to create a bind mount. Although the eager pipeline crashes this way, I can't replicate it with simpler examples. Does anyone have ideas about that?
So more info on singularity in Nextflow:
https://www.nextflow.io/docs/latest/container.html#singularity
However, I suspect the trick is here: https://www.nextflow.io/docs/latest/config.html#scope-singularity
You need to specify the autoMounts setting in the singularity scope of a Nextflow pipeline configuration, but note it has a caveat:
When true Nextflow automatically mounts host paths in the executed container. It requires the user bind control feature enabled in your Singularity installation (default: false).
Is this set accordingly for you?
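For reference, a minimal sketch of what that could look like (enabled, autoMounts, and cacheDir are documented options of Nextflow's singularity scope; the cacheDir path is illustrative):

```groovy
// Sketch: Nextflow singularity scope with automatic host-path mounting.
singularity {
    enabled    = true
    autoMounts = true   // needs "user bind control = yes" in singularity.conf
    cacheDir   = '/path/to/shared/singularity_cache'  // illustrative path
}
```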
To investigate further, you can look inside the .command.run file present in each working directory (this is the actual bash/batch script executed by Nextflow). The relevant section for debugging is likely nxf_launch().
I started with the underlying .command.run and found this minimal non-working example:
$ mkdir /tmp/test
$ singularity exec -B /tmp/test https://depot.galaxyproject.org/singularity/python:3.8.3 pwd
INFO: Converting SIF file to temporary sandbox...
WARNING: Skipping mount /home/azureuser/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.3.0/apptainer-1.1.5-tkaiqwrpiog2vzr5okpp77nqpvdtwmv6/var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
INFO: Cleaning up image...
FATAL: container creation failed: mount hook function failure: mount /tmp/test->/tmp/test error: while mounting /tmp/test: destination /tmp/test doesn't exist in container
The bug might lie in my installation or configuration of Singularity, but the current Singularity documentation would suggest otherwise: it says that the mount points must already exist in the container, so Nextflow is incorrect in asking to mount something that does not already exist. Perhaps that documentation is out of date, or I am misunderstanding it.
I do have user bind control = yes in path/to/singularity/etc/singularity/singularity.conf, which is owned by root, and has allow setuid = yes.
I installed singularityce (and e2fsprogs and squashfuse) with Spack. I tried with and without suid, with both apptainer and singularityce. The problem still persists.
It seems my problem was that I was trying to share a directory in /tmp and my /path/to/etc/singularity/singularity.conf has mount tmp = yes, which apparently conflicts.
singularity exec --bind /a/b/c ... works, and if I set mount tmp = no, then --bind /tmp/test also works.
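For anyone hitting the same conflict, the relevant settings can be checked directly in singularity.conf. The snippet below is only a sketch against a throwaway demo file, since the real conf's location varies per installation:

```shell
# Sketch: inspect the singularity.conf settings that interact with --bind.
# A throwaway demo file stands in for the real conf, whose location varies.
CONF=/tmp/demo_singularity.conf
cat > "$CONF" <<'EOF'
mount tmp = yes
user bind control = yes
allow setuid = yes
EOF
# "mount tmp = yes" makes Singularity manage /tmp itself, which is what
# conflicted with binding a directory under /tmp in the report above.
grep -E '^(mount tmp|user bind control|allow setuid)' "$CONF"
```

On a real system, point CONF at your installation's etc/singularity/singularity.conf instead.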
Given this, I don't understand why the error message in Singularity and Apptainer was error: while mounting /tmp/test: destination /tmp/test doesn't exist in container. I also don't know what the bit of Singularity documentation saying the mount point must exist means. I will file issues in the respective repositories. At least my Nextflow work can continue.
Thank you @jfy133 and @marissaDubbelaar for pointing me in the right direction.
Thanks for investigating @charmoniumQ ! Glad you were able to solve it somewhat :)
Hi there!
I've checked the recommended troubleshooting pages but am unable to find the answer to my problem.
I have tried running eager on a couple of fastq files but hit an error saying container creation failed.
My nf-core/eager command is:
Which results in error messages:
and:
I'd appreciate any help on this. Many thanks.