Open nschan opened 1 year ago
I realized that this issue is tied to $CH_IMAGE_STORAGE. ch-run refuses to allow images from its storage directory by path, and accepts these only by name. One possible workaround is to unset CH_IMAGE_STORAGE and use NXF_CHARLIECLOUD_CACHEDIR pointing to the same directory before attempting to run a pipeline.
Charliecloud project lead here. Thanks for this bug report!
This behavior is by design. In general, the storage directory has never been a stable API, so folks should not depend on its structure. However, we've only been erroring on ch-run
against images in the storage directory since Charliecloud 0.31, I think.
You can now run images in storage by specifying their name, e.g. instead of ch-run $CH_IMAGE_STORAGE/img/foo ...
you would say ch-run foo ...
. (Note that image names are encoded, so you can't reliably say ch-run $(basename $CH_IMAGE_STORAGE/img/foo)
.)
However, ch-run -w foo
is disallowed because this is quite likely to corrupt the storage directory. In this case you would have to export to another directory with ch-convert
, then ch-run -w
that other directory. (However, best practice is to avoid -w
and bind-mount output directories and whatnot.)
An alternative to ch-convert
would be to build a small derived image containing the desired mount directory.
See also the embarrassingly old hpc/charliecloud#96.
Since charliecloud will soon provide --write-fake
, are there plans to add support for this to nextflow?
I guess it would require some checks for the charliecloud version used, or removing support for versions that do not support --write-fake
. As I understand versions <0.33 are fine with -w
from the cache directory, but >0.33 not. Presumably --write-fake
would be supported at some point in the future (>0.35 ?)
Unfortunately, no version of Charliecloud is fine with -w
in the storage directory, though consequences vary. We didn't enforce this until recently, however.
--write-fake
will be available in 0.36, to be released soon.
I guess I should have said 'works, although it is not a great idea' :). I think using --write-fake
for upcoming versions is the best solution, the main issue I see is how older versions of charliecloud could be supported, or if that is even necessary. @phue what are your thoughts on this?
The sole purpose of that -w
flag was to enable bind-mounting of the nextflow work directory when using public container images. Now that charliecloud
soon supports overlayfs
, it should be dropped.
And yes, compatibility with earlier versions is problematic, but that has happened before because of another breaking change
The shared (multi-user) cache directory you are using is another can of worms though, I currently don't see how that can work properly with the rather recent git-based layer caching used in charliecloud
.
I think the way forward is to just drop the CharliecloudCache
class and leave it to the user to pre-pull images if desired.
@nschan
Could you try these config settings with the current charliecloud
git, to see if that already works with the latest nextflow
version?
charliecloud {
cacheDir = false
readOnlyInputs = true
runOptions = "--write-fake"
}
I generally agree that breaking backwards compatibility could be justified in this case.
I see the issues of the multi-user cache.. In practice that cache is pretty much only used by me, but we will move away from a shared cache.
Anyhow, the approach you suggested does not currently work (nextflow@23.10.1, charliecloud@0.36~pre+78aae5e), but I assume that is something for @reidpr ?
Command error:
ch-run[47171]: error: can't read: /ch/environment: Stale file handle (ch_misc.c:259 116)
When i try to run the container manually with -W
or --write-fake
I get this:
ch-run[35682]: error: can't overlay: Operation not permitted (ch_core.c:323 1)
This is using quay.io/biocontainers/agat:1.1.0--pl5321hdfd78af_1
Edit:
Since you mention dropping CharliecloudCache
, it is anyway somewhat necessary to pre-pull images, since in many cases pipelines will start a bunch of different processes in the first step, which often leads to #3566
Another edit:
hpc/charliecloud/pull/1793 mentions kernel >5.11 ideally >6.6. The hpc system I am using is running on 4.12.14, so I guess that explains why --write-fake
does not work for me...
To sum up:
I think the fact that relying on --write-fake
would break compatibility with older kernels is very problematic. Not supporting older charliecloud versions is one thing, but dropping support for older kernels seems a bit drastic.
I think an alternative could be something along the lines of building write-able process specific containers. This would hopefully solve most of the problems associated with running containers from a cache. I admit my understanding of containers is pretty basic, but as I read the current implementation all parallel processes that use the same container image run in the same container (irrespective $CH_IMAGE_STORAGE
set). Wouldn't it be better to have one dedicated container per process? Using a cache and ch-convert
(as sugested by @reidpr earlier) seems like a reasonable approach to me.
This would keep things backwards compatible, avoid the current issues with -w
, and might actually be the correct way to run containers..?
It would require some adjustments to the CharliecloudBuilder, essentially adding a step of ch-convert
to create the container. I am not sure if the new container should be stored in the current work directory, or if that will create a horrible mess when the work directory including the container is mounted into the container?
Of course guess adding -w --unsafe
instead of -w
would be another (strongly discouraged) option.
Anyhow, the approach you suggested does not currently work (nextflow@23.10.1, charliecloud@0.36~pre+78aae5e), but I assume that is something for @reidpr ?
Command error:
ch-run[47171]: error: can't read: /ch/environment: Stale file handle (ch_misc.c:259 116)
“Stale file handle” is something about inode re-use. I know it often comes up in NFS. If it’s a Charliecloud bug we definitely want to fix it; is there a reproducer without Nextflow?
I read the current implementation all parallel processes that use the same container image run in the same container (irrespective
$CH_IMAGE_STORAGE
set). Wouldn't it be better to have one dedicated container per process?
Processes on the same node that are in different containers cannot use certain system calls to communicate, and this has performance impact. That’s the motivation for --join
, though it’s not the default.
Note that multiple containers can be running the same image. I’d be skeptical of a separate image per process. That’d be a lot of images on a wide computer.
Another thing to consider is hpc/charliecloud#1408. This would be a supported way to create a container modified by a shell command (e.g., to add mount points).
One could also ch-convert
, then manually adjust the resulting directory.
I hope that helps.
I could not reproduce the "stale file handle" outside nextflow, it might also have other, filesystem related reasons.
Processes on the same node that are in different containers cannot use certain system calls to communicate, and this has performance impact. That’s the motivation for --join, though it’s not the default.
I was referring to nextflow processes (tasks), which I think should anyway not communicate with each other while running.
What is not really clear to me, is what would happen if I have several (simultaneous) jobs running containers of the same image and for some reason input-specific files are created in a certain folder of the image (e.g. /opt/someprogram/somefile.tmp
) with -w
. As I understand all jobs see the what was written to /opt/program
by other jobs. If the contents of somefile.tmp
are specific for the task input then this could turn out to be somewhat of a problem, no?
If that could happen I think this would argue in favour of using ch-convert
for each nextflow task. If that is not an issue, then I think it should be ok for nextflow to just store images in $workDir/charliecloud
as it currently does when $CH_IMAGE_STORAGE
and $NXF_CHARLIECLOUD_CACHEDIR
are not set.
Looks like I misunderstood. Thanks for clarifying.
Bug report
Running nextflow@23.10.0 together with charliecloud@0.34 yields an error:
error: can't run directory images from storage (hint: run by name)
Expected behavior and actual behavior
The expected behaviour is that nextflow is able to start the container. This is related to a change in charlieclouds behaviour (https://github.com/hpc/charliecloud/pull/1505)
Steps to reproduce the problem
This problem should occur in any case where charlielcloud > 0.33 is used as the container engine for nextflow. I do not think that this problem is specific to the nextflow version used.
Program output
Environment
Additional context