hpc / charliecloud

Now hosted on GitLab.
https://gitlab.com/charliecloud/main
Apache License 2.0

ch-run with cachedir - confusing behavior #1770

Closed nschan closed 12 months ago

nschan commented 1 year ago

Hi,

the recent change in how ch-run behaves when CH_IMAGE_STORAGE is set is somewhat confusing. As of version 0.33 (I think), ch-run no longer runs an image when given a path that points to $CH_IMAGE_STORAGE/img/name, and complains that images should be run by name. Images that are run by name cannot be combined with -w, meaning that it is not possible to bind something to root via -b. This breaks, for example, my HPC workflows, where I bind a storage path to root; I would prefer not to bind it into /mnt/, to avoid having to fix all paths. As a workaround, I run the pull with $CH_IMAGE_STORAGE set, then unset it so that I can run the container from its path and use -w with -b, and after I am finished with the container I set $CH_IMAGE_STORAGE again. This seems much less straightforward than the previous usage, where I could just run from $CH_IMAGE_STORAGE with the path.

This also breaks the nextflow integration with charliecloud (see nextflow-io/nextflow/issues/4463).

Is there a way to restore the old usage, or allow running from storage by path, or enable running by name with -w? Maybe there is also a way to start containers with ch-run that I am missing that would work around this issue?

Thanks Niklas

reidpr commented 1 year ago

Hello Niklas,

Thanks for reaching out!

Running images from the storage directory has never been supported, and the reason is that Charliecloud needs full control over everything in the storage directory. That is, ch-run -w against something in $CH_IMAGE_STORAGE is very likely to corrupt the storage directory. Only in PR #1505, however (0.31), did we actually enforce this.

That said, there are a couple of workarounds I can think of. The standard way to get a writeable image is something like:

$ ch-image build -t foo ...
$ ch-convert foo /var/tmp/foo
$ ch-run -w /var/tmp/foo ...

That is, export the image from storage to a directory of your choice, then run that directory.

However, with your use case, which sounds something like

$ ch-run -w -b /foo:/ ...

you may not need to build an image at all. If /foo is bound to container /, then it will over-mount the image in its entirety and you could just use /foo as an image directly, e.g.:

$ ch-run -w /foo ...

Does that help?

nschan commented 1 year ago

Hello Reid,

I guess I should add that since most of the stuff I do is bio-related, I typically do not build the image myself but do something like ch-image pull quay.io/biocontainers/somecontainer:tag and then try to run that container image. As I understand it, these are then not writeable by default? I guess I could do what you suggest and do something along the lines of

ch-convert quay.io/biocontainers/somecontainer:tag tmp/quay.io/biocontainers/somecontainer:tag
ch-run -w -b /foo tmp/quay.io/biocontainers/somecontainer:tag ...

I am not really sure if I understand your last point. I am indeed doing

ch-run -w -b /foo ...

which in my case binds the whole filesystem that is in /foo into the container, which is also where all of the files I want to use inside the container live. Of course, when the container was built by biocontainers this directory was not created. As far as I understand that is why I need to add -w?

Thanks Niklas

reidpr commented 1 year ago

> As I understand these are then not writeable by default?

That's correct.

> binds the whole filesystem that is in /foo into the container, which is also where all of the files I want to use inside the container live

Okay, looks like I misunderstood. Let me try again. You have a directory tree at /foo on the host, and you want to have it appear at the same path /foo inside the container — yes?

If that is the case, and the path /foo doesn't change, I'd suggest building a small derived image with the path you want (untested):

$ cat Dockerfile
FROM quay.io/biocontainers/somecontainer:tag
RUN mkdir -p /foo  # -p needed if more than one dir in path
$ ch-image build -t bar -f ./Dockerfile .

Then you can run it with ch-run bar ....

If /foo does change, then yes you'd need to ch-convert and then ch-run.
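
As a rough, untested sketch of the derived-image approach, the Dockerfile could be generated for whatever mount points are needed. The helper name `derived_dockerfile` is hypothetical (it is not part of Charliecloud), and the sketch assumes the paths contain no whitespace:

```shell
#!/bin/sh
# Hypothetical helper, not part of Charliecloud.
# Print a Dockerfile that adds empty mount points to a base image.
# Usage: derived_dockerfile BASE_IMAGE DIR [DIR ...]
# Assumption: DIR arguments contain no whitespace or shell metacharacters.
derived_dockerfile() {
    base=$1
    shift
    printf 'FROM %s\n' "$base"
    for dir in "$@"; do
        # -p creates missing parent directories and tolerates existing ones
        printf 'RUN mkdir -p %s\n' "$dir"
    done
}
```

For the example in this thread, `derived_dockerfile quay.io/biocontainers/somecontainer:tag /foo > Dockerfile` would reproduce the two-line Dockerfile shown earlier, which can then be built with `ch-image build -t bar -f ./Dockerfile .` and run with something like `ch-run -b /foo:/foo bar -- ...` (no -w needed, since the mount point already exists in the image).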

See also #96, which is rather embarrassingly old.

nschan commented 1 year ago

> Okay, looks like I misunderstood. Let me try again. You have a directory tree at /foo on the host, and you want to have it appear at the same path /foo inside the container, yes?

Exactly. I guess for interactive work rebuilding a derived container would be an option, but right now the advantage over making a copy via ch-convert to use with -w is not obvious to me. I guess it is that I could use them from the cache?

However, one of my main use cases for charliecloud is running nextflow pipelines, which usually submit jobs with predefined containers (i.e. pulled from quay.io/biocontainers). In this case the centralization of the containers is an advantage; often I am more interested in actually running an analysis in a broadly reproducible environment. In any case, I understand that the way nextflow currently implements running charliecloud is incompatible with charliecloud >= 0.31 if $CH_IMAGE_STORAGE is set (if this is unset, nextflow will use a local cache). I assume that reasonable workarounds could be:

- nextflow ignores the image cache (many pulls for containers that are shared across pipelines) or
- nextflow calls ch-convert to create a working copy of a cached container?

Did I get that correctly?

reidpr commented 1 year ago

> Exactly. I guess for interactive work rebuilding a derived container would be an option, but right now the advantage over making a copy via ch-convert to use with -w is not obvious to me. I guess it is that I could use them from the cache?

There’s a couple reasons why we consider read-only containers (i.e., no -w) a best practice. First is that if you know the image is unchanged, you have better provenance on it. Second is that SquashFS images perform better on many filesystems (especially Lustre, which is not good at metadata).

> nextflow ignores the image cache (many pulls for containers that are shared across pipelines) or

I'm a bit nervous about sharing a storage directory between users in general, because it’s not tested and we have known ways that it breaks. See #1701.

> nextflow calls ch-convert to create a working copy of a cached container?

This should work.
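
A minimal sketch of what such a per-job working copy might look like is below. It is an illustration only, not how nextflow actually does it: the function name `run_in_working_copy` and the `DRY_RUN` variable are made up for this example, and it assumes the image is already in the storage directory and that the bind destination should match the host path.

```shell
#!/bin/sh
# Hypothetical helper, not part of Charliecloud or nextflow.
# Run a command in a throwaway writeable copy of an image from storage.
# Usage: run_in_working_copy IMAGE_REF BIND_DIR CMD [ARG ...]
# Set DRY_RUN=1 to print the ch-* commands instead of executing them.
run_in_working_copy() {
    ref=$1
    bind_dir=$2
    shift 2

    # Scratch directory for this job; the image is unpacked to $workdir/img.
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/ch-work.XXXXXX") || return 1
    img_dir=$workdir/img

    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; else run=; fi

    # Export from storage to a plain directory, then run that directory
    # writeable so the bind destination can be created inside it.
    $run ch-convert -i ch-image -o dir "$ref" "$img_dir" &&
    $run ch-run -w -b "$bind_dir:$bind_dir" "$img_dir" -- "$@"
    status=$?

    # Remove the working copy whether or not the command succeeded.
    rm -rf "$workdir"
    return "$status"
}
```

Calling it with `DRY_RUN=1` prints the two commands without touching storage, which is a cheap way to check the paths before submitting jobs. The cost of this approach is one full unpack of the image per job, so it trades away the performance benefit of a shared read-only image.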

I’ll also add that after not hearing much about it for a while, five people yesterday had requests that could be addressed by fixing #96. So that will be a higher priority.

reidpr commented 12 months ago

No traffic on this issue in a few weeks and based on the discussion I don’t think we’ll take any action; closing. Please LMK if that’s an error.

reidpr commented 11 months ago

That said, see PR #1793.