Closed: josh-willis closed this issue 3 years ago.
We're interested in this issue as well. We were encouraged (by Dave Dykstra at Fermilab) to move our singularity sandbox images from /cvmfs/eic.opensciencegrid.org/singularity (which we manage, and which goes beyond just singularity images) to /cvmfs/singularity.opensciencegrid.org, but this issue turns out to be a snag, since our documented workflows use the singularity run approach. We could add /.singularity.d by hand, but we also have users without CVMFS who build their own singularity images.
We have decided to move forward with this suggestion. We are planning the transition process now so as to be the least disruptive to current users. We expect the transition to a singularity build process to complete in a week or two. I will update this ticket when the transition is completed.
@djw8605 Thanks for doing this. Will there be an intermediate period where we can test the effect, or will the new mechanism somehow be "opt-in"? We'd like to be able to test that we can propagate all of the needed features from Docker into the Singularity images converted by the new method.
In particular, right now, if I read these docs I see: Two options that can be used in the Dockerfile to set the environment or default command are ENTRYPOINT and ENV. Unfortunately, both of these aspects of the Docker container are deleted when it is converted to a Singularity image in the Open Science Grid. Will that still be true when you have implemented your update?
Thanks again
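For background on what such a conversion involves: the sketch below is purely illustrative (not the actual sync-script code) and shows one way a Dockerfile's ENV and ENTRYPOINT lines could be mapped onto the /.singularity.d layout. The file name 10-docker-env.sh and the simplistic parsing are assumptions made for this example.

```shell
# Hypothetical sketch (not the real conversion code) of mapping a
# Dockerfile's ENV and ENTRYPOINT onto the /.singularity.d layout.
set -eu
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
FROM centos:7
ENV LAL_DATA_PATH=/opt/lalsuite-extra/share/lalsimulation
ENTRYPOINT ["/bin/pycbc_inspiral"]
EOF

mkdir -p "$workdir/rootfs/.singularity.d/env"

# ENV lines become export statements sourced at container startup.
sed -n 's/^ENV /export /p' "$workdir/Dockerfile" \
    > "$workdir/rootfs/.singularity.d/env/10-docker-env.sh"

# ENTRYPOINT becomes the runscript that `singularity run` executes.
entry=$(sed -n 's/^ENTRYPOINT //p' "$workdir/Dockerfile" | tr -d '[]",')
printf '#!/bin/sh\nexec %s "$@"\n' "$entry" \
    > "$workdir/rootfs/.singularity.d/runscript"
chmod 755 "$workdir/rootfs/.singularity.d/runscript"

cat "$workdir/rootfs/.singularity.d/env/10-docker-env.sh"
cat "$workdir/rootfs/.singularity.d/runscript"
```

A real converter (or `singularity build` itself) handles many more Dockerfile forms; this only shows the general shape of the ENV-to-env-file and ENTRYPOINT-to-runscript translation.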
Hi Derek, thank you very much for the effort! 14 days have passed since your comment. Any updates on this?
Hi everyone on this thread,
I have the conversion working. In order to test everything to make sure the transition works, could you please send me on this thread:
This conversion will translate the ENV and ENTRYPOINT lines in the Dockerfile to their singularity equivalents. I am especially interested in testing images that already have the /.singularity.d directory, in order to check backward compatibility.
Hi Derek,
As a test for backward compatibility, you could try the latest PyCBC release. If you run your conversion on the PyCBC docker image pycbc/pycbc-el7:v1.16.3, then on the resulting singularity image the following command:
singularity exec -e -p -i -c <path to test converted singularity image> /bin/pycbc_inspiral --version | head
should output:
--- PyCBC Version --------------------------
Version: 1.16.3
Branch:
Tag:
Id:
Builder:
Build date:
Repository status is
Imported from: /usr/lib64/python2.7/site-packages/pycbc/__init__.pyc
I'm not sure if your tests strictly require the docker image to be on Docker Hub; currently we can also specify other locations in the docker_images.txt file in this repo. If that is still the case, then I have created a version of PyCBC with an updated docker build that does not include the /.singularity.d directory in the docker image. That docker container is at docker://containers.ligo.org/joshua.willis/pycbc:latest. Once converted, the resulting singularity image should execute the command:
singularity exec -e -p -i -c <path to singularity image> /bin/test_singularity_install.sh
and produce the output:
--- PyCBC Version --------------------------
Version: 1.16.dev3
Branch:
Tag:
Id:
Builder:
Build date:
Repository status is
Imported from: /usr/lib64/python2.7/site-packages/pycbc/__init__.pyc
PATH = /usr/local/bin:/usr/bin:/bin:/opt/mvapich2-2.1/bin
LAL_DATA_PATH = /cvmfs/oasis.opensciencegrid.org/ligo/sw/pycbc/lalsuite-extra/current/share/lalsimulation
If you are able, it would also be interesting to me if you then set the environment variable LAL_DATA_PATH to (say) /foo/bar and run the command again without the -e flag, so that the environment is not cleaned. Then you should hopefully see:
--- PyCBC Version --------------------------
Version: 1.16.dev3
Branch:
Tag:
Id:
Builder:
Build date:
Repository status is
Imported from: /usr/lib64/python2.7/site-packages/pycbc/__init__.pyc
PATH = /usr/local/bin:/usr/bin:/bin:/opt/mvapich2-2.1/bin
LAL_DATA_PATH = /foo/bar
This would check that the variable LAL_DATA_PATH is inherited from the outside environment when the container is not started with -e, which I would like to verify works.
If you can't test my private docker image from containers.ligo.org, let me know, and I'll see if I can get it on Docker Hub.
@djw8605 I would like you to try opensciencegrid/tensorflow-gpu. I don't have a good test command, but if you can give me a tarball of the resulting /.singularity.d/, I should be able to validate it that way.
Hi @josh-willis.
I copied your containers to my repo (docker pull, docker tag to my repo, docker push). The build failed for pycbc-el7:v1.16.3:
ERROR: build: failed to make environment symlinks: symlink .singularity.d/runscript /cvmfs/singularity.opensciencegrid.org/.images/2e/rootfs-903c5a64-b7c4-11ea-9085-525400b61c5e/singularity: file exists
FATAL: While performing build: packer failed to pack: while inserting base environment: build: failed to make environment symlinks: symlink .singularity.d/runscript /cvmfs/singularity.opensciencegrid.org/.images/2e/rootfs-903c5a64-b7c4-11ea-9085-525400b61c5e/singularity: file exists
What is the Dockerfile doing? Some sort of symlinking?
Custom repo:
$ singularity exec -e -p -i -c /cvmfs/singularity.opensciencegrid.org/djw8605/pycbc-el7\:containers-ligo /bin/test_singularity_install.sh
--- PyCBC Version --------------------------
Version: 1.16.dev3
Branch:
Tag:
Id:
Builder:
Build date:
Repository status is
Imported from: /usr/lib64/python2.7/site-packages/pycbc/__init__.pyc
PATH = /usr/local/bin:/usr/bin:/bin:/opt/mvapich2-2.1/bin
LAL_DATA_PATH = /cvmfs/oasis.opensciencegrid.org/ligo/sw/pycbc/lalsuite-extra/current/share/lalsimulation
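The "file exists" error above is an EEXIST from symlink(2): the build tries to create the /singularity symlink, but the docker image already ships one. It can be reproduced with plain coreutils; the paths below are invented for illustration:

```shell
# Reproduce the "file exists" symlink failure outside of singularity.
rootfs=$(mktemp -d)

# Mimic a docker image that already ships /singularity as a symlink
# into /.singularity.d, as the PyCBC Dockerfile did at the time.
mkdir -p "$rootfs/.singularity.d"
printf '#!/bin/sh\necho hello\n' > "$rootfs/.singularity.d/runscript"
ln -s .singularity.d/runscript "$rootfs/singularity"

# `singularity build` later tries to create the same symlink itself;
# symlink(2) returns EEXIST, which surfaces as "file exists".
if ! ln -s .singularity.d/runscript "$rootfs/singularity" 2>/dev/null; then
    echo "second symlink failed: file exists"
fi
```

Since the target path already exists (even as an identical symlink), the second `ln -s` fails, which is exactly what the build log shows.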
Hi @djw8605,
Yes, at present the pycbc Dockerfile creates symlinks to mimic what Singularity itself does when creating a Singularity image from a docker container. The error you're seeing is exactly what we're trying to fix for other users who are trying to build Singularity images directly from our Docker Hub images. But we create and populate a /.singularity.d directory in the Docker container so that, when the current cvmfs-singularity-sync script runs (which does not use Singularity directly), those files are present in the Singularity image.
You mentioned that you want to test backward compatibility of your updated conversion and publishing. Does the update detect when the docker image has already created a /.singularity.d directory and, if so, fall back to the conversion that is presently there? Or does it do something else?
Note, if you do not want to do such detection, but just move to using singularity directly, then we can change our Docker container to rely on the updated script. But I'd like to do some more testing, and in particular make some more tweaks to my custom repo and ask some of the PyCBC developers no longer in the collaboration to test it, to ensure I am not breaking things for them. We may be the only group creating the /.singularity.d directory in our Docker container, and it doesn't make sense for you to complicate your script just for us, especially as we will move away from doing that once the alternative is in place.
But we have upcoming production runs that will rely on this, so we can't really afford to have our production CVMFS images in a broken state. So let me know how you want to proceed: I can either update my custom repo and we can test, or, if your update should detect the situation that the current PyCBC production docker container uses (creating /.singularity.d in Docker) and work around it, then we can iterate on that.
Hi Josh,
I updated the container at /cvmfs/singularity.opensciencegrid.org/joshua.willis/pycbc:latest/ to be built with Singularity rather than Docker. If you update the container on Docker Hub, it will be re-built the old way, with Docker.
Please test the container. Note that this transition will not break existing containers; they will only break when they are next updated.
@josh-willis I want to point out two issues. Is https://github.com/gwastro/pycbc/blob/master/Dockerfile the image you are trying here?
1. The symlinking from / into /.singularity.d/ is no longer needed. That was a transition from Singularity 2.2 to 2.3, and we no longer see those old versions on OSG. You should be able to delete these lines: https://github.com/gwastro/pycbc/blob/master/Dockerfile#L19-L22
2. In the container I tested, there was a /.singularity.d/env/90-environment.sh file copied just like you do (https://github.com/gwastro/pycbc/blob/master/Dockerfile#L3). That particular file is also generated by singularity build, so your version will be overwritten. For example, compare what you have in your GitHub repo to /cvmfs/singularity.opensciencegrid.org/joshua.willis/pycbc:latest/.singularity.d/env/90-environment.sh
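Why the file name matters: the scripts under /.singularity.d/env/ are sourced in lexical order at container startup, so a custom file with a later-sorting name (e.g. a hypothetical 91-environment.sh, chosen here for illustration) would not collide with the build-generated 90-environment.sh and would still take effect. A minimal sketch of that ordering:

```shell
# Demonstrate lexical-order sourcing of env-directory files.
# 91-environment.sh is a hypothetical name chosen to sort after the
# build-generated 90-environment.sh; the paths are invented examples.
set -eu
envdir=$(mktemp -d)

printf 'export LAL_DATA_PATH=/generated/default\n' \
    > "$envdir/90-environment.sh"
printf 'export LAL_DATA_PATH=/opt/lalsuite-extra/share/lalsimulation\n' \
    > "$envdir/91-environment.sh"

# Shell globs expand in sorted order, so 91- is sourced after 90-
# and its export wins.
for f in "$envdir"/*.sh; do . "$f"; done
echo "LAL_DATA_PATH = $LAL_DATA_PATH"
```

This only models the sourcing order; whether renaming the custom file is the right fix for a given container depends on what the generated 90-environment.sh contains.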
@djw8605 So I infer that your update of /cvmfs/singularity.opensciencegrid.org/joshua.willis/pycbc:latest/ to be built with Singularity rather than Docker wasn't a permanent change, as I updated my Docker image at containers.ligo.org and the newly generated image in CVMFS no longer has /.singularity.d/ nor /singularity inside. Would it be possible for you to again manually regenerate that singularity image from my docker container at containers.ligo.org/joshua.willis/pycbc:latest using the new script you're testing? I have updated that docker container to reflect some feedback from other PyCBC developers, and if that Singularity image is rebuilt (using singularity) from my docker container, then I think it should have all of the features that I am trying to test, for both our internal and external use cases.
@rynge The singularity container that Derek was generating for me was not built from my GitHub fork of PyCBC, but rather from here: https://git.ligo.org/joshua.willis/pycbc/-/tree/devel. That source tree contains some changes that will be proposed as an MR on the official pycbc repo, once I and others have had a chance to test some shorter workflows with it.
Hi @josh-willis. Rather than me being the bottleneck for this, here is the command that the script is running:
singularity --silent build --disable-cache=true --force --fix-perms --sandbox /tmp/container docker://containers.ligo.org/joshua.willis/pycbc:latest
You should be able to test and iterate with that command. Let me know if you have any issues with that command.
Hi @djw8605 Thanks, but my issue is getting it published to CVMFS, so I and others who don't have access to our LIGO systems can test it. I don't think there's a way for me to do that step myself?
Ah, ok. I understand.
@djw8605 But I did run the singularity conversion on my newest Docker image just now, to verify that the conversion succeeded and that my simple test script performed as it should. So whenever you get the chance to convert and publish the updated docker image to the CVMFS singularity repo, I think I am at least not wasting your time by asking you to convert a buggy container.
@josh-willis I updated your image in CVMFS. /cvmfs/singularity.opensciencegrid.org/joshua.willis/pycbc:latest
@djw8605 Thanks. I'm testing this (so far so good) and will ask some others outside LIGO/Virgo/KAGRA who use PyCBC containers to do the same. If things are successful, I'll open a PyCBC merge request to update our docker containers. We won't want to flip the switch on that until we know you are ready to make the transition to the Singularity build approach, so can I ask whether we are the only group you are still waiting to confirm with before switching the production cvmfs-singularity-sync over? Thanks.
@djw8605 As a further update, we are now happy with our tests. Let us know when you are ready to proceed, and we will merge our changes needed to update our docker images for the new conversion script (we do need to coordinate with you though, so that we don't get broken singularity builds, so if this is still waiting on other testing, please let me know in this thread). Thanks.
@djw8605 Just pinging to see if there's any update on this. Thanks.
@djw8605 Ping again, any update or ETA?
Hi, sorry for the delay. I just flipped the switch today. Let me know if you see any issues.
@djw8605 Thanks. The corresponding PyCBC MR was merged here. I have verified that our pycbc-el7:latest Singularity image has been built correctly after I made that merge, and that a small test workflow of ~700 OSG jobs completed successfully using the updated Singularity image. So from our (PyCBC's) perspective this is working fine; thank you for your work on it.
If you want to apply this to /cvmfs/singularity.opensciencegrid.org/jeffersonlab/remoll:develop, I can test it as well on an image that didn't try to create the .singularity.d directory by hand as a workaround. Or, if you are confident about rolling this out system-wide, I'd be happy as well.
We are fairly confident about rolling this out. Additionally, only newly updated containers will be built with singularity.
Sorry that I didn't close this. This was merged and put in production with #195
This issue came up initially as a bug filed against PyCBC in issue 3184. Because the existing script for creating singularity images from docker images uses a custom script that does not call singularity directly, we (and possibly other users as well) have placed into our docker images some of the directories and files that singularity uses (the directory /.singularity.d and some subdirectories and files beneath it). While this works for our singularity images created and published to CVMFS via this repo, it breaks the ability of any other users to create a singularity image from our docker images, because the files and symlinks that process would create are already present.
It looks like the open PR 196 and PR 195 are both relevant here. I don't know what the status of that work is, nor how the two interact with each other (they look to me to be PRs accomplishing the same thing, but I may be misunderstanding). Can someone comment on the timeline for the work in those PRs, or otherwise on the feasibility of modifying the cvmfs-singularity-sync script to invoke singularity directly, so that we could remove the specialized directories from our docker images and use the corresponding docker constructs instead?
I'm also tagging @duncan-brown for feedback on this, as he may know of other gotchas and because he originally created the structure that PyCBC is using.