singularityhub / singularityhub.github.io

Container tools for scientific computing! Docs at https://singularityhub.github.io/singularityhub-docs
https://singularityhub.github.io

builder does not detect finished image in subfolder #215

Closed jafaruddinlie closed 4 years ago

jafaruddinlie commented 4 years ago

Links

Version of Singularity

Local: 3.5.3 shub: singularity-builder-3.4.1-100GB

Behavior when Building Locally

Builds fine.

Error on Singularity Hub

The build looks OK but exited with this error:

INFO:  Adding labels
WARNING: Label: APPLICATION_NAME already exists and force option is false, not overwriting
WARNING: Label: APPLICATION_VERSION already exists and force option is false, not overwriting
WARNING: Label: MAINTAINER_NAME already exists and force option is false, not overwriting
WARNING: Label: MAINTAINER_EMAIL already exists and force option is false, not overwriting
INFO:  Adding environment to container
INFO:  Adding runscript
INFO:  Creating SIF file...
INFO:  Build complete: /root/build/container.sif
ERROR Final image does not exist.

What do you think is going on?

Not really sure, version of Singularity not supported?

vsoch commented 4 years ago

Could you please attach the entire log here as a text file?

jafaruddinlie commented 4 years ago

Done and done! singularity-build-log-jafaruddinlie_shub-June 4, 2020, 12_02 a.m..txt

vsoch commented 4 years ago

I don’t see the end message about the finish time which concerns me - perhaps the builder was killed because of memory or filesystem limits. Could you please run again and:

And then again include the log here! It’s after midnight so I’m off to bed but I’ll take a look tomorrow. Err, today but later :)

jafaruddinlie commented 4 years ago

Hope you had a good night's sleep! Here's the log as requested (debug mode turned on).

I didn't manage to note down how long the build takes on shub, but on my local machine it takes around 30 minutes, the container is 3.7GB, and the build uses around 1GB of memory.

singularity-build-log-jafaruddinlie_shub-June 4, 2020, 2_35 a.m..txt

vsoch commented 4 years ago

Thank you! The build time looks ok, similar to on your host:

Start Time: Thu Jun  4 07:07:44 UTC 2020.
  End Time: Thu Jun  4 07:35:30 UTC 2020

I'll need to debug this interactively, if not tomorrow then over the weekend (it's already after dinner time the next day since you originally posted, so time to relax!)
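The duration quoted above can be recomputed from the two log timestamps. The following is a minimal sketch, assuming the log always prints times in the `date`-style UTC format shown; the function names are made up for illustration and are not part of the Singularity Hub builder.

```python
from datetime import datetime, timedelta


def parse_log_time(text: str) -> datetime:
    """Parse a timestamp such as 'Thu Jun  4 07:07:44 UTC 2020.' from the build log."""
    # Drop the trailing period and collapse the double space in "Jun  4".
    cleaned = " ".join(text.rstrip(".").split())
    return datetime.strptime(cleaned, "%a %b %d %H:%M:%S UTC %Y")


def build_duration(start: str, end: str) -> timedelta:
    """Return the elapsed time between the start and end lines of a log."""
    return parse_log_time(end) - parse_log_time(start)


duration = build_duration(
    "Thu Jun  4 07:07:44 UTC 2020.",
    "Thu Jun  4 07:35:30 UTC 2020",
)
print(duration)  # 0:27:46
```

That is just under 28 minutes, in line with the roughly 30 minutes reported for the local build.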

vsoch commented 4 years ago

hey @jafaruddinlie ! I've tested your build, and the container does complete successfully. There are two issues that I found (that we can debug separately) to figure out which is leading to the failure. The first is the container test. To test the container, we execute the "ls" command. However, your container doesn't seem to have an ls:

Singularity 64ae85d53aaa966cc99ad7793127893992a82bf8617936fb0923c3aaa6270919.sif:/> ls
bash: ls: command not found

So even if there is an issue before that, the test would fail at this step. Do you know why your image doesn't have ls? The next issue is a potential bug with the path (oy vey) for the builder, and I'm not sure why it hasn't happened before. Your resulting image gets placed in the same directory as the recipe file, but is looked for one above it. So - to test this would you mind trying to put the recipe one folder up (in the root of the repo). If that turns out to be the issue, and you figure out the test command with ls to get it running, then I'll need to create a new builder with the fix (and then ask you to test). In summary:

  1. figure out why your container can't run an ls
  2. try a recipe build from the root

Thanks!
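The path problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the helper names are hypothetical, the real builder code is not shown in this thread, and the "lookup in the repository root" behavior is inferred from the description that the image is written next to the recipe but looked for one level above it.

```python
import tempfile
from pathlib import Path

IMAGE_NAME = "container.sif"  # file name taken from the build log above


def image_written_to(recipe: Path) -> Path:
    """Where the image ends up according to the thread: next to the recipe."""
    return recipe.parent / IMAGE_NAME


def lookup_in_repo_root(repo_root: Path) -> Path:
    """Hypothetical buggy lookup: always checks the repository root."""
    return repo_root / IMAGE_NAME


def lookup_next_to_recipe(recipe: Path) -> Path:
    """Hypothetical fixed lookup: checks the recipe's own directory."""
    return recipe.parent / IMAGE_NAME


def simulate(recipe_relpath: str) -> dict:
    """Fake a build for a recipe at the given path and report both lookups."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        recipe = repo / recipe_relpath
        recipe.parent.mkdir(parents=True, exist_ok=True)
        recipe.write_text("Bootstrap: docker\nFrom: centos:8\n")
        image_written_to(recipe).write_bytes(b"")  # stand-in for the real build
        return {
            "root_lookup": lookup_in_repo_root(repo).exists(),
            "recipe_dir_lookup": lookup_next_to_recipe(recipe).exists(),
        }


print(simulate("Singularity"))          # recipe in the repository root
print(simulate("vardict/Singularity"))  # recipe in a subfolder (folder name is illustrative)
```

With the recipe in the root, both lookups agree. With the recipe in a subfolder, only the lookup next to the recipe finds the image, which matches the "Final image does not exist" error.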

jafaruddinlie commented 4 years ago
  1. I found out that one of the export PATH lines in the %environment section had a typo. I've fixed this and re-uploaded the Singularity file, but it still fails with the same error.
  2. Same updated recipe, built from the root, works.

Both logs are attached.

singularity-build-log-jafaruddinlie_shub-vardict_notroot_dir.txt singularity-build-log-jafaruddinlie_shub-vardict_rootdir..txt
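To see why a PATH typo makes `ls` disappear, the sketch below resolves a command against two PATH values with `shutil.which`. The actual typo is not given in this thread, so both PATH strings and the fake `ls` script are illustrative assumptions.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def can_find(command: str, path_value: str) -> bool:
    """Return True if `command` resolves to an executable on the given PATH string."""
    return shutil.which(command, path=path_value) is not None


def demo() -> tuple:
    """Compare a PATH that keeps the system directory with one that drops it."""
    with tempfile.TemporaryDirectory() as tmp:
        bindir = Path(tmp) / "bin"  # stands in for /bin inside the container
        bindir.mkdir()
        fake_ls = bindir / "ls"
        fake_ls.write_text("#!/bin/sh\necho listing\n")
        fake_ls.chmod(fake_ls.stat().st_mode | stat.S_IXUSR)

        # Intended value: the tool directory is prepended to the existing PATH.
        good_path = f"/opt/tool/bin{os.pathsep}{bindir}"
        # Hypothetical typo: the existing PATH is lost, so only the tool directory remains.
        broken_path = "/opt/tool/bin"
        return can_find("ls", good_path), can_find("ls", broken_path)


print(demo())  # (True, False)
```

Any export that overwrites PATH instead of extending it has the same effect: basic commands such as `ls` can no longer be found, so the builder's `ls` test fails.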

vsoch commented 4 years ago

Great! So we know the exact bug now. I update builders when there is a major release of Singularity and it coincides with a server restart, so unfortunately that won't be any time soon. I'll rename this issue to be with respect to the image path not detected when from a subfolder, and in the meantime you'll have to do builds with a recipe in the root. Thanks for reporting this issue!

vsoch commented 4 years ago

@jafaruddinlie if you could, might we be able to keep the recipe around for use when I develop the builder? I can ping you again when that time comes to keep you updated about the process.

jafaruddinlie commented 4 years ago

Yep, not a problem!

singular55 commented 4 years ago

I can confirm this problem as well. It seems to be an issue with the newer version builder only as I had a recipe that built successfully with the 2-5 builder. I moved to trying the 3-4-2 builder and could not get an error free build. When I found this issue I moved my recipe to the top directory of my GitHub repo and the build was successful.

vsoch commented 4 years ago

Yep thanks for reporting! I’ll be able to update the builder to address this bug for the next round of server work. In the meantime, your approach to move the recipe to root is what I suggest.

vsoch commented 4 years ago

hey @jafaruddinlie and @singular55 - I took a look at the server, and work to update for newer singularity 3.6.1, and that particular task is substantial enough that I'm going to wait for the larger server refactor closer to the winter. However, that doesn't mean that we can't fix this issue for the current latest version on shub, which is 3.4.2! So I have prepared a fixed image singularity-builder-v3-4-2 that should be able to handle the subfolder recipes that I plan to roll out for the next round of server work, which I've scheduled for two weeks from today, Friday August 21st. So - what I'll do then is make this image available as an option for your collection, and if it works for you, I'll ping you on here for you to test the images! If the test cases (building from root and from a subfolder) are good, we can remove the old builder and make this one default. Thanks in advance for your help! <3

singular55 commented 4 years ago

Sounds great, thanks for the heads up!

vsoch commented 4 years ago

hey @jafaruddinlie and @singular55, I have a development builder for you to test! If you go to your collection settings, there should be a new entry (the last one in the list) for a builder with version 3.4.2 (no size mentioned). Could you please give this a test with 1) a recipe in the base of the repository (to make sure nothing was broken) and 2) a recipe in a subfolder? Please take your time!

singular55 commented 4 years ago

Thanks for the setup! My initial build in the root seems to have failed with not much in the way of a log. See attached. Not sure what to make of it.

vsoch commented 4 years ago

@singular55 I think you attached it to an email (which doesn't come through) could you show it here? I also deleted the various headers / other email bits that probably shouldn't be on here.

vsoch commented 4 years ago

Also could you please make sure to have debug turned on in your collection settings? It might not add additional info, but just in case, it can be helpful.

singular55 commented 4 years ago

Funny thing, when you say "show it here", that comes through in the email as well...

Here's a paste inline. Debug is already on.

Start Time: Fri Aug 28 15:21:03 UTC 2020.
Cloning into '/tmp/tmpa5bl_laj'...
warning: redirecting to https://github.com/singular55/container01.git/
Switched to a new branch 'sing_353_shub'
Branch 'sing_353_shub' set up to track remote branch 'sing_353_shub' from 'origin'.
Return value of 137.
Killed: Fri Aug 28 17:21:03 UTC 2020.

This branch builds with other builders, IIRC.
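The "Return value of 137" in this log follows the common shell convention that an exit status above 128 means the process was terminated by signal number (status - 128), and that 127 means the command was not found. Here is a minimal sketch of decoding such values; the function name is made up for illustration and the builder may report statuses differently.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Translate a shell-style exit status into a short human-readable description."""
    if code == 0:
        return "success"
    if code == 127:
        return "command not found (shell convention)"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"failed with exit status {code}"


print(describe_exit_status(137))  # on Linux: terminated by SIGKILL
```

Status 137 is 128 + 9, so the build process received SIGKILL. The start and kill timestamps are exactly two hours apart, which is consistent with the job being stopped by a time limit rather than failing on its own.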

vsoch commented 4 years ago

You're right! The log was so small I thought it was email signature leftovers :) Thanks for the report, I'll take a look when I can clear up some time.

singular55 commented 4 years ago

Yeah, looks like it hung up right away. Path seems wrong in the GitHub link, now that I look. Thanks!

vsoch commented 4 years ago

@singular55 it looks like the build has an extensive setup section, and that there wasn't output / change for 2 hours. Do you have a simple recipe you could test?

singular55 commented 4 years ago

I'm working on a new one now that looks like it built successfully, I'll double check and see if that one works.

Normally the recipe I ran earlier is about a 20-25min build on shub, so 2 hours wouldn't be normal (I guess unless one of the wget's is non-responsive for some reason.)

singular55 commented 4 years ago

@vsoch Yes, it works with a smaller recipe, from both the root and a subdir!

vsoch commented 4 years ago

Awesome! 😎 Just curious, what is your reasoning to do so much of the build in %setup instead of %post? I think we might have seen the full output of the timeout or other issue if it was done there.

singular55 commented 4 years ago

Good question, I'm not really a Singularity expert, but I thought from my initial attempts that package installs that copied files from other outside sources only worked in %setup for me. Should they work after the yum install instructions in %post? I recall having problems with that and the wget transfer, extract steps.

jafaruddinlie commented 4 years ago

Hi @vsoch , can confirm both builds work fine with 3.4.2 that you set for us.

vsoch commented 4 years ago

Awesome @jafaruddinlie! @singular55 I’ll take a look at your recipe tomorrow and test building locally, and also give a shot at adding those sections to post. Have a good evening (morning? afternoon?) everyone!

vsoch commented 4 years ago

hey @singular55! I built your container like this:

```
Bootstrap:docker
From:centos:7

%labels
MAINTAINER singular55

%environment
LANG=C.UTF-8
# couldn't change LC_ALL on target
#LC_ALL=C.UTF-8
PATH=/bin_override:$PATH
LIBRARY_PATH=/lib_override:$LIBRARY_PATH
LD_LIBRARY_PATH=/lib_override:$LD_LIBRARY_PATH
#WORKDIR=/work
WRITEABLE=~/Container_Writeable
#export LC_ALL LANG PATH LIBRARY_PATH LD_LIBRARY_PATH WORKDIR
export LANG PATH LIBRARY_PATH LD_LIBRARY_PATH WRITEABLE

%files
eclipse.ini /eclipse.ini
eclipse-parallel.ini /eclipse-parallel.ini

%post
mkdir -p /lib_override
mkdir -p /bin_override
#mkdir -p /work
yum -y install epel-release
yum repolist
yum install -y git meld wget kdiff3 firefox
# mysql uses libnuma
yum install -y numactl-libs
# fix some X / DBus issues?
dbus-uuidgen > /var/lib/dbus/machine-id

## Eclipse for Scientific Computing
# https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/2020-03/R/eclipse-parallel-2020-03-R-linux-gtk-x86_64.tar.gz
wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/2020-03/R/eclipse-parallel-2020-03-R-linux-gtk-x86_64.tar.gz -O eclipse-parallel.tar.gz
tar -xf eclipse-parallel.tar.gz -C /bin_override
rm eclipse-parallel.tar.gz
# TODO - ini file
cp /eclipse-parallel.ini /bin_override/eclipse/eclipse.ini
mv /bin_override/eclipse /bin_override/eclipse-parallel

# eclipse
wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/2019-09/R/eclipse-jee-2019-09-R-linux-gtk-x86_64.tar.gz -O eclipse.tar.gz
tar -xf eclipse.tar.gz -C /bin_override
rm eclipse.tar.gz
cp /eclipse.ini /bin_override/eclipse/

## agraph
# http://franz.com/ftp/pri/acl/ag/ag6.4.0/linuxamd64.64/agraph-6.4.0-linuxamd64.64.tar.gz
wget http://franz.com/ftp/pri/acl/ag/ag6.4.0/linuxamd64.64/agraph-6.4.0-linuxamd64.64.tar.gz -O agraph.tar.gz
tar -xf agraph.tar.gz -C /bin_override
rm /agraph.tar.gz

## tomcat
# https://archive.apache.org/dist/tomcat/tomcat-8/v8.5.47/bin/apache-tomcat-8.5.47.tar.gz
wget https://archive.apache.org/dist/tomcat/tomcat-8/v8.5.47/bin/apache-tomcat-8.5.47.tar.gz -O tomcat.tar.gz
tar -xf tomcat.tar.gz -C /bin_override
# Apache installed as u:root g:root with no group or other permissions. For us to run apache from the
# container we need other permissions, it looks like.
chmod -R o+rx /bin_override/apache-tomcat-8.5.47
rm tomcat.tar.gz

## mysql - full install
# https://dev.mysql.com/downloads/file/?id=495278 - login
# https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.20-linux-glibc2.12-x86_64.tar.xz
wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.20-linux-glibc2.12-x86_64.tar.xz -O mysql.tar.xz
tar -xf mysql.tar.xz -C /bin_override
rm mysql.tar.xz

## lite/minimal mysql
wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.20-linux-x86_64-minimal.tar.xz -O mysql.tar.xz
tar -xf mysql.tar.xz -C /bin_override
rm mysql.tar.xz

#eclipse jee
##wget http://ftp.osuosl.org/pub/eclipse/technology/epp/downloads/release/2019-09/R/eclipse-jee-2019-09-R-linux-gtk-x86_64.tar.gz -O /tmp/eclipse.tar.gz
#tar -xf /tmp/eclipse.tar.gz -C /opt
#tar -xf /tmp/eclipse.tar.gz -C ${SINGULARITY_ROOTFS}
# ~ is /root for singularity hub
##tar -xf /tmp/eclipse.tar.gz -C ~
##rm /tmp/eclipse.tar.gz
#moved from /opt/eclipse
##cp eclipse.ini ~/eclipse/

%files
#eclipse.ini /opt/eclipse/
eclipse.ini eclipse.ini
eclipse-parallel.ini eclipse-parallel.ini

%runscript
#exec /bin/echo "Hi there, container runscript!"
#exec /usr/bin/meld
mkdir -p ${WRITEABLE}
touch ${WRITEABLE}/HiThere
/bin/echo "Config files should go in ${WRITEABLE}."

%apprun meld
exec meld "$@"

%apprun firefox
exec firefox "$@"

%apprun eclipse
exec /bin_override/eclipse/eclipse "$@"

%apprun kdiff3
exec kdiff3 "$@"

%apprun eclipse-parallel
exec /bin_override/eclipse-parallel/eclipse "$@"
```

When I tested your recipe before the change, it also exited with 127 because wget could not be found. I have wget on my machine, so it's likely that it's not available to the build at that step.

Thanks to you both for testing this out! I'm going to close the issue - there is a lot of server work to do at the end of the year, but hopefully this should hold over until then.

singular55 commented 4 years ago

Thanks for the tips! I recall having some trouble like that in the past. I wonder if the newer 3.x versions of Singularity have changed behavior somewhat.

vsoch commented 4 years ago

I have seen changed behavior - it used to be that you could copy files from the host to /tmp, and have it work. I was testing old recipes from a few years back and the files were no longer found there, so I wonder if that could have been the issue. It also seems that the host's software is not accessible to the build, which makes sense because the build could otherwise escalate privileges and do something on the host.

singular55 commented 4 years ago

Yes, that's what I saw. I think that was the issue. In 2.5, I needed wget, and had to install it prior to use, but then I needed to use it as well. I think in the 2.5 builder I could not get wget to run in the same section after the 'yum install'. Sounds like an improvement if that works now. Thanks for that. I have not made changes to the recipe since starting to use 3.x versions of builder/HPC installs (which was only recently.)

piyanatk commented 4 years ago

Hi @vsoch. I am having the same issue as described in #221, which pointed me to here. I have read through the conversation but am still unsure what the solution is. Could you help me please? The definition file can be found here, and the builder log is attached below. I am simply trying to build a container with a bunch of Python packages. Both singularity-builder-3-4-2-100gb and singularity-builder-3-2-1-100gb-private give the same error, and for some unknown reason the build took more than two hours and was terminated when I used singularity-builder-v3-4-2.

singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 12, 2020, 10 10 a.m..txt

vsoch commented 4 years ago

You’ll want to use the last builder (the one you report taking more than 2 hours). If indeed that’s the case, it likely is low on memory when converting to sif and you won’t be able to use Singularity hub for such a large image.

piyanatk commented 4 years ago

Let me clarify. The build takes about 20 minutes on my laptop and the output image is about 1.4GB. The same definition file takes an extremely long time to build with singularity-builder-v3-4-2 on Singularity Hub. By the time the build was terminated after two hours, it had not finished installing Ubuntu packages, which I found really strange.

vsoch commented 4 years ago

How much memory does your laptop have?

vsoch commented 4 years ago

And could you please include the log for the build that times out?

piyanatk commented 4 years ago

Here is the build log for the build that times out. singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 12, 2020, 3 40 p.m..txt

And my laptop has 8GB of memory.

vsoch commented 4 years ago

That should be comparable then! Let's try a few things to debug, because it shouldn't just hang like that:

The only insight that we have is that it's hanging on something, so we need to figure out what.

piyanatk commented 4 years ago

I have tried all of the above, including switching from Ubuntu to CentOS, with no success. The builds always stop somewhere while installing OS packages. Here is the debug log of the CentOS recipe.

Is it possible that there is an issue with my account? Are there some good definition files that are guaranteed to build that I can use for testing?

vsoch commented 4 years ago

There aren't any differences between accounts - the only customization you can do is to specify the builder (which you already know about!)

The way I'd debug this is to start bare bones - literally just have your recipe like this:

Bootstrap: docker
From: centos:8

And then slowly add one command at a time. Your recipe is hugely complex, and what we need to do is figure out the exact line that is triggering the timeout. Once we know that, we'll have something to work with!
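The incremental approach suggested above can be scripted so that each test recipe adds exactly one more command than the previous one. The sketch below is only an illustration: the function name and the example commands are assumptions, and each generated recipe would still need to be committed and built separately.

```python
HEADER = "Bootstrap: docker\nFrom: centos:8\n"


def incremental_recipes(post_commands):
    """Return recipe texts: the bare header first, then one more %post command each time."""
    recipes = [HEADER]
    for count in range(1, len(post_commands) + 1):
        body = "\n".join("    " + command for command in post_commands[:count])
        recipes.append(f"{HEADER}\n%post\n{body}\n")
    return recipes


# Illustrative commands only; replace them with the lines from the real recipe.
commands = [
    "yum -y install epel-release",
    "yum -y install git wget",
    "echo finished installing packages",
]

for index, text in enumerate(incremental_recipes(commands)):
    print(f"--- recipe {index} ---")
    print(text)
```

The first recipe that times out then points at the last command that was added.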

piyanatk commented 4 years ago

Hi. I did several more tests.

All definition files mentioned can be found in this repo and all tests were built with the singularity-builder-v3-4-2 builder.

At this point I am very much dumbfounded by the inconsistency. Is there a way to track the build in more detail in the debug log, such as making it print each executed command? I tried adding echo ... lines but they don't show up in the build log.

vsoch commented 4 years ago

Did you verify it’s building the commit you think it’s building? You should see all print statements in the log (eg your echos).

singular55 commented 4 years ago

I have noticed that errors and operations seem to be interleaved in the build logs with the latest builder. At least from my experience it looks like a failure will occur higher in the log than the end of the log file, and installer operations will still be shown after that point until the build ends. Look for your echo statements to be embedded in the log earlier than you expected and see if you can find them.
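Because output can be interleaved, it may be easier to search the whole log for the echo text than to read it from the bottom. A minimal sketch, assuming the echo lines carry a distinctive marker string; the marker and the sample log below are made up for illustration.

```python
def find_markers(log_text: str, marker: str):
    """Return (line_number, line) pairs for every log line that contains the marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if marker in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical log excerpt with echo output mixed in among installer messages.
sample_log = """Start Time: ...
STEP-1 installing packages
Installing : wget
ERROR something failed
STEP-2 downloading archives
Installing : git
"""

for number, line in find_markers(sample_log, "STEP-"):
    print(number, line)
```

If the markers never appear anywhere in the log, the build either stopped before reaching them or is not using the recipe revision that contains them.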

piyanatk commented 4 years ago

@vsoch Hi. Yes, I double checked and it was building the right commit. However, the echo statements do not appear in the log.

vsoch commented 4 years ago

@piyanatk please point me to the exact commit (recipe) and the builder you are using, and I'll try to reproduce your error. I'm not sure how else to help.

piyanatk commented 4 years ago

@vsoch Please see for the recipe: https://github.com/HERA-Team/hera-rtp-singularity/blob/287f83e344e81462ba5bc1b5bf0d16593b788f8d/hera-rtp/Singularity.hera-rtp-ubuntu-conda2 And here is the error log that I got using singularity-builder-v3-4-2 builder. singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 20, 2020, 5 05 a.m.(1).txt

vsoch commented 4 years ago

hey @piyanatk - I've done the build with:

For the hanging builder, I don't have any insight for you here, but there are records of this sort of thing happening. If you have ideas for something I could try I'd be happy to, but figuring out the specifics is beyond the level of support that I can offer you, at least until we have many more reports of this issue.

Why do you not want to use any of the working builders? The older 3.4.2, as long as the recipe is in the root, works.

If you want to try the Sylabs library that's also an option, as is Google Cloud Build. You can also build a docker container and pull down to Singularity, either via Docker Hub or Quay.io (my goto choice typically). Good luck!

piyanatk commented 4 years ago

Hi @vsoch. Thank you for checking on this! I think when I tried the older builder, the image was not saved (exactly this issue I think), and you suggested that I used the newer builder. I will give 3.2.1 a try and will report back.

Unfortunately I do not have any insight as I am a novice on this myself. I am just trying to get a container build for the collaboration that I am involved with so that we can test it on a cluster.

vsoch commented 4 years ago

You might consider an automated build to Docker Hub or Quay, and then pulling down to Singularity, e.g., for the repository vanessa/salad:

singularity pull docker://vanessa/salad

This is especially useful for development containers that warrant many builds a day, as Singularity Hub is more intended to build final / "I want to publish this" containers.