Closed jafaruddinlie closed 4 years ago
Could you please attach the entire log here as a text file?
I don't see the end message about the finish time, which concerns me - perhaps the builder was killed because of memory or filesystem limits. Could you please run again with debug mode turned on in your collection settings, and note the build time, container size, and memory used during the build?
And then again include the log here! It’s after midnight so I’m off to bed but I’ll take a look tomorrow. Err, today but later :)
Hope you had a good night's sleep! Here's the log as requested (debug mode turned on).
I didn't manage to note down the time it takes to build on shub, but on my local machine it is around 30 minutes, the container is 3.7GB, and the build uses around 1GB of memory.
singularity-build-log-jafaruddinlie_shub-June 4, 2020, 2_35 a.m..txt
Thank you! The build time looks ok, similar to on your host:
Start Time: Thu Jun 4 07:07:44 UTC 2020.
End Time: Thu Jun 4 07:35:30 UTC 2020
I'll need to debug this interactively, if not tomorrow then over the weekend (it's already after dinner time the next day since you originally posted, so time to relax!)
hey @jafaruddinlie ! I've tested your build, and the container does complete successfully. There are two issues that I found (we can debug them separately) to figure out which one is leading to the failure. The first is the container test: to test the container, we execute the "ls" command. However, your container doesn't seem to have an ls:
Singularity 64ae85d53aaa966cc99ad7793127893992a82bf8617936fb0923c3aaa6270919.sif:/> ls
bash: ls: command not found
So even if there is an issue before that, the test would fail at this step. Do you know why your image doesn't have ls? The next issue is a potential bug with the path (oy vey) for the builder, and I'm not sure why it hasn't happened before. Your resulting image gets placed in the same directory as the recipe file, but is looked for one level above it. So, to test this, would you mind trying to put the recipe one folder up (in the root of the repo)? If that turns out to be the issue, and you figure out the test command with ls to get it running, then I'll need to create a new builder with the fix (and then ask you to test). In summary: 1) figure out why ls is missing from your image, and 2) try a build with the recipe in the root of the repo.
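For reference, if the base image really doesn't ship coreutils, a minimal sketch along these lines should bring ls back (assuming a yum-based base image; swap the package manager as needed):

%post
    # coreutils provides ls and the other basic commands (assumes a yum-based base image)
    yum -y install coreutils

%test
    # mirror what the builder test runs
    ls /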
Thanks!
Both logs are attached.
singularity-build-log-jafaruddinlie_shub-vardict_notroot_dir.txt
singularity-build-log-jafaruddinlie_shub-vardict_rootdir..txt
Great! So we know the exact bug now. I update builders when there is a major release of Singularity that coincides with a server restart, so unfortunately that won't be any time soon. I'll rename this issue to reflect that the finished image is not detected when the recipe is in a subfolder, and in the meantime you'll have to do builds with the recipe in the root. Thanks for reporting this issue!
@jafaruddinlie if you could, would you keep the recipe around for use when I develop the builder fix? I can ping you again when that time comes to keep you updated about the process.
Yep, not a problem!
I can confirm this problem as well. It seems to be an issue with the newer builder only, as I had a recipe that built successfully with the 2-5 builder. When I moved to the 3-4-2 builder I could not get an error-free build. When I found this issue I moved my recipe to the top directory of my GitHub repo and the build was successful.
Yep, thanks for reporting! I'll be able to update the builder to address this bug in the next round of server work. In the meantime, your approach of moving the recipe to the root is what I suggest.
hey @jafaruddinlie and @singular55 - I took a look at the server and at the work needed to update to the newer Singularity 3.6.1, and that particular task is substantial enough that I'm going to wait for the larger server refactor closer to the winter. However, that doesn't mean that we can't fix this issue for the current latest version on shub, which is 3.4.2! So I have prepared a fixed image, singularity-builder-v3-4-2, that should be able to handle subfolder recipes, and I plan to roll it out in the next round of server work, which I've scheduled for two weeks from today, Friday August 21st. What I'll do then is make this image available as an option for your collection, and if it works for you, I'll ping you on here for you to test the images! If the test cases (building from root and from a subfolder) are good, we can remove the old builder and make this one the default. Thanks in advance for your help! <3
Sounds great, thanks for the heads up!
hey @jafaruddinlie and @singular55, I have a development builder for you to test! If you go to your collection settings, there should be a new entry (the last one in the list) for a builder with version 3.4.2 (no size mentioned). Could you please give this a test with 1) a recipe in the base of the repository (to make sure nothing was broken) and 2) a recipe in a subfolder? Please take your time!
Thanks for the setup! My initial build in the root seems to have failed with not much in the way of a log. See attached. Not sure what to make of it.
@singular55 I think you attached it to an email (which doesn't come through) could you show it here? I also deleted the various headers / other email bits that probably shouldn't be on here.
Also could you please make sure to have debug turned on in your collection settings? It might not add additional info, but just in case, it can be helpful.
Funny thing, when you say "show it here", that comes through in the email as well...
Here's a paste inline. Debug is already on.
Start Time: Fri Aug 28 15:21:03 UTC 2020.
Cloning into '/tmp/tmpa5bl_laj'...
warning: redirecting to https://github.com/singular55/container01.git/
Switched to a new branch 'sing_353_shub'
Branch 'sing_353_shub' set up to track remote branch 'sing_353_shub' from 'origin'.
Return value of 137.
Killed: Fri Aug 28 17:21:03 UTC 2020.
This branch builds with other builders, IIRC.
You're right! The log was so small I thought it was email signature leftovers :) Thanks for the report, I'll take a look when I can clear up some time.
Yeah, looks like it hung up right away. Path seems wrong in the GitHub link, now that I look. Thanks!
@singular55 it looks like the build has an extensive setup section, and that there wasn't output / change for 2 hours. Do you have a simple recipe you could test?
I'm working on a new one now that looks like it built successfully, I'll double check and see if that one works.
Normally the recipe I ran earlier is about a 20-25 minute build on shub, so 2 hours wouldn't be normal (I guess unless one of the wgets is unresponsive for some reason).
@vsoch Yes, it works with smaller recipe and both root and subdir!
Awesome! 😎 Just curious, what is your reasoning for doing so much of the build in %setup instead of %post? I think we might have seen the full output of the timeout or other issue if it had been done there.
Good question, I'm not really a Singularity expert, but I thought from my initial attempts that package installs that copied files from outside sources only worked in %setup for me. Should they work after the yum install instructions in %post? I recall having problems with that and the wget download and extract steps.
Hi @vsoch, I can confirm both builds work fine with the 3.4.2 builder that you set up for us.
Awesome @jafaruddinlie! @singular55 I'll take a look at your recipe tomorrow and test building locally, and also give a shot at adding those sections to %post. Have a good evening (morning? afternoon?) everyone!
hey @singular55! I built your container like this:
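(roughly the following, with the SIF name here just a placeholder, since the exact recipe path in your repo may differ):

sudo singularity build container.sif Singularity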
When I tested your recipe before the change, it also exited with 127 because wget could not be found. I have it on my machine, so it's likely not available to the build at that step.
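If you do move those steps into %post, a minimal sketch of the pattern would be (the package names and URL here are just placeholders):

%post
    # install the download tools first, then use them in the same section
    yum -y install wget tar gzip
    wget -O /opt/tool.tar.gz https://example.com/tool.tar.gz
    tar -xzf /opt/tool.tar.gz -C /opt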
Thanks to you both for testing this out! I'm going to close the issue - there is a lot of server work to do at the end of the year, but hopefully this should hold over until then.
Thanks for the tips! I recall having some trouble like that in the past. I wonder if the newer 3.x versions of Singularity have changed behavior somewhat.
I have seen changed behavior - it used to be that you could copy files from the host to /tmp and have it work. I was testing old recipes from a few years back and the files were no longer found there, so I wonder if that could have been the issue. It also seems that the host's software is not accessible to the build, which makes sense, because otherwise the build could escalate privileges and do something on the host.
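If you do still need files from the build host inside the container, the %files section is generally the way to do that now - a minimal sketch, with placeholder paths:

%files
    ./mydata.tar.gz /opt/mydata.tar.gz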
Yes, that's what I saw. I think that was the issue. In 2.5, I needed wget and had to install it before using it, but I think in the 2.5 builder I could not get wget to run in the same section after the yum install. It sounds like an improvement if that works now. Thanks for that. I have not made changes to the recipe since starting to use the 3.x versions of the builder/HPC installs (which was only recently).
Hi @vsoch. I am having the same issue as described in #221, which pointed me here. I have read through the conversation but am still unsure what the solution is. Could you help me please? The definition file can be found here, and the builder log is attached below. I am simply trying to build a container with a bunch of Python packages. Both singularity-builder-3-4-2-100gb and singularity-builder-3-2-1-100gb-private give the same error, and for some unknown reason the build took more than two hours and was terminated when I used singularity-builder-v3-4-2.
singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 12, 2020, 10 10 a.m..txt
You'll want to use the last builder (the one you report taking more than 2 hours). If that is indeed the case, it is likely running low on memory when converting to SIF, and you won't be able to use Singularity Hub for such a large image.
Let me clarify. The build takes about 20 minutes on my laptop and the output image is about 1.4GB. The same definition file takes an extremely long time to build with singularity-builder-v3-4-2 on Singularity Hub. By the time the build was terminated after two hours, it had not finished installing Ubuntu packages, which I found really strange.
How much memory does your laptop have?
And could you please include the log for the build that times out?
Here is the build log for the build that times out. singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 12, 2020, 3 40 p.m..txt
And my laptop has 8GB of memory.
That should be comparable then! Let's try a few things to debug, because it shouldn't just hang like that:
- remove the apt-get clean, in case it removes something needed in /tmp
- try docker instead of library for the bootstrap
The only insight that we have is that it's hanging on something, so we need to figure out what that is.
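For the second point, that's just a change to the recipe header, e.g. (the tag here is only an example):

Bootstrap: docker
From: ubuntu:18.04

instead of

Bootstrap: library
From: ubuntu:18.04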
I have tried all of the above, including switching from Ubuntu to CentOS, with no success. The builds always stop somewhere while installing OS packages. Here is the debug log of the CentOS recipe.
Is it possible that there is an issue with my account? Are there some good definition files that are guaranteed to build that I could use for testing?
There aren't any differences between accounts - the only customization you can do is to specify the builder (which you already know about!)
The way I'd debug this is to start bare bones - literally just have your recipe like this:
Bootstrap: docker
From: centos:8
And then slowly add one command at a time. Your recipe is hugely complex, and what we need to do is figure out the exact line that is triggering the timeout. Once we know that, we'll have something to work with!
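For example, the second iteration might only add a single install (the package here is just an illustration):

Bootstrap: docker
From: centos:8

%post
    yum -y install wget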
Hi. I did several more tests.
- A recipe with only apt-get update also builds fine, as it should. apt-get install gives confusing errors; git, wget (and vim) were installed. osonly-test log
- In another test, git or wget was installed; I did not try installing other packages. git-test log
- A wget ... line after apt-get install git wget vim times out with return code 137, but I am unsure if the cause is wget, because the debug log seems to terminate during apt-get install. conda-test log
All definition files mentioned can be found in this repo, and all tests were built with the singularity-builder-v3-4-2 builder.
At this point I am very much dumbfounded by the inconsistency. Is there a way to track the build in more detail in the debug log, such as making it print each executed command? I tried adding echo ... lines but they don't show up in the build log.
Did you verify it's building the commit you think it's building? You should see all print statements in the log (e.g. your echos).
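One other thing you could try, assuming the %post section runs under a normal shell (it should), is turning on command tracing so every line gets echoed to the log as it runs:

%post
    set -x
    # every command after this point is printed before it executes
    apt-get update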
I have noticed that errors and operations seem to be interleaved in the build logs with the latest builder. At least from my experience it looks like a failure will occur higher in the log than the end of the log file, and installer operations will still be shown after that point until the build ends. Look for your echo statements to be embedded in the log earlier than you expected and see if you can find them.
@vsoch Hi. Yes, I double checked and it was building the right commit. However, the echo statements do not appear in the log.
@piyanatk please point me to the exact commit (recipe) and the builder you are using, and I'll try to reproduce your error. I'm not sure how else to help.
@vsoch Please see the recipe here: https://github.com/HERA-Team/hera-rtp-singularity/blob/287f83e344e81462ba5bc1b5bf0d16593b788f8d/hera-rtp/Singularity.hera-rtp-ubuntu-conda2
And here is the error log that I got using the singularity-builder-v3-4-2 builder.
singularity-build-log-HERA-Team_hera-rtp-singularity-Sept. 20, 2020, 5 05 a.m.(1).txt
hey @piyanatk - I've done the build with:
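(roughly the following, with the output name just a placeholder):

sudo singularity build hera-rtp.sif hera-rtp/Singularity.hera-rtp-ubuntu-conda2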
For the hanging builder, I don't have insight for you here, but there are records of this sort of thing happening - if you have suggestions for something I could try I'd be happy to, but figuring out the specifics is beyond the level of support that I can offer, at least until we have many more reports of this issue.
Why do you not want to use any of the working builders? The older 3.4.2, as long as the recipe is in the root, works.
If you want to try the Sylabs library that's also an option, as is Google Cloud Build. You can also build a docker container and pull down to Singularity, either via Docker Hub or Quay.io (my goto choice typically). Good luck!
Hi @vsoch. Thank you for checking on this! I think when I tried the older builder, the image was not saved (exactly this issue, I think), and you suggested that I use the newer builder. I will give 3.2.1 a try and will report back.
Unfortunately I do not have any insight, as I am a novice at this myself. I am just trying to get a container built for the collaboration I am involved with so that we can test it on a cluster.
You might consider an automated build to Docker Hub or Quay, and then pulling down to Singularity, e.g., for the repository vanessa/salad
singularity pull docker://vanessa/salad
This is especially useful for development containers that warrant many builds a day, as Singularity Hub is more intended for building final / "I want to publish this" containers.
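For example, the full round trip would look something like this (the image name is just a placeholder):

docker build -t myuser/mycontainer .
docker push myuser/mycontainer
singularity pull docker://myuser/mycontainer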
Links
Version of Singularity
Local: 3.5.3; shub: singularity-builder-3.4.1-100GB
Behavior when Building Locally
Builds fine.
Error on Singularity Hub
The build looks OK but exited with this error:
INFO: Adding labels
WARNING: Label: APPLICATION_NAME already exists and force option is false, not overwriting
WARNING: Label: APPLICATION_VERSION already exists and force option is false, not overwriting
WARNING: Label: MAINTAINER_NAME already exists and force option is false, not overwriting
WARNING: Label: MAINTAINER_EMAIL already exists and force option is false, not overwriting
INFO: Adding environment to container
INFO: Adding runscript
INFO: Creating SIF file...
INFO: Build complete: /root/build/container.sif
ERROR Final image does not exist.
What do you think is going on?
Not really sure, version of Singularity not supported?