singularityhub / singularityhub.github.io

Container tools for scientific computing! Docs at https://singularityhub.github.io/singularityhub-docs
https://singularityhub.github.io

Container Stuck at Running #154

Closed fertinaz closed 5 years ago

fertinaz commented 6 years ago

Link to Container Collection Log, Build, or Collection (in that order)

Collection: https://www.singularity-hub.org/collections/1859

Behavior when Building Locally

It builds successfully on my CentOS-7 workstation which has 4 cores and 16GB memory.

But this is a relatively complex and time consuming build, because it compiles a large CFD package and its dependencies from scratch.

Error on Singularity Hub

It is stuck at "Running" state after 2-3 days.

What do you think is going on?

I've seen similar issues for some large build recipes. Perhaps this is hanging at some point but I don't know where exactly. Can you help me out with that?

Thank you

// Fatih

vsoch commented 6 years ago

If the image is too big, the instance gets overwhelmed and Google automatically kills it. The kill signal can't be sent back to Singularity Hub, so the build only looks like it's still running. You should assess the final size of your image, and then account for the following:

It looks like you are downloading and compiling a LOT and it's hitting the limits of disk space at some point. There are several things you can try!

I'd do a debug run (locally) where you can print out the sizes of final directories (for each application) and then the final image. That should be a good start.
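A minimal sketch of that kind of debug run, assuming a POSIX shell with `du`, `sort` and `awk` available inside the build. The helper name `report_sizes` is hypothetical, and the directories you pass in are whatever your recipe actually installs into:

```shell
#!/bin/sh
# Print the size of each given directory in megabytes, largest first.
# Hypothetical helper; nothing here is specific to Singularity Hub.
report_sizes() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # du -sk prints "<kilobytes><TAB><path>" for the whole tree
            du -sk "$dir"
        else
            printf 'skipping %s (not a directory)\n' "$dir" >&2
        fi
    done | sort -rn | awk -F'\t' '{ printf "%8.1f MB  %s\n", $1 / 1024, $2 }'
}
```

Calling something like `report_sizes /opt/* /usr/local` at the end of `%post` (the paths are only examples) shows which application dominates, and `ls -lh` on the finished image file gives the final size.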

fertinaz commented 6 years ago

Hello

Thanks for the tips. I will try a cleaner version. Is there a way to follow the log files on the Google side?

// Fatih

vsoch commented 6 years ago

I can't give anyone direct access to the instance, but if you give me a heads up when it's running, I can shell into the instance and monitor.

fertinaz commented 6 years ago

I guess I found out the root-cause.

The parallel compilation probably swaps if

cat /proc/cpuinfo | grep "processor" | wc -l

returns the number of threads rather than the number of physical cores.
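To illustrate the difference, here is a rough sketch that counts both values. It assumes the x86 Linux layout of `/proc/cpuinfo`, where each processor entry carries `physical id` and `core id` fields; the function names are made up for this example, and on platforms without those fields `count_physical` prints 0:

```shell
# Count logical processors (hardware threads) in a cpuinfo-style file.
count_logical() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Count physical cores as unique (physical id, core id) pairs.
# Assumes the x86 field names; prints 0 if they are absent.
count_physical() {
    awk -F': *' '
        /^physical id/ { socket = $2 }
        /^core id/     { seen[socket "," $2] = 1 }
        END { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}
```

With hyper-threading enabled, `count_logical` is typically twice `count_physical`, so a build could use `make -j "$(count_physical)"` instead of the thread count. Whether that is what caused the hang here is only a guess.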

I didn't check the size of temp files generated on-the-fly but final image is around 750 MB. So, I don't think this is a disk issue.

I have now switched to singularity-3 and am using a serial compilation; hopefully this will make a difference. I will let you know once my local test finishes.

Thank you.

fertinaz commented 6 years ago

So, I've just triggered a new build after my latest commit. Not much has changed anyway.

Can you let me know if it hangs?

vsoch commented 6 years ago

okay the one you triggered was already dead - I've triggered a new one and I'm going to ssh in, so please hold tight on pushing / changing the collection.

vsoch commented 6 years ago

How long does this normally take? I'm sitting here watching it still...

fertinaz commented 6 years ago

Should take about 3-4 hours, I don't suggest watching it at all...

vsoch commented 6 years ago

3-4 hours? Wait, you know that the builder limit is 2 hours right? https://github.com/singularityhub/singularity-python/blob/master/singularity/build/scripts/singularity-build-latest.sh#L30

There might be a kill signal in there regardless (if you did start when you mentioned, for example, it was already gone at 44 minutes), but generally a 3-4 hour build is not something Singularity Hub currently supports.
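One way to check locally whether a build fits inside such a limit is to run it under a wall-clock cap. This is only a sketch using the GNU coreutils `timeout` command, not how the Singularity Hub builder enforces its limit, and the wrapper name `run_with_limit` is hypothetical:

```shell
# Run a command under a wall-clock limit given in seconds.
# GNU timeout exits with status 124 when the limit is hit.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command exceeded the ${limit}s limit" >&2
    fi
    return "$status"
}
```

For example, `run_with_limit 7200 sudo singularity build image.simg Singularity` would stop a local build after two hours (the image and recipe names are placeholders). Local hardware differs from the builder, so a pass here is an indication rather than a guarantee.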

fertinaz commented 6 years ago

Oh, okay, I didn't know about the 2-hour limit. I was only aware of the builder specifications provided on the wiki page. That explains it. Thank you

vsoch commented 6 years ago

Good point, I will add that right away!

vsoch commented 6 years ago

Okay, added to the docs! Actually this is very good news: it means that you can build a base image (I usually use Docker, but you could use shub too) that does a big chunk of the compiling, and then build the final image on top of that. Do you want to try that?
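A rough sketch of what the second stage could look like, assuming the heavy dependencies were already compiled into a Docker image. The helper `write_final_recipe` and the image name in the usage note are hypothetical; only the `Bootstrap`/`From` header and the `%post` section are standard Singularity recipe syntax:

```shell
# Write a Singularity recipe that starts from a prebuilt base image.
# $1: Docker image holding the precompiled dependencies (hypothetical name)
# $2: output recipe path (defaults to ./Singularity)
write_final_recipe() {
    base_image=$1
    out=${2:-Singularity}
    cat > "$out" <<EOF
Bootstrap: docker
From: ${base_image}

%post
    # Only the remaining, shorter build steps go here; the heavy
    # third-party compilation already lives in the base image.
    echo "compile the main package here"
EOF
}
```

Running `write_final_recipe myuser/cfd-deps:latest` would produce a recipe whose build only has to cover the last step. For a base image hosted on Singularity Hub instead, the header would use `Bootstrap: shub` with the collection name in `From:`.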

fertinaz commented 6 years ago

That's a good idea. Main package compilation is the most time-consuming part, but compiling third-party tools also takes a considerable amount of time. I will try that.

vsoch commented 5 years ago

Hey @fertinaz are you all set here?

vsoch commented 5 years ago

No response, closing issue. @fertinaz I apologize that this didn't work, and hopefully if/when we update the builders it will resolve.