Closed: @Majolund closed this issue 3 years ago.
@Majolund Wow, I didn't know that was the case. To me it sounds like a major issue, as it's difficult to develop and debug without running things interactively. It's really just my guess, but I would imagine this was one of the reasons the modules environment was enabled on the login nodes: so that we could interactively develop our scripts locally before submitting large-scale jobs to the cluster.
I think it may be a question of the VM: it worked for me on tl02.
A bit of new info on this. The issue is related to the fact that the /cluster filesystem is exported with 'nosuid' set, while Singularity depends on its setuid binary.
https://singularity.lbl.gov/admin-guide
> To ensure proper support, it is easiest to make sure you install Singularity to a local file system.
For app nodes, the recommendation is to install Singularity locally from EPEL. This way it should be OK to run Singularity containers on app nodes.
Out of curiosity, I think we should try to copy the singularity module into a local folder on the p33-submit VM (if it has a local folder; not sure). In principle this should avoid the 'nosuid' issue. But I don't know if Singularity runs on VMs. I know virtual machines within virtual machines don't work, e.g. you can't enable Hyper-V within a Windows Server that itself runs on Hyper-V. Not sure if this also applies to Singularity.
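For anyone who wants to verify the 'nosuid' issue on a given filesystem, here is a minimal check; /cluster is the path mentioned above, adjust it to whatever local folder you try:

```bash
# Print the mount options of the filesystem backing /cluster
# and test whether 'nosuid' is among them
findmnt -no OPTIONS /cluster | tr ',' '\n' | grep -x nosuid && echo "nosuid is set"
```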
Thanks for the update and info! Looking forward to when the app nodes are available; they'll be a great resource. That's a good idea: when running Singularity on a Mac it runs inside a VM, so I think it's worth a try. Is Sabry the go-to person to ask when it comes to possible local folders?
@Majolund Could you first try Singularity on our tl02 VM, as suggested by @scimerc? If you need help getting Singularity to work on p697-submit or on our RHEL login node, then yes, please ask Sabry.
Yes, it seems to work on the tl02 VM. Thanks for the tip, @scimerc!
@Majolund Could you add some details, e.g. how is it that the Singularity software is available on tl02? I think tl02 doesn't have access to the cluster, and singularity is a module on the cluster. Does tl02 have its own Singularity installed? I'm a bit confused here. Are there any links to relevant TSD documentation?
Seems like it has its own Singularity installed, and it may also be available through /cluster/etc/modulefiles. I haven't found any documentation for this so far.
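A quick way to check which binary you actually get, and whether a module version is also on offer (generic commands, assuming the `module` command exists on the node):

```bash
# Which singularity comes first on PATH, and its version
which singularity
singularity --version

# Any singularity modules visible to the module system
module avail singularity
```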
tl02 runs version 3.5.3-1.1.el6. Modules aren't currently accessible, but I remember Adriano complaining about older versions of Singularity a while back.
There are a few issues with Singularity on TSD, but I found it to be usable.
Issue 1: This happens on Colossus in a SLURM job where I "module load singularity/3.5.2". If I instead use "module load singularity/2.6.1", then everything works fine.
Issue 2: There is a local installation of Singularity in /usr/bin on our p33-submit machine, and it works well. However, out of curiosity, I tried to "module load singularity/3.5.2" and run the same command as in (1); this fails. If I use "module load singularity/2.6.1", it fails with another error.
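For reference, a minimal sketch of the comparison described above; the image name is a placeholder, and the actual error output is not reproduced here:

```bash
# container.sif is a placeholder image
module purge
module load singularity/2.6.1   # reported to work in a SLURM job
singularity exec container.sif echo ok

module purge
module load singularity/3.5.2   # reported to fail
singularity exec container.sif echo ok
```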
Two tips to avoid issues:
We've looked further into Singularity containers on TSD with Adriano and discovered that /tsd/p33/data/home is by default mounted within the container as the home folder. This is not TSD-specific, but rather a default behavior of Singularity: the home folder is, by default, mounted into the container. This may lead to weird behavior, e.g. if you have software within your home folder, it may interfere with software installed within the Singularity image. For this reason I recommend running Singularity with the --no-home argument.
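For example (image name and command are placeholders):

```bash
# Run without mounting $HOME into the container, so software in your
# home folder cannot shadow software installed inside the image
singularity exec --no-home mytool.sif mytool --help
```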
Great, thanks for the information and for looking into this!
https://github.com/norment/tsd_issues/issues/47 has some further info about Singularity containers that attempt to write data locally. By default this doesn't work: the Singularity container is read-only. If the software writes anything (even a log file that is not important to you), then it may still crash with an error saying something like "read-only file system".
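A common workaround, sketched below with placeholder paths and image name, is to bind a writable host directory over the location the software writes to:

```bash
# Create a writable directory on the host and bind it into the container
mkdir -p /tsd/p33/scratch/mytool_out
singularity exec --no-home \
    --bind /tsd/p33/scratch/mytool_out:/output \
    mytool.sif mytool --out /output
```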
We need Singularity 3.5 or 3.6 on Colossus. @Sandeek Could you please follow up with the TSD team about this issue?
Yes @ofrei, noted.
@Sandeek Thank you! I've submitted RT ticket 4147986 to track this request.
Singularity 3.6.4 will be installed directly on all Colossus nodes.
This is solved: Singularity 3.7 is installed on all app nodes and is available as a module for SLURM jobs.
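A minimal SLURM job sketch using the module; the account and the exact module string are assumptions, check `module avail singularity` for the real name:

```bash
#!/bin/bash
#SBATCH --job-name=singularity-test
#SBATCH --account=p33        # placeholder project account
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G

module load singularity/3.7  # module name assumed from this thread
singularity --version
singularity exec --no-home container.sif echo ok   # placeholder image
```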
It would be great if it were possible to run Singularity in the terminal window, as that would make it easier to do tests or run small jobs.
This was sent as a ticket to TSD in October 2019. The response from TSD was that the problem was due to the tl nodes running an old OS and kernel, and that the submit node should be used to avoid it. However, that did not work either, and I didn't hear back after sending the error messages, so I'm not sure what the status is or whether the ticket has been closed.
This is a minor issue, but it would be good to solve.