tothuhien closed this 2 years ago
It's not only those 2 tools: I have tested a couple of other tools and they all ended with the same error.
It looks like all the failed jobs ran on the ecc1 node. I drained that node and started another job, which then ran successfully on a different node.
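For anyone repeating the drain/resume step, here is a minimal sketch of the Slurm commands involved. The helper name `node_state_cmd` and the reason text are made up for illustration. It only prints the `scontrol` command so it can be reviewed before being run by someone with Slurm admin rights.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the scontrol command that drains or resumes a node.
# Nothing is executed here; the command is only echoed for review.
node_state_cmd() {
    local node="$1" state="$2" reason="${3:-}"
    if [ "$state" = "DRAIN" ]; then
        # Slurm requires a Reason when a node is set to DRAIN.
        printf 'scontrol update NodeName=%s State=DRAIN Reason="%s"\n' "$node" "$reason"
    else
        printf 'scontrol update NodeName=%s State=%s\n' "$node" "$state"
    fi
}

# Example: take ecc1 out of scheduling, then bring it back later.
node_state_cmd ecc1 DRAIN "container jobs failing"
node_state_cmd ecc1 RESUME
```

A drained node finishes its running jobs but accepts no new ones, which is why the next job landed on a different node.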
Interestingly, the slurm-node, nrec2 and ecc3 run Singularity version 3.8.7-1.el7, but ecc1 and ecc2 only have version 1.1.0-1.el7.
I don't know much about this, but I googled the error and found this one. One of the comments says: "Unfortunately, as is usually the case any time it is involved, NFS is a likely cause for your problem. Singularity relies on SUID for its core functionality, but for (quite good) security reasons SUID is disabled on NFS by default. It is unlikely the cluster admins will enable that option, so your best bet is to ask them install it locally on whichever computer/interactive nodes you need it on." As this error appeared right after some NFS updates on the system, I'm wondering if it's related.
Contrary to my previous belief, the ecc1 and ecc2 nodes were actually running a newer version of Singularity than the other nodes, since the version numbering was restarted when Singularity was renamed to Apptainer.
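Since the bare version numbers are misleading across the rename, a quick way to tell the two apart is to look at the product name in the version string instead. This is a rough sketch, assuming that the `singularity` command on an Apptainer node reports a string beginning with "apptainer version" while the older package reports "singularity version". The function name `runtime_of` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a version string by product name,
# because 1.1.0 (Apptainer) is newer than 3.8.7 (Singularity).
runtime_of() {
    case "$1" in
        apptainer*)   echo "apptainer" ;;
        singularity*) echo "singularity" ;;
        *)            echo "unknown" ;;
    esac
}

# Assumed example strings, built from the version numbers quoted in this thread.
runtime_of "apptainer version 1.1.0-1.el7"
runtime_of "singularity version 3.8.7-1.el7"
```

On a node, the same check would be `runtime_of "$(singularity --version)"`.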
I tried to manually run a Singularity container on the ecc1 node and got the following error message:
INFO : A system administrator may need to enable user namespaces, install
INFO : apptainer-suid, or compile with ./mconfig --with-suid
ERROR : Failed to create user namespace: user namespace disabled
Following the advice in this message, I installed "apptainer-suid" with sudo yum install apptainer-suid, and thereafter I was able to successfully execute container commands. I did the same installation on ecc2 and resumed both nodes in Slurm.
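To check other nodes for the same condition before jobs fail on them, something like the following could help. It is only a sketch: it assumes the kernel exposes `/proc/sys/user/max_user_namespaces` (where 0 means user namespaces are disabled) and that the node is RPM-based. The function name `userns_status` is made up for illustration, and the thread does not say what this value actually was on ecc1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the max_user_namespaces sysctl value.
# Anything that is not a positive integer is reported as "disabled".
userns_status() {
    local max="$1"
    if [ "$max" -gt 0 ] 2>/dev/null; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Report the current setting if the kernel exposes it (assumption: it may not).
f=/proc/sys/user/max_user_namespaces
if [ -r "$f" ]; then
    echo "user namespaces: $(userns_status "$(cat "$f")")"
fi

# On RPM-based systems, report whether the setuid package is installed.
if command -v rpm >/dev/null 2>&1; then
    rpm -q apptainer-suid || true
fi
```

If user namespaces are disabled and apptainer-suid is missing, the runtime has neither of the two mechanisms named in the error message, which matches what happened here.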
The minimap and LASTZ tools ended with this error message:
WARNING: DEPRECATED USAGE: Environment variable SINGULARITY_TMPDIR will not be supported in the future, use APPTAINER_TMPDIR instead
ERROR : Failed to create user namespace: user namespace disabled
Those tools used to work before. I can reproduce the error.
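The deprecation warning in that output is separate from the actual failure (the ERROR line), but it can be addressed in the job environment. Below is a minimal sketch, assuming the job setup is a shell script that can be edited. The function name `migrate_tmpdir` and the `/scratch/tmp` path are made up for illustration. It keeps the old variable set as well, since some nodes in this cluster still run the older Singularity, and as far as I know Apptainer does not warn when both names hold the same value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirror SINGULARITY_TMPDIR into APPTAINER_TMPDIR
# when only the old name is set. An existing APPTAINER_TMPDIR is left alone.
migrate_tmpdir() {
    if [ -n "${SINGULARITY_TMPDIR:-}" ] && [ -z "${APPTAINER_TMPDIR:-}" ]; then
        export APPTAINER_TMPDIR="$SINGULARITY_TMPDIR"
    fi
}

# Example with an assumed scratch path.
SINGULARITY_TMPDIR=/scratch/tmp
unset APPTAINER_TMPDIR
migrate_tmpdir
echo "APPTAINER_TMPDIR=${APPTAINER_TMPDIR:-}"
```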