isaac-sim / IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim
https://isaac-sim.github.io/IsaacLab

[Bug] When Run in cluster FATAL: container creation failed: destination /mmfs1 doesn't exist in container #316

Closed zoctipus closed 1 month ago

zoctipus commented 8 months ago

I started with a clean Orbit clone pulled from this repository and followed the documentation's guide, with these versions installed:

Docker version 24.0.2
Docker Compose version v2.18.1
apptainer version 1.3.0

Everything succeeded until I ran:

./docker/container.sh job --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --video --offscreen_render

Returned:

sbatch: error: No account specified, defaulting to: cse
sbatch: error: No partition specified, defaulting to: compute
sbatch: error: Batch job submission failed: Invalid qos specification
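
The three messages above indicate that no Slurm account, partition, or valid QOS reached sbatch. A minimal sketch of passing them explicitly follows. It assumes the correct values are known from the cluster's own documentation; the variable names JOB_ACCOUNT, JOB_PARTITION and JOB_QOS and the helper build_sbatch_opts are hypothetical and not part of Orbit. Only the standard sbatch options --account, --partition and --qos are used.

```shell
# Hypothetical helper: build sbatch options from environment variables.
# JOB_ACCOUNT, JOB_PARTITION and JOB_QOS are illustrative names,
# not settings defined by Orbit.
build_sbatch_opts() {
    opts=""
    [ -n "${JOB_ACCOUNT:-}" ] && opts="$opts --account=$JOB_ACCOUNT"
    [ -n "${JOB_PARTITION:-}" ] && opts="$opts --partition=$JOB_PARTITION"
    [ -n "${JOB_QOS:-}" ] && opts="$opts --qos=$JOB_QOS"
    # Strip the leading space before printing.
    echo "${opts# }"
}

# Example (all values and job.sh are placeholders):
#   JOB_ACCOUNT=myaccount JOB_PARTITION=gpu JOB_QOS=normal
#   sbatch $(build_sbatch_opts) job.sh
```

Unset variables are skipped, so the cluster defaults still apply to anything not specified.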

Since this didn't work, I logged in to the cluster and manually ran:

sh ./docker/cluster/submit_job.sh ${CLUSTER_ORBIT_DIR} --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --video --offscreen_render

Job submission succeeded, but the output shows

FATAL:   container creation failed: mount hook function failure: mount /var/apptainer/mnt/session/mmfs1->/mmfs1 error: while mounting /var/apptainer/mnt/session/mmfs1: destination /mmfs1 doesn't exist in container

Steps to reproduce

Follow the cluster guide with a clean Orbit install.

Running

./docker/container.sh job --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --video --offscreen_render

Returned:

sbatch: error: No account specified, defaulting to: cse
sbatch: error: No partition specified, defaulting to: compute
sbatch: error: Batch job submission failed: Invalid qos specification

Or running

sh ./docker/cluster/submit_job.sh ${CLUSTER_ORBIT_DIR} --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --video --offscreen_render

Returned:

(run_singularity.py): Called on compute node with arguments --task Isaac-Velocity-Rough-Anymal-C-v0 --headless --video --offscreen_render
WARNING: nv files may not be bound with --writable
WARNING: By using --writable, Apptainer can't create /mmfs1 destination automatically without overlay or underlay
FATAL:   container creation failed: mount hook function failure: mount /var/apptainer/mnt/session/mmfs1->/mmfs1 error: while mounting /var/apptainer/mnt/session/mmfs1: destination /mmfs1 doesn't exist in container
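
The second warning hints at the cause: with --writable, Apptainer does not create the missing /mmfs1 mount point on its own. One possible workaround is to create the destination before the container starts. This is only a sketch, and it assumes the container is an unpacked sandbox directory that the job can write to. The helper ensure_bind_targets and the example path are hypothetical, not part of Orbit.

```shell
# Hypothetical helper: create bind-mount destinations inside an unpacked
# (sandbox) container directory so they exist before the container starts.
ensure_bind_targets() {
    sandbox_dir=$1
    shift
    for target in "$@"; do
        # Strip the leading slash so the path is created under the sandbox.
        mkdir -p "$sandbox_dir/${target#/}"
    done
}

# Example (the sandbox path is a placeholder):
#   ensure_bind_targets "$TMPDIR/my-container" /mmfs1
```

Whether this applies depends on how the image is unpacked on the compute node, so treat it as something to try rather than a confirmed fix.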


System Info

Describe the characteristics of your environment:

ACCEPT_EULA=Y
ISAACSIM_VERSION=2023.1.1
DOCKER_ISAACSIM_PATH=/isaac-sim
DOCKER_USER_HOME=/root
CLUSTER_ISAAC_SIM_CACHE_DIR=/path/to/docker-isaac-sim
CLUSTER_ORBIT_DIR=/path/to/orbit
CLUSTER_LOGIN=...........edu
CLUSTER_SIF_PATH=/path/to/sif_path/
CLUSTER_PYTHON_EXECUTABLE=source/standalone/workflows/rsl_rl/train.py


Mayankm96 commented 7 months ago

@pascal-roth Any idea here?

pascal-roth commented 7 months ago

This looks like an Apptainer and Docker version issue. Can you try using Apptainer version 1.2.5-1.el7 and Docker version 24.0.7 on the system where you build the Singularity file?
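
To compare the installed tools against the suggested versions, a small check like the following may help. It is a sketch only: extract_version and report_version are hypothetical helpers, and the comparison covers just the dotted version number (for example 1.2.5), not a package suffix such as -1.el7.

```shell
# Hypothetical helper: pull the first dotted version number out of a
# "--version" style line such as "apptainer version 1.3.0".
extract_version() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9.]*[0-9]\).*/\1/p'
}

# Report whether the version found in a line equals the suggested one.
report_version() {
    tool=$1
    line=$2
    suggested=$3
    found=$(extract_version "$line")
    if [ "$found" = "$suggested" ]; then
        echo "$tool $found: matches suggestion"
    else
        echo "$tool $found: suggested $suggested"
    fi
}

# Example, run on the machine that builds the image:
#   report_version apptainer "$(apptainer --version)" 1.2.5
#   report_version docker "$(docker --version)" 24.0.7
```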