Closed. marleneberke closed this issue 4 years ago.
I'm using the gpu_devel node:

```
srun --pty -p gpu_devel -c 2 -t 2:00:00 --gres=gpu:1 bash
```
and `nvidia-smi` gives this:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:84:00.0 Off |                    0 |
| N/A   33C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
@mdb293, can you post the contents of run.sh?
```shell
CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"
SING_EXEC="../docker-singularity-master/singularity.sh"
$SING_EXEC exec $CONT bash -c "source activate $PWD/$CONDA_ENV \
  && $COMMAND"
```
So what you posted seems to be the wrong run.sh, but I'm assuming the only major difference is how SING_EXEC is defined.
The problem you are having is that the container does not know where the NVIDIA drivers are, and thus you cannot use the GPU from inside the container.
For example,

```
singularity exec detectron2.sif nvidia-smi
```

should fail. You need to bind the drivers into the container; Singularity has built-in support for this. All you need to do is pass the `--nv` flag, as in:

```
singularity exec --nv detectron2.sif nvidia-smi
```
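As a quick sanity check, once `--nv` is passed both the driver and PyTorch's view of the GPU can be verified from inside the container. This is a sketch that assumes the `detectron2.sif` image and the `detectron2_env` conda env from this thread sit in the current directory:

```shell
# Without --nv the first command should fail; with --nv it should print
# the same driver table that nvidia-smi shows on the host.
singularity exec --nv detectron2.sif nvidia-smi

# PyTorch inside the conda env should now report the GPU as available
# (prints "True" when the driver is bound correctly).
singularity exec --nv detectron2.sif bash -c \
  "source activate $PWD/detectron2_env && python -c 'import torch; print(torch.cuda.is_available())'"
```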
Oops, this is the correct run.sh on the server:

```shell
CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"
SING_EXEC="singularity"
$SING_EXEC exec $CONT bash -c "source activate $CONDA_ENV \
  && $COMMAND"
```
If I understand correctly, I should change run.sh to:

```shell
CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"
SING_EXEC="singularity"
$SING_EXEC exec --nv $CONT nvidia-smi bash -c "source activate $CONDA_ENV && $COMMAND"
```
Sorry, just saw this. Not quite:
```shell
#!/bin/bash
CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"
SING_EXEC="singularity"
$SING_EXEC exec --nv $CONT bash -c "source activate $CONDA_ENV \
  && $COMMAND"
```
```
[mb2987@c22n01 maskRCNN_singularity]$ ./run.sh python detectron2/demo/demo.py --config-file detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
[01/17 16:25:05 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=['detectron2/Left_side_of_Flying_Pigeon.jpg', '[--other-options]'], opts=[], output=None, video_input=None, webcam=False)
Traceback (most recent call last):
  File "detectron2/demo/demo.py", line 73, in <module>
    demo = VisualizationDemo(cfg)
  File "/gpfs/loomis/home.grace/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/demo/predictor.py", line 35, in __init__
    self.predictor = DefaultPredictor(cfg)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/detectron2/engine/defaults.py", line 157, in __init__
    self.model = build_model(self.cfg)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/detectron2/modeling/meta_arch/build.py", line 19, in build_model
    return META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 42, in __init__
    pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(self.device).view(num_channels, 1, 1)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2_env/lib/python3.6/site-packages/torch/cuda/__init__.py", line 192, in _lazy_init
    _check_driver()
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2_env/lib/python3.6/site-packages/torch/cuda/__init__.py", line 102, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
```