marleneberke / ORB_project3


nvidia error on cluster #10

Closed marleneberke closed 4 years ago

marleneberke commented 4 years ago

[mb2987@c22n01 maskRCNN_singularity]$ ./run.sh python detectron2/demo/demo.py \
  --config-file detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
  --input detectron2/Left_side_of_Flying_Pigeon.jpg \
  [--other-options]

[01/17 16:25:05 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=['detectron2/Left_side_of_Flying_Pigeon.jpg', '[--other-options]'], opts=[], output=None, video_input=None, webcam=False)
Traceback (most recent call last):
  File "detectron2/demo/demo.py", line 73, in <module>
    demo = VisualizationDemo(cfg)
  File "/gpfs/loomis/home.grace/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/demo/predictor.py", line 35, in __init__
    self.predictor = DefaultPredictor(cfg)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/detectron2/engine/defaults.py", line 157, in __init__
    self.model = build_model(self.cfg)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/detectron2/modeling/meta_arch/build.py", line 19, in build_model
    return META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 42, in __init__
    pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(self.device).view(num_channels, 1, 1)
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2_env/lib/python3.6/site-packages/torch/cuda/__init__.py", line 192, in _lazy_init
    _check_driver()
  File "/home/jara-ettinger/mb2987/maskRCNN_singularity/detectron2_env/lib/python3.6/site-packages/torch/cuda/__init__.py", line 102, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
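For reference, the assertion comes from torch.cuda's _check_driver() during lazy CUDA initialization, so any CUDA call made inside the container fails the same way. A minimal sketch that reproduces it, assuming the container and env paths from this thread:

# Any CUDA call inside the container hits the same driver check
# (paths assume the detectron2.sif container and detectron2_env from this thread):
singularity exec detectron2.sif bash -c \
  "source activate $PWD/detectron2_env && python -c 'import torch; torch.zeros(1).cuda()'"

Without the NVIDIA driver visible inside the container, this raises the same AssertionError, so the demo fails before the model is even built.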

marleneberke commented 4 years ago

I'm using the gpu_devel node: srun --pty -p gpu_devel -c 2 -t 2:00:00 --gres=gpu:1 bash

and nvidia-smi gives this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:84:00.0 Off |                    0 |
| N/A   33C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

belledon commented 4 years ago

@mdb293, can you post the contents of run.sh?

marleneberke commented 4 years ago

#!/bin/bash

CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"

SING_EXEC="../docker-singularity-master/singularity.sh"
$SING_EXEC exec $CONT bash -c "source activate $PWD/$CONDA_ENV \
&& $COMMAND"

belledon commented 4 years ago

So what you posted seems to be the wrong run.sh, but I'm assuming the only major difference is how SING_EXEC is defined.

The problem you're having is that the container does not know where the NVIDIA drivers are, so you cannot use the GPU from inside the container.

For example, singularity exec detectron2.sif nvidia-smi should fail.

You need to bind the drivers into the container; singularity has some tricks to do this.

All you need to do is pass the --nv flag, as in singularity exec --nv detectron2.sif nvidia-smi.
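As a quick sanity check (a sketch, assuming the container and env paths from this thread):

# With the host driver bound in via --nv, nvidia-smi should now work inside the container:
singularity exec --nv detectron2.sif nvidia-smi

# And PyTorch inside the conda env should see the GPU (prints True):
singularity exec --nv detectron2.sif bash -c \
  "source activate $PWD/detectron2_env && python -c 'import torch; print(torch.cuda.is_available())'"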

marleneberke commented 4 years ago

Oops, this is the correct run.sh on the server:

#!/bin/bash

CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"

SING_EXEC="singularity"
$SING_EXEC exec $CONT bash -c "source activate $CONDA_ENV \
&& $COMMAND"

marleneberke commented 4 years ago

If I understand correctly, I should change run.sh to

#!/bin/bash

CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"

SING_EXEC="singularity"
$SING_EXEC exec --nv $CONT nvidia-smi bash -c "source activate $CONDA_ENV && $COMMAND"

belledon commented 4 years ago

Sorry, just saw this. Not quite:

#!/bin/bash
CONT="$PWD/detectron2.sif"
CONDA_ENV="$PWD/detectron2_env"
COMMAND="$@"

SING_EXEC="singularity"
$SING_EXEC exec --nv $CONT bash -c "source activate $CONDA_ENV \
&& $COMMAND"