Docker image is not working! #149

Status: Closed (Nazila-H closed this issue 1 year ago)

Nazila-H commented 1 year ago

Thanks a lot for the very helpful project.

Describe the error

The nvcc -V result is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0 

and the nvidia-smi result is: [output not preserved]

I used the provided Dockerfile to build the Docker image (myimage:0.1), then ran the following command:

podman run --hooks-dir /etc/containers/hooks.d/ --rm -v$(pwd):/work -w/work localhost/myimage:0.1 python tools/demo.py configs/elephant/cityperson/cascade_hrnet.py chkpoint/epoch_5.pth.stu demo/ result_demo/

I got this error:

['demo/1.png', 'demo/2.png', 'demo/3.png']
unexpected key in source state_dict: mask_head.0.conv_res.conv.weight, mask_head.0.conv_res.conv.bias, mask_head.1.conv_res.conv.weight, mask_head.1.conv_res.conv.bias, mask_head.2.conv_res.conv.weight, mask_head.2.conv_res.conv.bias

[                              ] 0/3, elapsed: 0s, ETA:/pedestron/mmdet/apis/inference.py:39: UserWarning: Class names are not saved in the checkpoint's meta data, use COCO classes by default.
  warnings.warn('Class names are not saved in the checkpoint\'s '
Traceback (most recent call last):
  File "tools/demo.py", line 69, in <module>
  File "tools/demo.py", line 65, in run_detector_on_dataset
    detections = mock_detector(model, im, output_dir)
  File "tools/demo.py", line 37, in mock_detector
    results = inference_detector(model, image)
  File "/pedestron/mmdet/apis/inference.py", line 66, in inference_detector
    return _inference_single(model, imgs, img_transform, device)
  File "/pedestron/mmdet/apis/inference.py", line 93, in _inference_single
    result = model(return_loss=False, rescale=True, **data)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/pedestron/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/pedestron/mmdet/models/detectors/base.py", line 88, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/pedestron/mmdet/models/detectors/base.py", line 79, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/pedestron/mmdet/models/detectors/cascade_rcnn.py", line 241, in simple_test
    x = self.extract_feat(img)
  File "/pedestron/mmdet/models/detectors/cascade_rcnn.py", line 115, in extract_feat
    x = self.backbone(img)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/pedestron/mmdet/models/backbones/hrnet.py", line 446, in forward
    x = self.relu(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 912, in relu
    result = torch.relu_(input)
RuntimeError: CUDA error: no kernel image is available for execution on the device

I would really appreciate your help getting the demo to run correctly.

hasanirtiza commented 1 year ago

I have never personally used a Docker image for Pedestron, so I am not 100% sure how to answer. However, from the error it seems that you do not have the correct CUDA version. Can you confirm your CUDA version and PyTorch version? Secondly, is it possible for you to run without Docker first (in a conda environment, etc.)?
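
One way to confirm those versions from inside the container is a short Python check. The error "no kernel image is available for execution on the device" generally means the installed PyTorch binary contains no compiled kernels for the GPU's compute capability, so the sketch below prints the PyTorch version, the CUDA version PyTorch was built against, and the GPU's compute capability. It is a minimal sketch under assumptions: `is_capability_covered` and `report` are hypothetical helper names introduced here, `torch.cuda.get_arch_list` is only present in newer PyTorch releases (the sketch skips that step when it is missing), and the coverage check ignores PTX entries that the driver could JIT-compile.

```python
import importlib.util


def is_capability_covered(capability, arch_list):
    """Return True if some "sm_XY" entry in arch_list can run on the given
    (major, minor) compute capability.

    Assumption: a compiled kernel for sm_XY runs on a device with the same
    major version and an equal or higher minor version. PTX entries such as
    "compute_80" are ignored, so this check is conservative.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries
        digits = arch[len("sm_"):]
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip anything that is not a plain "sm_XY" entry
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def report():
    """Print the version details requested in the thread."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in this environment")
        return
    import torch

    print("PyTorch version:", torch.__version__)
    print("CUDA version PyTorch was built with:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
    if not torch.cuda.is_available():
        return

    capability = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0))
    print("Compute capability:", capability)

    # get_arch_list is missing in older PyTorch releases, so look it up defensively.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is not None:
        archs = get_arch_list()
        print("Architectures compiled into this build:", archs)
        print("GPU covered by this build:", is_capability_covered(capability, archs))


if __name__ == "__main__":
    report()
```

Saved under a hypothetical name such as `check_cuda.py`, this could be run with the same podman command as the demo, replacing the `python tools/demo.py ...` part with `python check_cuda.py`.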

Nazila-H commented 1 year ago

Thank you for your comment. I am working on a university server and do not have admin access to replace CUDA v11.7 with CUDA v10.0, which is why I tried to use the Dockerfile. The Dockerfile sets:

ARG CUDA="10.1"

Do you have any suggestions for compatible versions that I can set in the Dockerfile? If it works, we can also update the Dockerfile in the repository.

hasanirtiza commented 1 year ago

You are in a tough spot, Nazila. As for compatibility, you can read about the CUDA, PyTorch, etc. versions that we tried here. If I were you, I would look for Docker-related issues in the original mmdetection repo, in particular older issues (around 2020-2021).
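
As a rough aid for choosing the `CUDA` value in the Dockerfile, the sketch below checks whether a given CUDA toolkit release can target a given GPU compute capability. The thread does not state which GPU the server has, so the capability passed in is an assumption to be replaced with the real value. The table is a summary of when NVIDIA added each architecture and should be verified against the CUDA release notes. `MIN_CUDA_FOR_CAPABILITY` and `toolkit_can_target` are hypothetical names introduced here. A new enough toolkit is necessary but not sufficient: the PyTorch build installed in the image must also include kernels for that architecture.

```python
# Minimum CUDA toolkit release that can compile for each compute capability.
# Assumption: summarized from NVIDIA's CUDA release history; verify before relying on it.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere
    (8, 6): (11, 1),  # Ampere
    (8, 9): (11, 8),  # Ada Lovelace
    (9, 0): (11, 8),  # Hopper
}


def parse_version(text):
    """Turn a version string such as "10.1" or "11" into a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def toolkit_can_target(cuda_version, capability):
    """Return True if the CUDA toolkit release can compile for the capability."""
    if capability not in MIN_CUDA_FOR_CAPABILITY:
        raise ValueError("compute capability {} is not in the table".format(capability))
    return parse_version(cuda_version) >= MIN_CUDA_FOR_CAPABILITY[capability]


if __name__ == "__main__":
    # The Dockerfile in the thread sets CUDA="10.1"; the capabilities are example inputs.
    for capability in [(7, 5), (8, 0), (8, 6)]:
        print("CUDA 10.1 can target", capability, ":", toolkit_can_target("10.1", capability))
```

If the server's GPU turns out to be an architecture that CUDA 10.1 cannot target, both the base image's CUDA version and the PyTorch version in the Dockerfile would need to move to newer releases, which may in turn require changes to the mmdetection code the project builds on.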

Nazila-H commented 1 year ago

Thank you for the suggestion and for sharing the link. I will do that.