KeyError: 'Non-existent config key: MODEL.BACKBONE.NAME2'

amiltonwong commented 1 year ago

Hi, @ZrrSkywalker @yangyangyang127 ,

Thanks a lot for releasing the V2 package. I've tried running zeroshot_cls, but when I run sh zeroshot_cls.sh, I get the following KeyError:

/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
/data/code13/PointCLIP_V2/zeroshot_cls/clip/clip.py:23: UserWarning: PyTorch version 1.7.1 or higher is recommended
  warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Traceback (most recent call last):
  File "main.py", line 131, in <module>
    main(args)
  File "main.py", line 82, in main
    cfg = setup_cfg(args)
  File "main.py", line 71, in setup_cfg
    cfg.merge_from_file(args.config_file)
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/yacs/config.py", line 478, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/yacs/config.py", line 478, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/yacs/config.py", line 491, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.BACKBONE.NAME2'
(pointclip_new) milton@milton-ws3:/data/code13/PointCLIP_V2/zeroshot_cls

It seems there's an issue with the config file. Could you give some hints on how to fix it?

Thanks~
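The last frames of the traceback are in yacs's `_merge_a_into_b`, which only accepts keys that already exist in the default config. The minimal stdlib sketch below imitates that strict check. It assumes plain nested dicts instead of yacs `CfgNode` objects, omits yacs's value-type checking, and `merge_strict` is a hypothetical helper, not the yacs API. It reproduces the same message when the defaults do not declare `NAME2`:

```python
from typing import Any, Dict, List, Optional


def merge_strict(src: Dict[str, Any], dst: Dict[str, Any],
                 key_list: Optional[List[str]] = None) -> None:
    """Copy src into dst, refusing any key that dst does not already declare.

    Hypothetical stand-in for the strict merge yacs performs; plain dicts
    replace CfgNode objects and no type checking is done.
    """
    key_list = key_list or []
    for key, value in src.items():
        full_key = ".".join(key_list + [key])
        if key not in dst:
            # Same message format as the KeyError in the traceback
            raise KeyError(f"Non-existent config key: {full_key}")
        if isinstance(value, dict):
            merge_strict(value, dst[key], key_list + [key])
        else:
            dst[key] = value


# Defaults that do not declare NAME2 (hypothetical values)
defaults = {"MODEL": {"BACKBONE": {"NAME": ""}}}
# A config file that sets NAME2 (hypothetical values)
from_file = {"MODEL": {"BACKBONE": {"NAME": "ViT-B/16", "NAME2": "hypothetical"}}}

try:
    merge_strict(from_file, defaults)
except KeyError as err:
    print(err.args[0])  # Non-existent config key: MODEL.BACKBONE.NAME2

# Declaring the key in the defaults first lets the same merge succeed
patched = {"MODEL": {"BACKBONE": {"NAME": "", "NAME2": ""}}}
merge_strict(from_file, patched)
print(patched["MODEL"]["BACKBONE"]["NAME2"])  # hypothetical
```

In other words, the error points at whichever Dassl installation supplies the defaults, not at the YAML file itself: the file is only valid against defaults that declare every key it sets.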

yangyangyang127 commented 1 year ago

I think this problem happens when you use a different Dassl package. We made some changes to the original Dassl library.

To solve this, please run:

cd PointCLIP_V2/zeroshot_cls/Dassl3D
python setup.py develop
cd ..

sh zeroshot_cls.sh

I think this may help.
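After `python setup.py develop`, it may be worth confirming that the interpreter actually resolves `dassl` to the Dassl3D checkout rather than to a previously installed stock Dassl. The sketch below uses `importlib.util.find_spec`. It assumes the top-level module is named `dassl` (as the traceback paths suggest) and that the checkout directory is called `Dassl3D`. `module_origin` and `points_into_dassl3d` are hypothetical helpers:

```python
import importlib.util
from pathlib import PurePath
from typing import Optional


def module_origin(module_name: str = "dassl") -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if not importable."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def points_into_dassl3d(origin: Optional[str]) -> bool:
    """True if the resolved path lies inside a directory named Dassl3D (assumed name)."""
    return origin is not None and "Dassl3D" in PurePath(origin).parts


if __name__ == "__main__":
    origin = module_origin("dassl")
    print("dassl resolves to:", origin)
    if not points_into_dassl3d(origin):
        print("Warning: not the Dassl3D copy; rerun `python setup.py develop` inside Dassl3D")
```

Run it from the same conda environment that runs zeroshot_cls.sh, since each environment has its own site-packages.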

amiltonwong commented 1 year ago

@yangyangyang127, thanks a lot for your reply. After installing Dassl3D as described above, the KeyError is resolved. However, another error now occurs: RuntimeError: No CUDA GPUs are available

(pointclip_new) milton@milton-ws3:/data/code13/PointCLIP_V2/zeroshot_cls$ sh zeroshot_cls.sh
/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
/data/code13/PointCLIP_V2/zeroshot_cls/clip/clip.py:23: UserWarning: PyTorch version 1.7.1 or higher is recommended
  warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Setting fixed seed: 2
Collecting env info ...
** System info **
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.8.0-43-generic-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.3.58
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==1.10.1
[pip3] torch-cluster==1.6.0
[pip3] torch-scatter==2.0.9
[pip3] torch-sparse==0.6.13
[pip3] torchaudio==0.10.1
[pip3] torchvision==0.11.2
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h8d4b97c_729    conda-forge
[conda] mkl-service               2.4.0            py38h95df7f1_0    conda-forge
[conda] mkl_fft                   1.3.1            py38h8666266_1    conda-forge
[conda] mkl_random                1.2.2            py38h1abd341_0    conda-forge
[conda] numpy                     1.24.2                   pypi_0    pypi
[conda] numpy-base                1.23.5           py38h31eccc5_0  
[conda] pytorch                   1.10.1          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-cluster           1.6.0           py38_torch_1.10.0_cu113    pyg
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-scatter           2.0.9           py38_torch_1.10.0_cu113    pyg
[conda] pytorch-sparse            0.6.13          py38_torch_1.10.0_cu113    pyg
[conda] torchaudio                0.10.1               py38_cu113    pytorch
[conda] torchvision               0.11.2               py38_cu113    pytorch
        Pillow (9.4.0)

Loading trainer: PointCLIPV2_ZS
Loading dataset: ModelNet40
/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/utils/data/dataloader.py:478: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
***** Dataset statistics *****
  Dataset: ModelNet40
  # classes: 40
  # train_x: 9,840
  # val: 2,468
  # test: 2,468
Loading CLIP (backbone: ViT-B/16)
100%|███████████████████████████████████████| 351M/351M [00:30<00:00, 11.6MiB/s]
Traceback (most recent call last):
  File "main.py", line 131, in <module>
    main(args)
  File "main.py", line 97, in main
    trainer = build_trainer(cfg)
  File "/data/code13/PointCLIP_V2/zeroshot_cls/Dassl3D/dassl/engine/build.py", line 11, in build_trainer
    return TRAINER_REGISTRY.get(cfg.TRAINER.NAME)(cfg)
  File "/data/code13/PointCLIP_V2/zeroshot_cls/Dassl3D/dassl/engine/trainer.py", line 280, in __init__
    self.build_model()
  File "/data/code13/PointCLIP_V2/zeroshot_cls/trainers/zeroshot.py", line 49, in build_model
    clip_model.cuda()
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/nn/modules/module.py", line 680, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/nn/modules/module.py", line 680, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/milton/anaconda3/envs/pointclip_new/lib/python3.8/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

I've checked my GPU with torch.cuda.is_available(), and it returns True. My system has a single RTX 3080 Ti GPU with CUDA 11.3 installed. Does the package require multiple GPUs? Any hints on how to solve this issue?

Thanks~

amiltonwong commented 1 year ago

@yangyangyang127, I found the reason. I changed the CUDA_VISIBLE_DEVICES line in zeroshot_cls.sh to export CUDA_VISIBLE_DEVICES=0 to match my system environment. Now it works. Thanks~
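This also explains why torch.cuda.is_available() returned True in an interactive session: the `export` inside zeroshot_cls.sh only applies to the shell running the script and the processes it launches, so the interactive check never saw the script's value. According to the CUDA documentation, `CUDA_VISIBLE_DEVICES` is a comma-separated list of device identifiers, and only the entries before the first invalid one are visible. Presumably the script's original value named a GPU index this single-GPU machine does not have, which leaves no visible devices. The minimal sketch below models that rule. It handles integer indices only (CUDA also accepts GPU UUIDs), and `visible_gpu_indices` is a hypothetical helper:

```python
import os
from typing import List, Optional


def visible_gpu_indices(env_value: Optional[str], physical_count: int) -> List[int]:
    """Physical GPU indices CUDA would expose for a given CUDA_VISIBLE_DEVICES value.

    Hypothetical helper: integer indices only, GPU UUIDs are not handled.
    """
    if env_value is None:
        # Variable unset: every physical GPU is visible
        return list(range(physical_count))
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            # The first invalid entry hides itself and every entry after it
            break
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    # One physical GPU, as on the RTX 3080 Ti machine in this thread
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    print(current, "->", visible_gpu_indices(current, physical_count=1))
```

For example, `visible_gpu_indices("1", 1)` returns an empty list, the situation in which `clip_model.cuda()` fails with `No CUDA GPUs are available`, while `visible_gpu_indices("0", 1)` returns `[0]`.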

yangyangyang127 / PointCLIP_V2, issue #3