Error while running python prepare_data_trainval.py

stanny880913 commented 1 year ago

When I tried to run python prepare_data_trainval.py, it showed this error:

Traceback (most recent call last):
  File "prepare_data_trainval.py", line 13, in <module>
    from mmdet3d.core.bbox.box_np_ops import points_cam2img
  File "/ws/radiant/mmdetection3d/mmdet3d/__init__.py", line 23, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==1.3.12 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.1.0.

But when I tried to install mmcv==2.0.0rc4, it conflicted with my PyTorch version and other packages, and pip showed:

$ pip install mmcv==2.0.0rc4 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.7/index.
Looking in links: https://download.openmmlab.com/mmcv/dist/cu101/torch1.7/index.html
ERROR: Could not find a version that satisfies the requirement mmcv==2.0.0rc4

So there is no mmcv==2.0.0rc4 available to choose. When I use mmcv==2.0.0rc3, it shows:

Traceback (most recent call last):
  File "prepare_data_trainval.py", line 13, in <module>
    from mmdet3d.core.bbox.box_np_ops import points_cam2img
  File "/ws/radiant/mmdetection3d/mmdet3d/__init__.py", line 3, in <module>
    import mmdet
  File "/root/miniconda/envs/test/lib/python3.6/site-packages/mmdet/__init__.py", line 25, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==2.0.0rc3 is used but incompatible. Please install mmcv>=1.3.8, <=1.4.0.

What can I do to fix this error? Thanks.

longyunf commented 1 year ago

You may check the compatibility among versions of mmdet, mmdet3d, mmcv-full, pytorch and numpy. Note that mmcv-full is different from mmcv. I only tested the versions listed in requirements.txt and you may refer to https://mmdetection3d.readthedocs.io/en/latest/get_started.html for the compatibility of installing other versions.
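The two assertion messages above name version ranges that do not overlap, which is why no single mmcv build satisfies both packages at once. The sketch below only illustrates that arithmetic; the helper names (parse_version, in_range, mmdet3d_accepts, mmdet_accepts) are made up for this example and are not part of mmcv, mmdet, or mmdet3d, and the parser handles only the plain "X.Y.Z" and "X.Y.ZrcN" forms that appear in this thread.

```python
import re


def parse_version(text):
    """Turn '1.3.12' or '2.0.0rc4' into a tuple that sorts in release order."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    rc = match.group(4)
    # A release candidate sorts before the final release with the same number.
    if rc is not None:
        return (major, minor, patch, 0, int(rc))
    return (major, minor, patch, 1, 0)


def in_range(installed, minimum, maximum, include_max):
    """Check minimum <= installed < maximum (or <= maximum if include_max)."""
    version = parse_version(installed)
    low, high = parse_version(minimum), parse_version(maximum)
    if include_max:
        return low <= version <= high
    return low <= version < high


def mmdet3d_accepts(installed):
    # Range copied from the first error: mmcv>=2.0.0rc4, <2.1.0
    return in_range(installed, "2.0.0rc4", "2.1.0", include_max=False)


def mmdet_accepts(installed):
    # Range copied from the second error: mmcv>=1.3.8, <=1.4.0
    return in_range(installed, "1.3.8", "1.4.0", include_max=True)


if __name__ == "__main__":
    for candidate in ("1.3.12", "2.0.0rc3", "2.0.0rc4"):
        print(candidate, mmdet3d_accepts(candidate), mmdet_accepts(candidate))
```

Since neither range contains the other, the practical fix is to change the mmdet3d or mmdet version rather than to keep searching for an mmcv version.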

stanny880913 commented 1 year ago

Thanks. May I ask which PyTorch and CUDA versions you have? I used torch 1.8 and CUDA 10.2 to install mmcv-full 1.8.12, which succeeded, but when I run pip install mmdet3d==0.16.0, it shows an error like this:

 File "/root/miniconda/envs/radiant_new/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> mmdet3d

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Later I followed the guide you gave and cloned the mmdet3d repo, but I could only install mmdet3d 1.1.0. Here is my env:

mmcv-full                 1.3.12                   pypi_0    pypi
mmdet                     2.14.0                   pypi_0    pypi
mmdet3d                   1.1.0                     dev_0    <develop>
mmengine                  0.7.3                    pypi_0    pypi
mmsegmentation            0.14.1                   pypi_0    pypi

When I run python prepare_data_trainval.py, it raises the same error again, although I am sure I installed mmcv-full, not mmcv:

AssertionError: MMCV==1.3.12 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.1.0.

How can I fix it? Could you provide a guide for creating the environment? Thanks.

longyunf commented 1 year ago

It seems that the version of mmdet3d is too high to be compatible with others. This is an example for CUDA 11.1 for your reference

conda create -n env1 python=3.7 -y
conda activate env1
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full==1.3.12 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
pip uninstall numpy -y
pip install numpy==1.19.5
pip install mmpycocotools
pip install pycocotools==2.0.1
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.16.0
pip install -v -e .
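After running the commands above, it can help to confirm that the pinned versions are what actually ended up in the environment. The following is a minimal sketch, not part of the radiant repository: the EXPECTED table simply repeats the pip pins from the commands above, and the report helper is a name invented for this example. It assumes importlib.metadata is available (Python 3.8+) or, on Python 3.7, that the importlib-metadata backport is installed.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib-metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# Pins repeated from the pip commands above.
EXPECTED = {
    "mmcv-full": "1.3.12",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "numpy": "1.19.5",
    "pycocotools": "2.0.1",
}


def report(expected):
    """Return (name, wanted, found, matches) rows; found is None if missing."""
    rows = []
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        rows.append((name, wanted, found, found == wanted))
    return rows


if __name__ == "__main__":
    for name, wanted, found, matches in report(EXPECTED):
        status = "ok" if matches else "MISMATCH"
        print(f"{name:16} wanted {wanted:8} found {found}  {status}")
```

Any row marked MISMATCH points at a package worth reinstalling before running prepare_data_trainval.py again.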

longyunf commented 1 year ago

I have updated train_radiant_pgd.py. You may add the argument --train_mini to quickly verify the code on a mini training set.

From: StannyHo. Sent: Wednesday, May 3, 2023 11:48 PM. To: longyunf/radiant. Subject: Re: [longyunf/radiant] Error while running python prepare_data_trainval.py (Issue #2)

Thanks, it is finally running, but when I try to run

CUDA_VISIBLE_DEVICES=0,1,2,3 python scripts/train_radiant_pgd.py --resume --num_gpus 4 --samples_per_gpu 4 --epochs 10 --lr 0.001 --workers_per_gpu 2

it raises ZeroDivisionError: division by zero when the first epoch finishes. Have you met this error before?

stanny880913 commented 1 year ago

I changed the code in train_radiant_pgd.py:

    args.train_ann_file = None
    # args.val_ann_file = join(args.dir_data, 'fusion_data', 'nus_infos_val_mini.coco.json')
    args.val_ann_file = join(
        args.dir_data, 'fusion_data', 'nus_infos_val.coco.json')
    # args.test_ann_file = join(args.dir_data, 'fusion_data', 'nus_infos_test_mini.coco.json')
    args.test_ann_file = join(
        args.dir_data, 'fusion_data', 'nus_infos_test.coco.json')

Now len(test_loader) is no longer 0. Is this correct? I want to train on the full dataset, not the mini one. Or do you mean that only validation needs the mini set? I did not download the mini dataset.

longyunf commented 1 year ago

The changes are correct.

You do not need to download nuScenes mini data. The mini data are generated by prepare_data_trainval.py. You can find nus_infos_train_mini.coco.json and nus_infos_val_mini.coco.json in data/nuscenes/fusion_data/.

You can use --train_mini for a fast sanity check on the mini training data and remove the argument when training with the full dataset. The updated code uses the mini val data for validation by default. Set the argument --val_mini to false if you want to run validation on the full val set.
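The ZeroDivisionError reported earlier in this thread is consistent with a metric being averaged over a loader that yielded no batches, as the remark about len(test_loader) suggests. The sketch below shows one defensive pattern under that assumption. Both helpers are hypothetical names written for this example and are not taken from train_radiant_pgd.py; any object with a length (a list here, a DataLoader in practice) can stand in for the loader.

```python
def require_non_empty(loader, name):
    """Fail early with a readable message if a loader has no batches."""
    if len(loader) == 0:
        raise ValueError(
            f"{name} loader is empty; check that its annotation file exists "
            "and lists at least one sample"
        )
    return loader


def mean_over_batches(total, num_batches):
    """Return total / num_batches, or None when there were no batches."""
    if num_batches == 0:
        return None
    return total / num_batches


if __name__ == "__main__":
    batches = [0.9, 0.7, 0.8]  # stand-in for per-batch validation losses
    require_non_empty(batches, "val")
    print(mean_over_batches(sum(batches), len(batches)))
```

Checking the loader right after it is built surfaces a missing or empty annotation file before a full epoch of training has been spent.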

stanny880913 commented 1 year ago

Thanks, I am going to use the mini val dataset for a quick test and the full dataset for training. May I ask what the difference is between "Train radar branch" and "Train DWN"?

longyunf commented 1 year ago

You can check Fig. 2 in the paper. Radar branch is shown as blue boxes and Depth Weight Net (DWN) is a model for depth fusion shown on the right side.

stanny880913 commented 1 year ago

Understood. I also want to ask how I can use the pretrained model to predict on my own dataset, not nuScenes, with a visualization of the test data as the result. Is this possible? Thank you very much.

Sorry for asking another question: when I run CUDA_VISIBLE_DEVICES=0 python scripts/train_radiant_pgd.py --do_eval --eval_set test, it raises:

Traceback (most recent call last):
  File "scripts/train_radiant_pgd.py", line 432, in <module>
    main(args)
  File "scripts/train_radiant_pgd.py", line 385, in main
    model_mlp.load_state_dict(filter_state_dict_keys(checkpoint_mlp['state_dict']))
  File "/home/stannyho/anaconda3/envs/radiant/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for FusionMLP:
    Missing key(s) in state_dict: "fc1.weight", "fc1.bias", "fc2.weight", "fc2.bias", "fc3.weight", "fc3.bias", "fc4.weight", "fc4.bias". 
    Unexpected key(s) in state_dict: "backbone_img.conv1.weight", "backbone_img.bn1.weight", "backbone_img.bn1.bias", "backbone_img.bn

How can I fix it?

longyunf commented 1 year ago

Please make sure that there are two weight files available to run the evaluation: (1) weights for radar/camera branch (data/nuscenes/fusion_data/train_result/radiant_pgd/checkpoint.tar) and (2) DWN (data/nuscenes/fusion_data/dwn_radiant_pgd/train_result/checkpoint.tar). The error shows that the weight file of DWN is not correct.
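A quick way to tell which checkpoint a file really is, before calling load_state_dict, is to compare key names. The sketch below does this with plain sets; the two helper names are invented for this example. The key lists are the ones quoted in the traceback above (the unexpected list is cut off there, so only the keys that are visible are used). With PyTorch, the expected keys would come from model_mlp.state_dict().keys() and the found keys from the 'state_dict' entry of the loaded checkpoint.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


def top_level_prefixes(keys):
    """Collect the first dotted component of each key, e.g. 'fc1'."""
    return sorted({key.split(".", 1)[0] for key in keys})


if __name__ == "__main__":
    # Keys FusionMLP expects, as listed under "Missing key(s)" above.
    expected = [f"fc{i}.{kind}" for i in range(1, 5) for kind in ("weight", "bias")]
    # Keys actually found, as far as the truncated traceback shows them.
    found = [
        "backbone_img.conv1.weight",
        "backbone_img.bn1.weight",
        "backbone_img.bn1.bias",
    ]
    missing, unexpected = diff_state_dict_keys(expected, found)
    print("missing prefixes:   ", top_level_prefixes(missing))
    print("unexpected prefixes:", top_level_prefixes(unexpected))
```

Seeing only backbone_img keys where fc1 to fc4 were expected suggests that the file at the DWN path holds radar/camera branch weights instead.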

To apply it to your own data, you need to convert your data to the same input format (refer to lib/fusion_dataset) and refer to the test code on how to predict detections (see the function SingleStageMono3DDetector.simple_test in radiant_pgd_network.py).

stanny880913 commented 1 year ago

Now I have a folder with paired .jpg and .xyz or .pcd files, and the classes I want can be the same as nuScenes. I only want to run prediction and draw bounding boxes on my images. I thought your input is in COCO format? So if I create an owndataset.json, convert my images to COCO format, set it as the test dataset, and then run python scripts/train_radiant_pgd.py --do_eval --eval_set test, is this correct? Thank you.

Could you write a version that accepts a custom dataset, including images (jpg) and radar (pcd or xyz), and visualizes the prediction results? I have no idea how to do it 🥲🥲, and it would help lots of people!

longyunf commented 1 year ago

Check the format of data (i.e. a single-frame input), convert your data to the same format, and then run the trained model in evaluation mode

result = model(model_mlp=model_mlp, return_loss=False, rescale=True, **data)

to obtain detections for a single frame.
(Related code: https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/scripts/train_radiant_pgd.py#L120)

Check the following code for how to prepare input data from raw images and radar data. https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/fusion_dataset.py#L188-L213

stanny880913 commented 1 year ago

So I don't need to convert to COCO format or prepare an ann_file? Is the radar data in .pcd format? Thank you. My nuScenes data includes info like tokens, but I only have .jpg and .pcd files; your coco.json includes a lot of info that I don't have : (

longyunf commented 1 year ago

See the format of the variable data in line 120. It is a dictionary with 4 keys; data.keys() gives ['img_metas', 'img', 'radar_map', 'radar_pts'].

No token is required. To use radar data, you may transform it into a format similar to radar_map and radar_pts.
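Before passing a hand-built frame to the model, a small check that the dictionary carries the four keys named above can save a confusing error later. This is a minimal sketch: missing_input_keys is a hypothetical helper, and it checks key names only. The types and shapes of the values are not described in this thread, so the placeholders below are not meaningful inputs and should be replaced by data shaped like the output of lib/fusion_dataset.py.

```python
# Key names as listed in the comment above.
REQUIRED_KEYS = ("img_metas", "img", "radar_map", "radar_pts")


def missing_input_keys(data):
    """Return the required keys that are absent from a single-frame input."""
    return [key for key in REQUIRED_KEYS if key not in data]


if __name__ == "__main__":
    # Placeholder values only; real values must follow the repository's format.
    frame = {"img_metas": None, "img": None, "radar_map": None}
    print(missing_input_keys(frame))
```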

stanny880913 commented 1 year ago

Thanks for your help! I also want to ask: once I finish this dataset conversion, can I use my own dataset for training?

longyunf commented 1 year ago

For training, you also need to create labels in addition to inputs.

stanny880913 commented 1 year ago

OK, thank you. Sorry for one more question: if I want to run prediction on my own dataset, do I need to convert it to JSON, like nus_infos_test.coco.json?

longyunf commented 1 year ago

In fusion_dataset.py, the class NuScenesFusionDataset uses the load_annotation function to read labels from json files, which are created by prepare_data_XXX.py. Of course you can use other formats as long as you modify the corresponding load_annotation function.
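For a folder of paired image and radar files like the one described earlier in the thread, a custom loader could start from a list of matched file names instead of a COCO-style json. The sketch below covers only that pairing step. pair_by_stem is a hypothetical helper, not part of fusion_dataset.py, and it assumes that an image and its radar file share the same base name.

```python
from pathlib import PurePath


def pair_by_stem(filenames, image_ext=".jpg", radar_ext=".pcd"):
    """Match image and radar files that share a base name.

    Returns (image, radar) tuples sorted by base name; files without a
    partner are skipped.
    """
    images, radars = {}, {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix == image_ext:
            images[path.stem] = name
        elif suffix == radar_ext:
            radars[path.stem] = name
    return [(images[stem], radars[stem]) for stem in sorted(images) if stem in radars]


if __name__ == "__main__":
    listing = ["0002.pcd", "0001.jpg", "0001.pcd", "0003.jpg"]
    print(pair_by_stem(listing))
```

Each pair would still need to be converted into the img, radar_map, and radar_pts inputs that the model expects.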
