mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0

LiDAR Points Format for Custom Dataset #382

Closed GerhardArya closed 1 year ago

GerhardArya commented 1 year ago

Thanks a lot for publishing the code for your great work!

I'm currently working on trying to get BEVFusion to run with a custom dataset.

I know that the nuScenes LiDAR points are in the format (x, y, z, intensity, ring_index), but it seems like BEVFusion eventually replaces ring_index with a timestamp somewhere down the road. Where does this happen?

The custom dataset I want to use unfortunately only has (x, y, z, intensity) for the LiDAR points. My question: is the timestamp value crucial for BEVFusion to work properly?

I'm not sure if I've understood the code correctly so far, but it seems like the timestamp is not used anywhere. Would it be okay to omit it and set load_dims to 4, or maybe just fill it with zeros in the converter for my custom dataset?
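For context, what I have in mind for the converter is something like this (a minimal sketch; the file names are placeholders, not code from the repo):

```python
import numpy as np

# Load my (x, y, z, intensity) points and append a zero "timestamp" column so the
# points match the 5-dim layout that the nuScenes configs expect.
points = np.fromfile("frame_0001.bin", dtype=np.float32).reshape(-1, 4)
points_5d = np.hstack([points, np.zeros((points.shape[0], 1), dtype=np.float32)])
points_5d.tofile("frame_0001_5d.bin")
```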

Thanks in advance for the help!

liuliuliu11 commented 1 year ago

Hello, the official download link for the project's pretrained model is no longer valid. If you have swint-nuimages-pretrained.pth, could you share it with me at [balms123456@gmail.com]? Wishing you all the best in your research.

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, I am also trying with my custom point clouds and used load_dims as 4, but I end up with the error below at the end of the first epoch. Please refer to the following log:

```
File "tools/train.py", line 90, in main
    train_model(
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/apis/train.py", line 135, in train_model
    runner.run(data_loaders, [("train", 1)])
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 138, in run
    epoch_runner(data_loaders[i], **kwargs)
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/runner/epoch_based_runner.py", line 17, in train
    super().train(data_loader, **kwargs)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 60, in train
    self.call_hook('after_train_epoch')
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch
    self._do_evaluate(runner)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmdet/core/evaluation/eval_hooks.py", line 115, in _do_evaluate
    results = multi_gpu_test(
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmdet/apis/test.py", line 98, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/models/fusion_models/bevfusion.py", line 188, in forward
    outputs = self.forward_single(
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/models/fusion_models/bevfusion.py", line 243, in forward_single
    feature = self.extract_lidar_features(points)
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/models/fusion_models/bevfusion.py", line 130, in extract_lidar_features
    feats, coords, sizes = self.voxelize(x)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/models/fusion_models/bevfusion.py", line 142, in voxelize
    ret = self.encoders["lidar"]["voxelize"](x)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/ops/voxel/voxelize.py", line 131, in forward
    return voxelization(
File "/home/hykeserver/Downloads/git_repos/object_detection_3d/bevfusion/mmdet3d/ops/voxel/voxelize.py", line 55, in forward
    voxel_num = hard_voxelize(
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Primary job terminated normally, but 1 process returned a non-zero exit code.
Per user-direction, the job has been aborted.

mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[9838,1],0]
  Exit code:    1
```

Let me know if you come across this error and whether you have any workaround.

@kentang-mit, any comments on the above error?

GerhardArya commented 1 year ago

@VeeranjaneyuluToka I haven't run into that error, since I'm still working on the custom create_data, converter, dataset, and create_gt_database code needed to get my dataset running with BEVFusion.

I haven't had the chance to try training with my custom dataset yet. My dataset doesn't have nuScenes-style tokens and so on, so I can't use the nuScenes evaluation like BEVFusion does; I have to reverse engineer the nuScenes evaluation and customize it to my needs, since I'm trying to keep this integration as close to BEVFusion as possible.

I'm taking a slightly different approach from yours so far. My dataset doesn't have sweeps on top of the samples, so instead of removing the 5th dimension (the timestamp relative to the sample, when the frame is a sweep), I treat each frame I have as a sample and just set the 5th dimension to 0, which is how BEVFusion seems to treat that dimension for samples. I'm not sure if this understanding is correct, so any input from the authors would be highly appreciated.

But, I'll keep an eye out for this error if I get it as well.

GerhardArya commented 1 year ago

@VeeranjaneyuluToka Just an update for you: I have now finished the first versions of all the code I was working on, and I'm currently training the LiDAR bbox detector part.

To do this, I'm basically reusing the nuScenes voxelnet configs with some changes to fit my dataset and the data it contains. I have trained 9 epochs so far at the time of writing and haven't run into your issue. The metrics seem to be developing reasonably well so far.

I will keep an eye out for the issue in case it appears in the future (hopefully it won't).

Based on what I could get from your stack trace, it seems like a configuration issue? Maybe you could check whether load_dims is properly changed everywhere, or just do what I did: use load_dims=5 and fill the 5th dimension (timestamp) properly. For a sample frame it is always 0; for sweeps, calculate it as (sweep_ts / 1e6) - (sample_ts / 1e6), which is how BEVFusion seems to be doing it.
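For the sweep case, the padding in my converter looks roughly like this (a simplified sketch with dummy values, not the actual BEVFusion code):

```python
import numpy as np

# Dummy stand-ins: sweep points as (x, y, z, intensity), timestamps in microseconds.
sweep_points = np.random.rand(1000, 4).astype(np.float32)
sample_ts, sweep_ts = 1633024800000000, 1633024800100000

# Relative timestamp in seconds: (sweep_ts / 1e6) - (sample_ts / 1e6); 0 for the sample frame itself.
rel_ts = sweep_ts / 1e6 - sample_ts / 1e6
sweep_points_5d = np.hstack(
    [sweep_points, np.full((len(sweep_points), 1), rel_ts, dtype=np.float32)]
)
```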

kentang-mit commented 1 year ago

Sorry folks, I was busy working on other projects recently and finally got a chance to process BEVFusion issues today. Actually, the number of feature dimensions does not matter much for the whole codebase, and it is totally fine to change load_dims from 5 to 4 and adjust the model's input channels accordingly. Timestamps, however, can be very helpful for the final performance. The important thing is that you assign different numbers to points from different times; the specific formulation of these numbers may not be that important. You can even use a relative frame index and achieve similar performance.
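For example (illustrative only, not the exact BEVFusion config keys):

```python
# Illustrative mmdet3d-style snippet (the actual BEVFusion configs are YAML and the
# exact keys may differ): the point-loading step and the first LiDAR layer simply
# have to agree on the number of point features.
load_points = dict(
    type="LoadPointsFromFile",
    coord_type="LIDAR",
    load_dim=4,  # points stored as (x, y, z, intensity)
    use_dim=4,   # feed all four dimensions to the network
)

# ...and wherever the LiDAR encoder/backbone input channels are set (e.g. an
# in_channels value of 5), change that value to 4 as well so it matches use_dim.
```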

Best, Haotian

GerhardArya commented 1 year ago

@kentang-mit Thanks for the reply and no worries! :smile: One question I still have is related to yaw.

If my dataset has data and labels in the LiDAR coordinate system, the LiDAR is mounted with the x axis already pointing forward, and I'm using a copied and customized version of the nuScenes evaluation protocol that evaluates in the LiDAR coordinate system (AFAIK nuScenes evaluates in the global coordinate system), is it actually okay for me to skip the -yaw - pi/2 transformation that is usually applied to the yaw angles?

I'm just wondering, since I have successfully trained, evaluated, and visualized the results of a voxelnet0p075 model trained on LiDAR only while removing this operation on the yaw angle. The results are okay (0.56-0.57 mAP on the test set; I might need to tune the score threshold for TransFusion, since I get some false negatives at 0.1 but also a string of empty false positives with the default 0.0).

However, there seem to be some inaccuracies with orientation, plus doubled/tripled detections. The orientation is generally correct (but not perfect), yet there are frames where it is off by almost 90 degrees, and there are multiple detections of the same object (mostly with buses/trucks/trailers and far-away vehicles). Do you have any suggestions on things that might help improve orientation accuracy?

An example:

Example

GerhardArya commented 1 year ago

@kentang-mit Never mind. I found that the cause of the worst of these misalignments was that I still needed to change the yaw angles in my dataset's ground truth to -yaw. Once I did that, the issue was generally solved.

That said, the orientation is still wrong in some edge cases (near blind spots, far-away areas, areas with few points). Do you have any suggestions on what to do to improve orientation accuracy in these areas?

Another thing is that training camera + LiDAR doesn't seem to boost mAP at all in my case; it even seems to reduce mAP a little bit.

LiDAR only: voxelnet0p075-scorethresh01-gw15all-newmat-gtfixed

LiDAR + Camera: bevfusion-a9-det-test-newmat-gtfixed

One thing to note about my dataset is that it is infrastructure data, taken from sensors mounted on a gantry rather than from a car at ground level. Could this affect the pretrained camera backbone, since it was trained on vehicle data at ground level (nuImages), and cause this odd LiDAR vs. camera + LiDAR result?

Also, my dataset has 1920 training frames, 240 validation frames, and 240 test frames. I don't know whether this is too small to get results comparable to yours.

I would also like to change the backbones in the future (PointPillars for LiDAR and maybe YOLOv8's CSPDarknet for images). Is there anything I would particularly need to look out for when trying to do this?

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, I still have the same issue with 4D point clouds. Would you mind sharing your config field changes here? Also, did you try training a camera-only model?

GerhardArya commented 1 year ago

@VeeranjaneyuluToka Yes, I did try to train a camera-only model, but the results were absolutely horrible: 0 mAP overall and for every class after 20 epochs. I'm not sure what is going on with camera-only. Have you tried camera-only before? Could you check whether my camera-only configs are correct?

Configs for camera + LiDAR: configs_cam_lidar.txt

Configs for camera only: configs_cam_only.txt

Here are the visualizations of camera and LiDAR feature maps from the camera + LiDAR training:

Camera: camera features

LiDAR: lidar features

Note: the visualization was created quickly, so the grid is not representative of the actual data. (0, 0) is directly in the middle in my data, and the LiDAR's forward direction points down in that visualization.

It seems like the camera backbone can't extract features that make sense, while the LiDAR backbone did a good job. I tried visualizing from the camera-only training as well, but the result is basically the same as the camera feature map above.

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, thanks for the quick reply. I have been trying the camera-only model and the bbox loss tensor shows 0, as described here: https://github.com/mit-han-lab/bevfusion/issues/371. Please have a look and let me know if you have any suggestions.

Also, TensorBoard only displays the LR and momentum. Are there any config changes I need to make so that the train and val losses also show up in TensorBoard?

I also tried to use the tools/visualize.py script to visualize my GT, but it does not show any bboxes on the image. I primarily suspect the transformations. If you look at the implementation here, https://github.com/mit-han-lab/bevfusion/blob/main/mmdet3d/datasets/nuscenes_dataset.py (line 260), lidar2image is just lidar2image = camera_intrinsics @ lidar2camera_rt.T. I think somewhere it has to bring the LiDAR and camera coordinates into the same coordinate system, and I'm not sure where that happens. I'm discussing this topic further in https://github.com/mit-han-lab/bevfusion/issues/394.

I will check your camera only config and get back to you. Thanks!

GerhardArya commented 1 year ago

@VeeranjaneyuluToka Hmmm unfortunately I don't quite know yet what might cause your issues. I will try training a camera only model again later today but right now I'm trying to make sure that my transformations are all correct and then retrying to do camera + lidar finetuning on my existing lidar only model.

Regarding the visualization script, I think it uses only your lidar2image, which is a projection matrix. My projection matrix was already correct (I got it from my dataset), since I could display my GT correctly using that script. Right now I'm trying to make sure that my lidar2camera and other matrices are correct, assuming my intrinsics are already correct.
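For reference, my mental model of how lidar2image is assembled is roughly this (placeholder matrices, not code from the repo):

```python
import numpy as np

# lidar2camera: 4x4 rigid transform taking LiDAR-frame points into the camera frame.
# camera_intrinsics: 3x3 pinhole intrinsics. Both are placeholders here.
lidar2camera = np.eye(4)
camera_intrinsics = np.array([[1000.0, 0.0, 960.0],
                              [0.0, 1000.0, 540.0],
                              [0.0, 0.0, 1.0]])

viewpad = np.eye(4)
viewpad[:3, :3] = camera_intrinsics      # pad the intrinsics to 4x4
lidar2image = viewpad @ lidar2camera     # maps homogeneous LiDAR points into image space

point = np.array([5.0, 1.0, 0.5, 1.0])   # homogeneous LiDAR point
u, v, depth = (lidar2image @ point)[:3]
u, v = u / depth, v / depth              # pixel coordinates after the perspective divide
print(u, v, depth)
```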

Regarding TensorBoard, unfortunately I haven't touched it so far. My feature map visualization was done by inserting some code into bevfusion.py that saves the feature maps; when I'm training, I simply comment out the lines calling that function. If I ever decide to use TensorBoard, I'll get back to you.
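The helper I insert is along these lines (simplified; the tensor shape in the example call is just a placeholder):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import torch

def save_bev_feature(feature: torch.Tensor, path: str) -> None:
    """Save a (C, H, W) or (B, C, H, W) BEV feature map as an image by averaging over channels."""
    if feature.dim() == 4:
        feature = feature[0]
    bev = feature.detach().float().mean(dim=0).cpu().numpy()  # (H, W)
    plt.imsave(path, bev, cmap="viridis")

# Example call with a dummy tensor standing in for the real BEV feature.
save_bev_feature(torch.randn(1, 256, 180, 180), "bev_feature.png")
```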

GerhardArya commented 1 year ago

Update: after fixing/recomputing the transformation matrices, I managed to get the feature maps to make a bit more sense and to look similar to the camera FOVs from the dataset.

Camera: camera features fixed

LiDAR: lidar features fixed

The remaining issue now is the mAP. Camera + LiDAR mAP is still lower than LiDAR-only, at 0.5533 vs. 0.5747. Any ideas why this could be happening and what I could do to solve it?

VeeranjaneyuluToka commented 1 year ago

Good! Are you trying this on a modified nuScenes or on your own dataset? What kind of transformation matrix corrections did you make?

I am currently experimenting with a completely new dataset generated in-house with our own sensor setup, so I have to generate all the new transformations it needs.

A vague idea: can we visualize the BEV features of both modalities and check their alignment? My gut feeling is that if the transformations and GTs are right, then it should work as expected.

GerhardArya commented 1 year ago

I'm trying my own (infrastructure POV) dataset.

In my case, I assumed that my projection matrix and camera intrinsics were already correct, since I could already visualize my GTs correctly using BEVFusion's visualization script. I then calculated my extrinsics (transformation matrices) from there and validated them by visualizing them in Open3D and checking their locations. I then made sure to place them correctly so that the values I inserted would be as close as possible to what BEVFusion's nuScenes database expects.
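The Open3D sanity check was something like this (simplified; the file name and extrinsics below are placeholders):

```python
import numpy as np
import open3d as o3d

# Load a LiDAR frame (x, y, z, intensity) and show it together with a coordinate
# frame placed at the camera pose implied by the camera2lidar extrinsics.
points = np.fromfile("frame_0001.bin", dtype=np.float32).reshape(-1, 4)[:, :3]
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))

camera2lidar = np.eye(4)  # placeholder 4x4 extrinsics
cam_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=1.0)
cam_frame.transform(camera2lidar)

o3d.visualization.draw_geometries([pcd, cam_frame])
```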

Based on the visualizations I made above, I think the BEV features align pretty well and match their real-world locations in my case. It's just that, for whatever reason, the camera doesn't seem to contribute to a better mAP and actually makes the mAP a bit lower instead.

I also tried to train a camera-only model on my data again, after changing the LR schedule (the original cyclic LR causes the losses to go to NaN and breaks training when the LR goes high again) to:

```yaml
lr_config:
  min_lr_ratio: 0.0001
  policy: CosineAnnealing
  warmup: linear
  warmup_iters: 500
  warmup_ratio: 0.33333333
```

This is copied from one of the TransFusion model configs. With it I managed to get the training to finish, and it ended with roughly 0.55 mAP (which is poor, since LiDAR-only or fusion reach roughly 0.88 mAP during training on my data in my experience). But when I evaluated it on the test set, its mAP dropped to 0.07. When I visualized the results, there were a lot of false positives around the objects it's supposed to detect.

Currently I'm trying to change some configs for the CenterHead (increasing the score threshold from 0.1 to 0.3), increase min_lr_ratio to 0.001, and then train again, but I'm not too optimistic. I'm utterly confused about what is going on with the camera module, since it seems to perform horribly on my data.

Maybe the cause is that it is pretrained on nuImages (vehicle perspective) and I'm trying to fine-tune it to detect classes with a different naming convention (nuImages: car vs. my dataset: CAR) from a different, infrastructure perspective? I'm not sure.

Hopefully the author comes back soon and can shed some light on what could be happening...

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, thanks for getting back here. I have a simple question: does BEVFusion assume the LiDAR is in an FLU (Forward, Left, Up) coordinate system? I was thinking it needs ENU (East, North, Up), since nuScenes uses that. Which coordinate system is the LiDAR data you are feeding to BEVFusion in? Thanks!

GerhardArya commented 1 year ago

@VeeranjaneyuluToka If I remember correctly, it was mentioned somewhere that BEVFusion needs FLU for the LiDAR. nuScenes is in ENU, but if you noticed, the yaw is processed with -yaw - pi/2 rad when the pickle files are generated, essentially transforming it from ENU to FLU (at least, if my understanding is correct). My data is already in FLU (as far as I know), but for whatever reason it needs to be preprocessed with -yaw first, because otherwise the GT is wrong (mirrored), resulting in wrong predictions. That said, I don't think this -yaw transformation is related to my current issues with the camera backbone.
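In code, the two yaw treatments I mean are roughly (illustrative only, not copied from the converter):

```python
import numpy as np

def nuscenes_yaw_to_lidar(yaw_enu: float) -> float:
    # nuScenes-style conversion applied when the info pickles are generated.
    return -yaw_enu - np.pi / 2

def my_dataset_yaw_fix(yaw_flu: float) -> float:
    # For my (already FLU) labels, only negating the yaw was needed.
    return -yaw_flu
```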

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, is this the conversion you are talking about, in nuscenes_converter.py at line 276? image

GerhardArya commented 1 year ago

@VeeranjaneyuluToka If I remember correctly, that's one of them, yes.

But I also made my own version of the nuScenes evaluation protocol used in BEVFusion. I basically read nuScenes' evaluation code, copied it, and then changed it to fit my needs. I needed to do this because my data doesn't have nuScenes' tokens, which are used for everything within nuScenes, including evaluation.

There was also something similar to this calculation in that evaluation code and in several other files, if I remember correctly. I simply searched in my IDE (Visual Studio Code) for "-rots - np.pi / 2", or even just "- np.pi / 2", and changed it to what I needed in the places/files where it made sense for my case.

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, OK! A quick question on the LiDAR-only model.

Voxelization returns zero-sized tensors for the validation data, while it works fine for training:

feats shape: torch.Size([0, 4]) coords shape: torch.Size([0, 4]) sizes shape: torch.Size([0])

Any idea about this behavior?

GerhardArya commented 1 year ago

@VeeranjaneyuluToka Not really... In my case it works fine for training, validation, and test.

Did you use the nuscenes_dataset.py class or did you write your own dataset class? I made my own custom dataset class with its own custom evaluation methods, my own converter script, etc. They are all based on the nuScenes equivalents that BEVFusion uses, but I changed quite a bit.

For example, my whole system doesn't use tokens but uses timestamps instead. The evaluation methods I lifted and modified from the nuScenes evaluation protocol don't use the custom nuScenes classes but use dicts instead, plus a lot of other changes. But the structure of the info objects in the dataset pkl file generated by my custom converter script, how data is handled within the dataset class, etc. is similar to the nuScenes version, so overall it still functions pretty similarly to BEVFusion's nuScenes setup.

One thing that might be happening in your case could be a bug somewhere in your dataset class when handling validation data, or something like that. But I can't say for sure, since I never had this problem.

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, thanks for your reply again. My approach is the same as yours, meaning I have my own dataset class that works with timestamps. It's just that I am trying it with one camera feed and one LiDAR feed.

I am trying to change point_cloud_range, but I end up with the error below:

```
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 33, in
    iterable = map(lambda v: _cast(v, dtype), value)
File "/home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 25, in _cast
    return value.to(dtype) if is_eligible else value
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[hykeserver:638991] *** Process received signal ***
[hykeserver:638991] Signal: Aborted (6)
[hykeserver:638991] Signal code: (-6)
[hykeserver:638991] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f57bfdf2420]
[hykeserver:638991] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f57bfc2f00b]
[hykeserver:638991] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f57bfc0e859]
[hykeserver:638991] [ 3] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6(+0xb135a)[0x7f57b45ba35a]
[hykeserver:638991] [ 4] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6(+0xb03b9)[0x7f57b45b93b9]
[hykeserver:638991] [ 5] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/../../../../libstdc++.so.6(__gxx_personality_v0+0x87)[0x7f57b45b9ae7]
[hykeserver:638991] [ 6] /home/hykeserver/anaconda3/envs/bevf_ptt19/bin/../lib/libgcc_s.so.1(+0x111e4)[0x7f57bc93c1e4]
[hykeserver:638991] [ 7] /home/hykeserver/anaconda3/envs/bevf_ptt19/bin/../lib/libgcc_s.so.1(_Unwind_Resume+0x12e)[0x7f57bc93cc1e]
[hykeserver:638991] [ 8] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/libc10_cuda.so(+0x1d2fb)[0x7f57b1ade2fb]
[hykeserver:638991] [ 9] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/libc10.so(_ZN3c1010TensorImpl17release_resourcesEv+0xa4)[0x7f57b1864314]
[hykeserver:638991] [10] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/libtorch_python.so(+0x295359)[0x7f57517b1359]
[hykeserver:638991] [11] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/libtorch_python.so(+0xadb231)[0x7f5751ff7231]
[hykeserver:638991] [12] /home/hykeserver/anaconda3/envs/bevf_ptt19/lib/python3.8/site-packages/torch/lib/libtorch_python.so(_Z28THPVariable_subclass_deallocP7_object+0x292)[0x7f5751ff7532]
[hykeserver:638991] [13] python(+0x11cb86)[0x560c5bc2ab86]
[hykeserver:638991] [14] python(+0x11cf76)[0x560c5bc2af76]
[hykeserver:638991] [15] python(+0x11cf76)[0x560c5bc2af76]
[hykeserver:638991] [16] python(+0x1109a0)[0x560c5bc1e9a0]
[hykeserver:638991] [17] python(+0x11d346)[0x560c5bc2b346]
[hykeserver:638991] [18] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [19] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [20] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [21] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [22] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [23] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [24] python(+0x11d2fc)[0x560c5bc2b2fc]
[hykeserver:638991] [25] python(PyDict_SetItem+0x2ac)[0x560c5bc7325c]
[hykeserver:638991] [26] python(PyDict_SetItemString+0x4f)[0x560c5bc7394f]
[hykeserver:638991] [27] python(PyImport_Cleanup+0x9b)[0x560c5bd5846b]
[hykeserver:638991] [28] python(Py_FinalizeEx+0x83)[0x560c5bd58823]
[hykeserver:638991] [29] python(Py_RunMain+0x110)[0x560c5bd5b780]
[hykeserver:638991] *** End of error message ***
Aborted (core dumped)
```

I noticed you also changed it. Did you face this kind of error? Is it because of an invalid range?

GerhardArya commented 1 year ago

@VeeranjaneyuluToka I don't quite remember how I solved it because it happened a while back, but I did also get a similar error at some point. The issue for me was that I ran out of VRAM.

On another note, I solved the issue of my LiDAR-only model not reaching the performance published in the GitHub README. My issue was that, of my 10 classes, 1 class stayed at 0 mAP no matter what I did because it was extremely under-represented in the dataset. I removed that class, worked with just 9, and managed to get within about 1 mAP (63.09 mAP) of the published nuScenes validation set LiDAR-only performance (64.68 mAP).

The remaining issue now is that fusion only increased performance by around 0.17 mAP, not the 3.84 mAP from the table. So while fusion no longer causes performance to decrease, it still doesn't really help either.

(CC: @kentang-mit )

VeeranjaneyuluToka commented 1 year ago

@GerhardArya, OK! That is good to hear.

I have one more question: I noticed that you changed the point cloud range, didn't you?

Did you notice any issues when you don't change it?

I think it is important to adjust the point cloud range and voxel_size based on our own dataset, isn't it?

kentang-mit commented 1 year ago

Hi @GerhardArya,

Regarding result reproduction, there are several things you can try that I found out to be helpful.

First, the GT database generation logic in our public release does not match my internal implementation. The problem lies in the origin here. Changing it back to [0.5, 0.5, 0.5] will give the correct GT database; otherwise the cropped point cloud within each box might be wrong.
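To make this concrete, the change is about the origin argument used when the boxes are constructed for GT database generation, roughly this kind of call (dummy box values, not the exact line in the repo):

```python
import numpy as np
from mmdet3d.core.bbox import LiDARInstance3DBoxes

# With origin=(0.5, 0.5, 0.5) the (x, y, z) in each box is treated as the geometric
# center, so the points cropped inside each box line up with the annotation.
gt_boxes = np.array([[10.0, 2.0, -1.0, 4.5, 1.9, 1.6, 0.3]], dtype=np.float32)
boxes = LiDARInstance3DBoxes(gt_boxes, box_dim=gt_boxes.shape[-1], origin=(0.5, 0.5, 0.5))
```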

Second, make sure you rerun tools/test.py to evaluate the results after training is finished. The mAP and NDS reported during training are lower than the normal values. Some of my colleagues reported that this could be related to the test_mode parameter in the dataset: if you change it to True during training, the reported mAP/NDS could match the separate evaluation results. I haven't tested this extensively, but it seems worth trying.

For fine-tuning, I would suggest you first start from the official checkpoint and the recommended training setting, because I have experimented with that setting multiple times and can guarantee that it is relatively easy to get the reported results.

Hi @VeeranjaneyuluToka,

I made a reply to your latest issue about the voxel size. Would you mind having a look at it?

Best, Haotian

GerhardArya commented 1 year ago

@kentang-mit I will try the first suggestion. It seems like it might help, since my bounding boxes also have their center at [0.5, 0.5, 0.5]. I've just started a new LiDAR-only training with the new values; I'll edit this reply later with the results.

Edit: with LiDAR only I managed to get 64.4 mAP, and fusion reached 65.1 mAP (around a 0.7 mAP increase from fusion). Considering the data I have (considerably denser LiDAR than nuScenes and only 2 non-overlapping cameras), this seems to be about the best I can do for now, so I'm closing this issue.

For the second point, every mAP I reported comes from running tools/test.py. So I think this is fine in my case unless I misunderstood your suggestion.

For the third point, my latest result used basically the recommended settings. I only changed the point cloud range to match my dataset, the post-center range, grid size, and other parameters to fit the new point cloud range, and the sample groups of the DB sampler to better match the class distribution in my dataset. Other than that, nothing was changed.