Closed RG2806 closed 1 year ago
I think the code snippet you mentioned is specific to nuScenes and was designed by the original authors of TransFusion. For a custom dataset, I would suggest starting with CenterHead, which has fewer parameters to tune.
Hey, I want to train a fusion model. Would that be possible with CenterHead? In your repo, I only see camera-only support. If it is possible, how should I proceed?
Yes, that is possible. You only need to change the configuration of model.heads.object to the CenterHead config. It will still work.
Hey, thanks for that. I'll experiment with it. Meanwhile, can you explain the 'tasks' field in the CenterHead config:
https://github.com/mit-han-lab/bevfusion/blob/main/configs/nuscenes/det/centerhead/default.yaml#L28
Sure. This is related to the CBGS paper. All classes are divided into different groups (and processed with different heads). The intuition is that the classes within each group usually have similar sizes.
For custom datasets, I think usually it will work if you just have one task head for all classes.
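To make the grouping concrete, here is a minimal Python sketch of the two options described above. The nuScenes grouping follows the CBGS/CenterPoint convention; modelling `tasks` as a list of class-name lists is an assumption about the config layout, and `single_task`, `check_tasks` and the custom class names are hypothetical, not part of the BEVFusion codebase.

```python
# Sketch only: `tasks` is modelled as a list of class-name groups, one group
# per detection head. The exact config schema is an assumption.

# CBGS-style grouping for nuScenes: classes of similar size share a head.
NUSCENES_TASKS = [
    ["car"],
    ["truck", "construction_vehicle"],
    ["bus", "trailer"],
    ["barrier"],
    ["motorcycle", "bicycle"],
    ["pedestrian", "traffic_cone"],
]


def single_task(class_names):
    """Hypothetical helper: put every class of a custom dataset in one head."""
    return [list(class_names)]


def check_tasks(tasks, class_names):
    """Return True if each class is assigned to exactly one task group."""
    flat = [name for group in tasks for name in group]
    return sorted(flat) == sorted(class_names) and len(flat) == len(set(flat))


custom_classes = ["vehicle", "pedestrian"]  # hypothetical custom classes
custom_tasks = single_task(custom_classes)
```

A check like `check_tasks` is cheap to run before training, since a class that is missing from every group would never receive predictions.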
Hey, thanks for clarifying. I made the changes you suggested, but I got empty bboxes when running evaluation. I tried training on a single class. I am attaching the config.yaml and training log: 20221004_200442.log, configs.txt. PS: I don't have sweeps or map data, so I commented out the corresponding steps in the data loading pipeline. What am I doing wrong?
Hi @RG2806,
I actually mentioned in other issues previously that we follow a two-step training schedule. In the first stage we train a LiDAR-only model (CenterPoint or TransFusion will be OK). We then finetune the camera+LiDAR BEVFusion model.
I would highly recommend that you use this schedule as well.
Best, Haotian
Hey, I used CenterPoint as you suggested, keeping the neck and backbone the same. I trained a LiDAR-only model, but during training I noticed the loss didn't go down. This continued until the 6th epoch, after which the losses became NaN and the backbone weights also became NaN. I am attaching the training log and config. Can you guide me here?
Apart from this, when loading the nuScenes point cloud, 'load_dim=5' is used. What are these 5 dimensions? Are they (x, y, z, intensity, ring), (x, y, z, intensity, timestamp), or something else? Also, are they normalized?
The five dimensions are (x, y, z, intensity, timestamp). I believe that for intensity you usually just divide by the maximum possible value, and for timestamp we discretize it according to the relative frame index. For some datasets (such as Waymo), the maximum intensity can be very large; in that case it is suggested to normalize this dimension using the tanh function.
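The two intensity normalizations mentioned above can be sketched as follows. This is a plain-Python illustration under stated assumptions: the function names, the sample points and the `max_value` argument are hypothetical, and a real pipeline would apply the same operation to the intensity column of a point array.

```python
import math


def normalize_by_max(intensity, max_value):
    """Scale intensity to [0, 1] by dividing by the largest possible value."""
    return intensity / max_value


def normalize_by_tanh(intensity):
    """Squash an unbounded, non-negative intensity into the range [0, 1]."""
    return math.tanh(intensity)


def normalize_points(points, intensity_fn):
    """Apply intensity_fn to the 4th field of (x, y, z, intensity, timestamp)."""
    return [(x, y, z, intensity_fn(i), t) for x, y, z, i, t in points]


# Hypothetical points whose intensity spans a very wide range.
points = [
    (1.0, 2.0, 0.5, 0.0, 0.0),
    (3.0, -1.0, 0.2, 0.8, 0.0),
    (5.0, 0.0, 1.0, 2500.0, 0.0),
]
normalized = normalize_points(points, normalize_by_tanh)
```

With tanh, a few extremely bright returns saturate near 1 instead of compressing all ordinary values toward 0, which is what dividing by a very large maximum would do.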
I had a look at your log, and I think you can probably start from a training config without GT paste and CBGS.
Hey @kentang-mit, I made a few changes: normalising intensity with tanh, and removing GT paste and CBGS. The loss is decreasing, at least for the first epoch. Can you shed a bit more light on the GT paste part, and how can I debug it?
I think GT paste was proposed in the SECOND paper by Yan et al. The idea is to create a database of all objects that appear in the training set and randomly paste some objects from the database onto each scene during training. GT paste brings some improvement on nuScenes, but I remember that the gain is small on the final fusion model (this was also observed by the authors of TransFusion).
To debug GT paste, I would suggest you visualize the point clouds (corresponding to these objects in the database) you generated and see whether they make sense (i.e. whether they look like real objects).
Hey @kentang-mit, thanks for the clarification. I'll proceed accordingly. I have another question though. To finetune the fusion model with LiDAR-only weights, do we use the 'load_from' key in the config? If so, won't there be missing keys for the camera layers?
Yes, you will see missing keys for camera layers, but it does not matter. These layers are initialized either from ImageNet-pretrained checkpoints or models pretrained on other tasks (e.g. 2D detection), which is specified in this field.
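As a rough illustration of why the missing keys are harmless, the sketch below compares the parameter names of a fusion model with those of a LiDAR-only checkpoint. PyTorch's `load_state_dict(strict=False)` reports missing and unexpected keys in a similar way; the helper names and all key names here are hypothetical and are not taken from the BEVFusion code.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in model, not in checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in checkpoint, not in model
    return missing, unexpected


def all_under_prefix(keys, prefix):
    """Return True if every key belongs to the given submodule prefix."""
    return all(key.startswith(prefix) for key in keys)


# Hypothetical parameter names, for illustration only.
fusion_model_keys = [
    "encoders.camera.backbone.conv1.weight",
    "encoders.camera.neck.conv.weight",
    "encoders.lidar.backbone.conv_input.weight",
    "heads.object.shared_conv.weight",
]
lidar_checkpoint_keys = [
    "encoders.lidar.backbone.conv_input.weight",
    "heads.object.shared_conv.weight",
]

missing, unexpected = compare_keys(fusion_model_keys, lidar_checkpoint_keys)
only_camera_missing = all_under_prefix(missing, "encoders.camera.")
```

If the missing keys all sit under the camera branch and nothing is unexpected, the LiDAR weights were loaded as intended; missing LiDAR or head keys would point to a naming mismatch instead.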
Thanks for the details. Can you share a bit more detail on the test-time augmentation and model ensemble that you mentioned in your paper? Are these augmentations the same as in the config, and if so, how many copies did you create? And when you say model ensemble, which models did you use to generate predictions?
For TTA in the offboard settings, we used double flipping and rotation augmentations. For model ensemble, we use several models with different voxel resolutions and FPN architectures.
Thanks for the clarification and open sourcing this awesome work.
Hey @kentang-mit,
I wanted to train a LiDAR model on my custom dataset. In my dataset, all the objects are in front of the LiDAR, so I changed the point cloud range to [0, -54.0, -5.0, 108.0, 54.0, 3.0]. But with this change the model errors out during the 1st epoch with a CUDA invalid configuration error at the hard_voxelise function. PS: what changes would I have to make to use dynamic voxelization instead of hard voxelization?
thanks
Dynamic voxelization would require some changes to the code and we actually did not use the version in mmdetection3d. Do you have more information about the error? For example, are you close to running out of memory?
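Independently of memory usage, one thing that can be checked offline after changing the point cloud range is whether the resulting voxel grid still has the shape the rest of the config expects. The sketch below is only that check, not a diagnosis of the error; the voxel size is an assumed value and the helper names are hypothetical.

```python
def grid_size(point_cloud_range, voxel_size):
    """Number of voxels along x, y, z for a range [x0, y0, z0, x1, y1, z1]."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    extents = (x_max - x_min, y_max - y_min, z_max - z_min)
    return [round(extent / size) for extent, size in zip(extents, voxel_size)]


def is_divisible(point_cloud_range, voxel_size, tol=1e-6):
    """Return True if every extent is a whole number of voxels (within tol)."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    extents = (x_max - x_min, y_max - y_min, z_max - z_min)
    for extent, size in zip(extents, voxel_size):
        cells = extent / size
        if abs(cells - round(cells)) > tol:
            return False
    return True


front_only_range = [0, -54.0, -5.0, 108.0, 54.0, 3.0]  # range from the question
assumed_voxel_size = [0.075, 0.075, 0.2]                # assumption, not from the thread

cells = grid_size(front_only_range, assumed_voxel_size)
consistent = is_divisible(front_only_range, assumed_voxel_size)
```

Any grid-shaped settings elsewhere in the config (for example the sparse encoder input shape or the head's grid size) would need to agree with `cells` for whatever voxel size is actually used.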
Hey @kentang-mit
here is the error traceback:
Traceback (most recent call last):
File "tools/train.py", line 84, in
I'm attaching my config file as well (configs.txt). I made changes to have objects only in front of the LiDAR.
Thanks for the information @RG2806. Would you mind launching a separate terminal and running watch nvidia-smi to see whether you are close to OOM on the GPU when running the code?
I will try what you suggested. I also noticed one more thing while training the LiDAR-only model: the grad_norm always stays around 400 and does not go down. The change I made was to reduce the lr to 1e-6, because I use 1 GPU with samples_per_gpu=1 and workers_per_gpu=4. I am attaching the log here. Can you help me understand why this happens?
I also noticed that in other issues you mentioned that your coordinate system is different from mmdet's. Can you tell me how?
@RG2806, there is more than one difference, and I'm not sure I remember all of them:
lr=1e-6 looks clearly too small to me. So you can try increasing the learning rate.
@kentang-mit, can you clarify the following about the coordinate system then:
1) The point cloud coordinate system: is it the same as the mmdet3d LiDAR coordinate system, i.e. x front, y left and z up?
2) The GT bbox coordinate system: is it the same as the above? Which directions do w, h and l correspond to, and along which axis is yaw zero?
3) Will I have to make any other changes apart from the above? My dataset follows the new version of mmdet3d.
Hi @RG2806,
I will try my best to recall these details (as we implemented these parts maybe 7-8 months ago).
Your understanding of the coordinate system is correct and here is the visualization from the original mmdet3d repo we cloned around one year ago.
The conversion between yaw in our codebase (r) and in latest mmdetection3d (r1) is given by:
r = -np.pi / 2 - r1
You can derive the axis relationships from the visualizations in the official mmdet3d repo here.
For box dimensions, we store them in the order of lwh, and I believe l=x_size, w=y_size, h=z_size. You may double check on that.
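The yaw conversion above can be written as a small helper. This is a minimal sketch of the formula as stated in this thread; wrapping the result to [-pi, pi) is an added convenience and an assumption, and the function names are hypothetical.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def convert_yaw(r1):
    """Apply r = -pi / 2 - r1 and wrap the result to [-pi, pi)."""
    return wrap_angle(-math.pi / 2 - r1)


yaw_new_convention = 0.0  # example yaw in the latest mmdetection3d convention
yaw_this_codebase = convert_yaw(yaw_new_convention)
```

Because the formula is its own inverse (up to wrapping), the same function converts in both directions, which is handy for a round-trip sanity check on a few boxes.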
Hope that my explanations will be helpful. Here is also a tip for debugging: it will be helpful to start from LiDAR-only models and see whether the results can match those in the official papers (e.g. CenterPoint and TransFusion).
Thanks for the info
Hey @kentang-mit, my custom dataset has more than 11000 images and point clouds (pcd). Because of this, the model uses too much RAM while the GPU is relatively idle. Can I move some of the steps to the GPU? If so, where should I start?
Hi @RG2806, is it possible for you to elaborate more on "using too much ram"? For example, are you trying to load everything into memory first? Besides, what are the operations you'd like to move to the GPUs? For example, are these preprocessing operations or data loading operations during training?
Hi, this is an interesting discussion. Sorry for deviating a bit; I am also working on the same thing (feeding in a custom dataset). I used a customized 3D batch annotation tool (which I modified myself to work with my own data), and my annotations are as below:
**{"name":"000001","timestamp":0,"index":1,"labels":[{"id":0,"category":"sailboat","box3d":{"dimension":{"width":7.44,"length":6.765815099306366,"height":16.95},"location":{"x":45.24417122391511,"y":79.68626672516416,"z":6.632080000000001},"orientation":{"rotationYaw":0,"rotationPitch":0,"rotationRoll":0}}}]}
{"name":"000000","timestamp":0,"index":0,"labels":[{"id":0,"category":"sailboat","box3d":{"dimension":{"width":7.44,"length":3.54,"height":16.95},"location":{"x":42.9372728225865,"y":78.29214534179232,"z":6.63208},"orientation":{"rotationYaw":0,"rotationPitch":0,"rotationRoll":0}}}]}**
I am still analysing how to feed these annotations to BEVFusion (basically, I need to convert this format into one that BEVFusion understands). Approach 1: convert or extend the above annotations into nuScenes annotations (following their database schema). Approach 2: create only the input parameters that BEVFusion requires.
I am still working on this, so could either of you recommend a simple approach? @RG2806, it would be a great help if you could describe the approach you followed to create annotations. @kentang-mit, it would be really helpful if you could share any suggestions on this.
Hey @VeeranjaneyuluToka, my approach is similar to your first option. I created a new create_info script to create pickle files in nuScenes format, plus a separate dataset wrapper for my dataset with a different evaluation function.
Hi @VeeranjaneyuluToka,
Sorry for the late response. I was working on other projects recently. I agree with @RG2806 on the choice of Approach 1. In my opinion your annotation format looks pretty similar to nuScenes. You should be able to reuse the code starting from line 245 in this file. You could specify a zero velocity if there is no annotation.
Best, Haotian
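As a rough sketch of Approach 1, the snippet below turns one line of the annotation format shown earlier into a nuScenes-style record with a zero velocity per box. The output keys (`token`, `gt_boxes`, `gt_names`, `gt_velocity`) and the mapping of `length`/`width`/`height` onto the l, w, h order are assumptions that should be checked against the info-creation script; the yaw is passed through unchanged and may need converting to the codebase's convention.

```python
import json


def annotation_to_info(line):
    """Convert one JSON annotation line into a nuScenes-style dict (sketch)."""
    record = json.loads(line)
    boxes, names, velocities = [], [], []
    for label in record["labels"]:
        box = label["box3d"]
        loc, dim, rot = box["location"], box["dimension"], box["orientation"]
        # Assumed box layout: [x, y, z, l, w, h, yaw].
        boxes.append([
            loc["x"], loc["y"], loc["z"],
            dim["length"], dim["width"], dim["height"],
            rot["rotationYaw"],
        ])
        names.append(label["category"])
        velocities.append([0.0, 0.0])  # no velocity annotation, so use zero
    return {
        "token": record["name"],  # hypothetical key for the frame identifier
        "gt_boxes": boxes,
        "gt_names": names,
        "gt_velocity": velocities,
    }


example = (
    '{"name":"000000","timestamp":0,"index":0,"labels":[{"id":0,'
    '"category":"sailboat","box3d":{"dimension":{"width":7.44,"length":3.54,'
    '"height":16.95},"location":{"x":42.9372728225865,"y":78.29214534179232,'
    '"z":6.63208},"orientation":{"rotationYaw":0,"rotationPitch":0,'
    '"rotationRoll":0}}}]}'
)
info = annotation_to_info(example)
```

A list of such dicts, one per frame, could then be extended with the file paths and calibration fields that the dataset wrapper reads.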
@kentang-mit and @RG2806, thanks for your comments. I am a bit unsure how to get ego_pose in my case. I looked at the description in the nuScenes data format section, which reads: "Ego vehicle pose at a particular timestamp. Given with respect to global coordinate system of the log's map. The ego_pose is the output of a lidar map-based localization algorithm described in our paper. The localization is 2-dimensional in the x-y plane." I am not sure I understand it (especially the LiDAR-based localization algorithm; I tried to find it in the paper but could not figure out where exactly it is described). Could you please give more details on how it can be computed for a custom dataset?
Calibrated_sensor: I believe these are the camera and LiDAR calibration parameters when only camera and LiDAR are used: extrinsics for both camera and LiDAR, and intrinsics for the camera. Is that right?
Hi @VeeranjaneyuluToka,
The reason why you want to have ego pose is to align LiDAR scans from multiple sweeps in the same coordinate system. If you start with single-frame LiDAR + camera, I think you do not need ego pose. If you really want to get the ego pose, I think you might need to consult the nuScenes team; I'm sorry that I'm not an expert in that. For extrinsics and intrinsics, your understanding is correct: basically, they are used to obtain the camera->LiDAR transformation.
Best, Haotian
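To illustrate how extrinsics and intrinsics fit together, here is a minimal sketch that projects a single LiDAR point into an image. It uses the LiDAR->camera direction (the inverse of the camera->LiDAR transform mentioned above), assumes a pinhole camera without skew or distortion, and assumes the common camera convention of x right, y down, z forward; all matrices and values are made up for the example.

```python
def transform_point(matrix, point):
    """Apply a 4x4 homogeneous transform (nested lists) to a 3D point."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3]
    )


def project_to_image(point_lidar, lidar2camera, intrinsic):
    """Project a LiDAR point to pixel (u, v); None if behind the camera."""
    x, y, z = transform_point(lidar2camera, point_lidar)
    if z <= 0:
        return None
    fx, fy = intrinsic[0][0], intrinsic[1][1]
    cx, cy = intrinsic[0][2], intrinsic[1][2]
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical extrinsic: LiDAR axes (x front, y left, z up) mapped to
# camera axes (x right, y down, z forward), with no translation.
lidar2camera = [
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Hypothetical intrinsic matrix.
intrinsic = [
    [1000.0, 0.0, 800.0],
    [0.0, 1000.0, 450.0],
    [0.0, 0.0, 1.0],
]

pixel = project_to_image((10.0, -1.0, -0.5), lidar2camera, intrinsic)
```

Overlaying a few projected points on the image is a quick way to confirm that the calibration and axis conventions of a custom dataset are consistent.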
Closed due to inactivity. Please feel free to reopen if you feel it necessary.
@RG2806, how did you solve the above memory error when you changed the point cloud range?
Hey, thanks for this open source work. I am training a fusion model on my custom dataset. I wanted to understand the reasoning behind these extra max pooling layers, and how I can tell whether I need to do the same for my dataset classes: https://github.com/mit-han-lab/bevfusion/blob/main/mmdet3d/models/heads/bbox/transfusion.py#L248
Same for this one: https://github.com/mit-han-lab/bevfusion/blob/main/mmdet3d/models/heads/bbox/transfusion.py#L751