Thank you for your great work.
I am currently training FSDv2 on my own dataset, and I have some issues with the training loss and the number of virtual voxels.
My dataset is very similar to WOD, but it was collected with a Pandar128. I only lightly modified the config fsdv2_waymo_2x.py to adapt it to my dataset.
The changes are:
1. The point range is changed from **[-80, -80, -2, 80, 80, 4]** to **[-110, -50.4, -3, 110, 50.4, 5]**.
2. Since the dataset has only 70000+ frames, I repeat it 2 times.
3. I use 8 * V100 to train the model.
4. I don't use copy-paste.
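
As a rough sanity check on change 1, the sketch below compares the BEV footprint of the two point ranges. It assumes the new y-range is symmetric ([-50.4, 50.4]) and that z_max is 5; the helper name `bev_area` is only for illustration and is not part of FSD.

```python
def bev_area(point_range):
    """Return the BEV (x-y) footprint in square metres of a
    [x_min, y_min, z_min, x_max, y_max, z_max] point-cloud range."""
    x_min, y_min, _, x_max, y_max, _ = point_range
    return (x_max - x_min) * (y_max - y_min)


# Range used in the original Waymo config.
waymo_range = [-80, -80, -2, 80, 80, 4]
# Assumption: symmetric y-range and z_max = 5 for my dataset.
custom_range = [-110, -50.4, -3, 110, 50.4, 5]

waymo_area = bev_area(waymo_range)
custom_area = bev_area(custom_range)
ratio = custom_area / waymo_area

print(f"WOD BEV area:    {waymo_area:.1f} m^2")
print(f"Custom BEV area: {custom_area:.1f} m^2")
print(f"Ratio (custom / WOD): {ratio:.3f}")
```

Under these assumptions the new range covers a slightly smaller BEV area than the WOD range, so the range change alone probably does not account for a 10x difference in the number of virtual voxels.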
After 24 epochs, the performance on the training set is below expectations. In particular, in the 60m to 100m range at IoU 0.7, the AP for VEH is only 0.25, which is lower than the CenterPoint baseline (roughly 0.4).
The training log shows that:
1. The center loss of VEH is **about 0.45**, but in your Waymo training logs it is **about 0.2**.
2. The number of virtual voxels is about 20000+, whereas on WOD it is about 2000+.
I noticed that the largest prediction error probably comes from the bbox center (I tried replacing the predicted TPs' centers with the GT centers), but the center loss seems to converge at 0.45.
Additionally, the distribution of my data is roughly 3:1 (near-field : far-field (>60m)), and most scenes are urban. The annotations are on continuous frames (10Hz). I am not sure how much impact the data has.
Is something wrong with my training, or could you give me some suggestions to improve it?