lcc815 opened this issue 1 year ago
Sorry, I have made some simple attempts on the Waymo dataset, but haven't obtained outstanding results on it yet. The experiments were done in the LiDAR-only setting, since I haven't found a suitable hyperparameter setting for the DETR-like head.
Something interesting: we have conducted experiments on our private dataset, and as the dataset becomes larger, the CMT head gets better results than the CenterPoint head, even in the LiDAR-only setting.
Quite interesting. Any idea about this phenomenon?
I do not know why, but I think this may give us a chance to scale up 3D perception models: larger data and larger backbones.
The model architecture consists entirely of transformer layers, so there are many mature model-parallel techniques to use, such as PP (pipeline parallelism), TP (tensor parallelism), and FSDP (fully sharded data parallelism).
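To make the TP idea concrete, here is a minimal, framework-free sketch of column-wise tensor parallelism for a single linear layer. It is an illustration only, not how CMT or any particular library implements it: the weight matrix is split by output columns across hypothetical "devices", each shard computes its slice of the output, and the slices are concatenated (the all-gather step in a real system). The helper names `matmul`, `split_columns`, and `column_parallel_linear` are hypothetical.

```python
def matmul(x, w):
    """Multiply an (n x k) matrix x by a (k x m) matrix w, both lists of lists."""
    return [
        [sum(x[i][p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


def split_columns(w, num_shards):
    """Split w into num_shards contiguous column blocks (hypothetical helper)."""
    m = len(w[0])
    assert m % num_shards == 0, "output dim must be divisible by num_shards"
    step = m // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]


def column_parallel_linear(x, w, num_shards):
    """Compute x @ w with one matmul per column shard, as if each ran on its own device."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenate each row's pieces in shard order; this stands in for an all-gather.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, num_shards=2))  # same result as matmul(x, w)
```

Because every shard needs the full input but only its own weight columns, each device stores 1/`num_shards` of the layer's weights, which is the property that makes TP useful for scaling transformer layers.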
In camera-only 3D detection, models using ViT now obtain SoTA performance.
In LiDAR-only 3D detection, I previously made some attempts with a point cloud transformer architecture, treating each voxel as a token: a naive transformer backbone + DETR head obtains 52% mAP, which is lower than CenterPoint with VoxelNet + SECOND (58%). Maybe there are still some problems to be solved.
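The "each voxel as a token" idea can be sketched roughly as follows. This is an assumption-laden illustration, not the setup used in the experiment above: points are `(x, y, z, intensity)` tuples, voxels are cubes of side `voxel_size`, and each voxel's feature is simply the mean of its points, whereas real pipelines usually use a learned voxel feature encoder. The function name `voxelize_to_tokens` is hypothetical.

```python
import math
from collections import defaultdict


def voxelize_to_tokens(points, voxel_size):
    """Group (x, y, z, intensity) points into cubic voxels and mean-pool each voxel.

    Returns (voxel_index, mean_feature) pairs sorted by voxel index. Each pair is
    one token; the integer index could feed a positional encoding.
    """
    buckets = defaultdict(list)
    for x, y, z, intensity in points:
        # math.floor keeps negative coordinates in the correct voxel.
        idx = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[idx].append((x, y, z, intensity))

    tokens = []
    for idx in sorted(buckets):
        pts = buckets[idx]
        n = len(pts)
        # Mean-pool the 4 channels; a stand-in for a learned voxel encoder.
        mean = tuple(sum(p[c] for p in pts) / n for c in range(4))
        tokens.append((idx, mean))
    return tokens


points = [(0.25, 0.5, 0.5, 1.0), (0.75, 0.5, 0.25, 3.0), (1.5, 0.5, 0.5, 2.0)]
for idx, feat in voxelize_to_tokens(points, voxel_size=1.0):
    print(idx, feat)
```

One practical consequence of this design is that the token count equals the number of non-empty voxels, which for a full LiDAR sweep can be very large, so a naive transformer backbone over these tokens is expensive.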
Hi authors,
I am curious about the performance of the model on the Waymo dataset, but this was not mentioned in the paper. May I ask whether you have conducted any relevant experiments and what the results were?
Thanks