zeliu98 / Group-Free-3D

Group-Free 3D Object Detection via Transformers
MIT License
243 stars 33 forks

Code Question #28

Open Giutear opened 2 years ago

Giutear commented 2 years ago

First of all, thank you for sharing your work. I've been working with your code recently, modifying a few sections, and I noticed a few things I don't quite understand.

In your paper, you state that you used random scaling of 0.9 to 1.1 as augmentation on the ScanNet dataset. However, the augmentation code provided only applies random flipping and rotation. Did I miss the section where the scaling is applied?
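
To make the question concrete, here is a minimal sketch of the kind of scaling step I would have expected to find. It is not taken from the repository: the function name `random_scale` and the box layout `(cx, cy, cz, dx, dy, dz, ...)` are my own assumptions for illustration.

```python
import numpy as np


def random_scale(points, boxes, low=0.9, high=1.1, rng=None):
    """Scale a point cloud and its boxes by one shared random factor.

    Hypothetical helper, not part of the Group-Free-3D code base.
    points: (N, 3+C) array, xyz in the first three columns.
    boxes:  (K, 6+D) array, assumed layout (cx, cy, cz, dx, dy, dz, ...).
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = rng.uniform(low, high)

    # Work on copies so the caller's arrays are not modified.
    points = points.copy()
    boxes = boxes.copy()

    # Only xyz is scaled; any extra channels are left as-is in this sketch.
    points[:, :3] *= scale
    # Box centers and sizes scale together; trailing columns
    # (e.g. a heading angle or class id) are scale-invariant.
    boxes[:, :6] *= scale
    return points, boxes, scale
```

Called once per training sample, this would sit next to the existing flip and rotation steps. A per-point height feature, if one is used, would presumably need the same factor applied.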

Secondly, I wasn't able to reproduce your results on the ScanNet dataset, coming 1% short on both mAP@0.25 and mAP@0.5. Could this be because I'm only using a single GPU for training? As far as I understand, you do not sync the batch norm across GPUs, so the smaller batch on each GPU may actually be beneficial to the training.

When I looked at the transformer code, I noticed that each attention layer uses 288-dimensional features. Is there a specific reason for choosing this value? It seems quite low to me, and I would have thought that a power of 2 would be more in line with most architectures.
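
One constraint I am aware of is that multi-head attention needs the feature dimension to be divisible by the number of heads. The small check below compares the candidate widths under that constraint. The head count of 8 is an assumption on my part for illustration, and `head_dim` is a hypothetical helper, not a function from the repository.

```python
def head_dim(embed_dim, num_heads):
    """Return the per-head feature size, or raise if the split is uneven."""
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )
    return embed_dim // num_heads


NUM_HEADS = 8  # assumption: the head count is not stated in this thread

# Per-head size for the widths discussed here.
per_head = {dim: head_dim(dim, NUM_HEADS) for dim in (256, 288, 512)}
print(per_head)  # {256: 32, 288: 36, 512: 64}
```

Under that assumption all three widths split evenly, so divisibility alone would not explain the choice of 288.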

I would really appreciate it if I could gain your insights on this.

yzheng97 commented 2 years ago

For the last question, I've tried 256-dim features on ScanNet, and it does not affect the final results.

Giutear commented 2 years ago

Thank you for your reply. Did you also try larger values such as 512?