zeliu98 / Group-Free-3D

Group-Free 3D Object Detection via Transformers
MIT License

Question about iterative object box prediction #15

Open XuyangBai opened 3 years ago

XuyangBai commented 3 years ago

Hi, thanks for sharing your code. I noticed that in every decoder layer you pass cluster_xyz to the prediction head as the base location, instead of the updated base_xyz: https://github.com/zeliu98/Group-Free-3D/blob/ef8b7bb5c3bf5b49b957624595dc6a642b6d0036/models/detector.py#L221-L227
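To make the control flow described above concrete, here is a minimal, framework-free sketch of the loop as I read it. It assumes each prediction head outputs a per-proposal center residual. The `decode_anchored` function, the stand-in `heads` callables, and the use of plain 3-tuples instead of tensors are hypothetical simplifications, not the repository's code. Each layer's position encoding is built from the previous layer's prediction, but the predicted center is always `cluster_xyz + residual`.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical stand-in for a prediction head: maps the position used for the
# layer's spatial encoding to a center residual.
Head = Callable[[Vec3], Vec3]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def decode_anchored(cluster_xyz: Vec3, heads: List[Head]) -> List[Vec3]:
    """Each layer predicts cluster_xyz + residual; only the encoded position is updated."""
    base_xyz = cluster_xyz  # position fed to the first layer's spatial encoding
    centers: List[Vec3] = []
    for head in heads:
        residual = head(base_xyz)               # head "sees" the previous prediction
        base_xyz = add(cluster_xyz, residual)   # offset always taken from the initial cluster center
        centers.append(base_xyz)
    return centers
```

With a head that returns a constant residual, every layer lands on the same center, because nothing accumulates across layers; any refinement has to come from the head reacting to the updated position encoding.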

My question: since each layer uses the box location predicted by the previous layer to produce its spatial encoding, why does each layer predict the offset to the ground-truth box center relative to the initial cluster center rather than relative to the updated center from the previous layer? In other words, why not:

```python
base_xyz, base_size = self.prediction_heads[i](query,
                                               base_xyz=base_xyz,
                                               end_points=end_points,
                                               prefix=prefix)
```
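The difference between the two options shows up in what each layer has to regress. Below is a hypothetical, framework-free sketch of the proposed variant (iterative box refinement, where layer i refines layer i-1's center, as in Deformable DETR's refinement scheme), plus a helper that lists each layer's residual target under both schemes. The function names, tuple-based vectors, and stand-in heads are illustrative assumptions, not code from the repository.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Head = Callable[[Vec3], Vec3]  # hypothetical: encoded position -> center residual


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def decode_iterative(cluster_xyz: Vec3, heads: List[Head]) -> List[Vec3]:
    """Proposed variant: each layer's residual is added to the previous layer's center."""
    base_xyz = cluster_xyz
    centers: List[Vec3] = []
    for head in heads:
        residual = head(base_xyz)
        base_xyz = add(base_xyz, residual)  # offset relative to the previous prediction
        centers.append(base_xyz)
    return centers


def residual_targets(gt: Vec3, cluster_xyz: Vec3, centers: List[Vec3],
                     iterative: bool) -> List[Vec3]:
    """Ground-truth residual each layer should regress under each scheme."""
    if not iterative:
        # Anchored scheme: every layer regresses the same offset from the cluster center.
        return [sub(gt, cluster_xyz) for _ in centers]
    # Iterative scheme: layer i is measured from layer i-1's center (layer 0 from the cluster).
    refs = [cluster_xyz] + centers[:-1]
    return [sub(gt, r) for r in refs]
```

Under the anchored scheme every layer regresses the same target, whereas under the proposed variant each layer's target shrinks as the previous layer gets closer to the ground truth, which is the usual motivation for iterative refinement.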

Also, under your setting, I think the auxiliary loss is necessary, right? My reasoning is that without it, the prediction heads of the first N-1 decoder layers would receive no supervision for center_residual, so the updated box predictions, and therefore the spatial encodings fed to the next decoder layer, would be meaningless. Am I correct?
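The supervision argument can be illustrated with a small, hypothetical sketch of a per-layer center loss. It assumes, as the question implies, that earlier heads only learn from a loss term applied directly to their own output (for example, if their predictions are detached before being reused as positions). The `center_loss` function, the L1 distance, and plain summation across layers are illustrative choices; the repository's actual loss terms and weighting may differ.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def l1(a: Vec3, b: Vec3) -> float:
    """L1 distance between two 3D points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def center_loss(centers: List[Vec3], gt: Vec3,
                use_aux: bool) -> Tuple[float, Dict[int, float]]:
    """Sum of per-layer center losses.

    With use_aux=True every decoder layer's prediction is supervised; with
    use_aux=False only the last layer contributes a loss term.
    """
    n = len(centers)
    layers = range(n) if use_aux else [n - 1]
    per_layer = {i: l1(centers[i], gt) for i in layers}
    return sum(per_layer.values()), per_layer
```

With use_aux=False the returned dictionary contains only the last layer's index, which is exactly the situation described above: the first N-1 heads get no direct center-regression signal.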

Best, Xuyang