Hi, thanks for sharing. I see that in each decoder layer you use `cluster_xyz` as the initial location instead of the updated `base_xyz`:

https://github.com/zeliu98/Group-Free-3D/blob/ef8b7bb5c3bf5b49b957624595dc6a642b6d0036/models/detector.py#L221-L227

My question: since each layer uses the box location from the previous layer to produce its spatial encoding, why does each layer predict the offset to the GT box location relative to the initial cluster center rather than the updated center from the previous layer? In other words, why not use the updated `base_xyz` as the reference?
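To make the two options concrete, here is a minimal sketch of what I mean. It is only an illustration under my own assumptions: the function names and the plain-tuple arithmetic are hypothetical and are not the repository's code. Only the names `cluster_xyz`, `base_xyz`, and `center_residual` are borrowed from the model.

```python
# Hypothetical sketch, not the Group-Free-3D implementation.
# Points are plain (x, y, z) tuples; each decoder layer contributes
# one predicted center_residual.

def add(a, b):
    """Element-wise sum of two 3D points."""
    return tuple(x + y for x, y in zip(a, b))


def decode_fixed_anchor(cluster_xyz, residuals):
    """Option A: every layer predicts an offset from the initial cluster center."""
    centers = []
    for center_residual in residuals:
        centers.append(add(cluster_xyz, center_residual))
    return centers


def decode_iterative(cluster_xyz, residuals):
    """Option B: every layer predicts an offset from the previous layer's center."""
    centers = []
    base_xyz = cluster_xyz
    for center_residual in residuals:
        base_xyz = add(base_xyz, center_residual)  # updated reference point
        centers.append(base_xyz)
    return centers
```

With option A, each layer's residual has to cover the full distance from the cluster center to the target. With option B, a later layer only has to predict the remaining correction.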
Also, under your setting, I think the auxiliary loss is necessary. Without it, the prediction heads of the first N-1 decoder layers would get no supervision for `center_residual`, so the updated box prediction and the spatial encoding passed to the next decoder layer would be meaningless. Am I correct?

Best,
Xuyang