Closed linhaojia13 closed 1 year ago
You're right, and this seemingly non-intuitive design comes from Group-Free (see a related issue). We tried changing it to what you suggested, but I think it resulted in a ~1% drop in performance.
Thank you for providing the relevant experimental results! I may explore this issue in the near future, so let's keep it open for now.
In bdetr.py, you set `base_xyz=cluster_xyz` (the original coordinates of the queries) in the call to `self.prediction_heads[i](...)`. However, for `self.decoder[i](...)`, the `query_pos` is generated from the `base_xyz` output by the previous `self.seg_prediction_heads[i]` layer.

The code for `self.prediction_heads[i]` implies that the prediction head at every layer modifies the original query coordinates. On the other hand, the code for `self.decoder[i](...)` indicates that each decoder layer models the refinement of the query coordinates output by the previous layer. So the modeling processes in these two places are not consistent. I think the parameter passed to the prediction head should be changed, specifically setting `base_xyz=base_xyz` in `self.prediction_heads[i](...)`. What are your thoughts on my suggestion?
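To make the difference between the two anchoring schemes concrete, here is a minimal toy sketch (not the actual bdetr.py code; `prediction_head` and the offset values are hypothetical stand-ins): variant A re-anchors every layer to the original `cluster_xyz`, while variant B chains each layer's output into the next, as the suggestion proposes.

```python
def prediction_head(base_xyz, offset):
    """Toy stand-in for a prediction head: refine coordinates as base + predicted offset."""
    return [b + o for b, o in zip(base_xyz, offset)]

cluster_xyz = [0.0, 0.0, 0.0]            # original query coordinates
offsets = [[0.5, 0.5, 0.5]] * 3          # pretend each of 3 layers predicts this offset

# Variant A (current code): every layer anchors to the ORIGINAL cluster_xyz,
# so only the last layer's offset survives.
xyz_a = cluster_xyz
for off in offsets:
    xyz_a = prediction_head(cluster_xyz, off)   # base_xyz=cluster_xyz

# Variant B (suggested): each layer refines the PREVIOUS layer's output,
# so offsets accumulate across layers.
xyz_b = cluster_xyz
for off in offsets:
    xyz_b = prediction_head(xyz_b, off)         # base_xyz=base_xyz

print(xyz_a)  # [0.5, 0.5, 0.5]
print(xyz_b)  # [1.5, 1.5, 1.5]
```

Under variant A the head only ever expresses an offset relative to the original anchor, which is the inconsistency with the decoder's layer-to-layer refinement described above.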