tonyzhaozh / act

MIT License

Potential bug in the transformer.py and detr_vae.py? #25

Open Facebear-ljx opened 5 months ago

Facebear-ljx commented 5 months ago

Hi, thanks for the amazing work! I recently tried to reimplement your code, but I found something strange.

Specifically, the forward method of the Transformer class in transformer.py returns only hs, a tensor of shape [num_decoder_layers, B, num_query, C]: https://github.com/tonyzhaozh/act/blob/742c753c0d4a5d87076c8f69e5628c79a8cc5488/detr/models/transformer.py#L77

However, in detr_vae.py, the action_head takes only hs[0] as input. This means the action is predicted from only the first decoder layer's feature (shape [B, num_query, C]), while all of the following layers' features are ignored:

https://github.com/tonyzhaozh/act/blob/742c753c0d4a5d87076c8f69e5628c79a8cc5488/detr/models/detr_vae.py#L131 https://github.com/tonyzhaozh/act/blob/742c753c0d4a5d87076c8f69e5628c79a8cc5488/detr/models/detr_vae.py#L136

This seems strange to me, and I wonder whether it is a bug. Hoping for your reply! Thanks in advance~

Facebear-ljx commented 5 months ago

I checked the official DETR code and found that the transformer in DETR returns a tuple (hs, memory):

https://github.com/facebookresearch/detr/blob/29901c51d7fe8712168b8d0d64351170bc0f83e0/models/transformer.py#L59

So hs[0] in their code selects the full stack of intermediate decoder features from that tuple, rather than only the first decoder layer's feature.
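To make the indexing difference concrete, here is a minimal sketch (shapes and the layer count are illustrative placeholders, not taken from the actual config; numpy arrays stand in for torch tensors):

```python
import numpy as np

# Illustrative shapes: decoder layers, batch, query slots, hidden dim
num_layers, B, num_query, C = 7, 2, 100, 512

# With return_intermediate, the decoder stacks every layer's output
hs = np.zeros((num_layers, B, num_query, C))

# ACT's Transformer.forward returns hs directly, so hs[0]
# indexes INTO the tensor and picks only the first decoder layer:
act_selected = hs[0]
assert act_selected.shape == (B, num_query, C)

# Original DETR returns a tuple (hs, memory), so [0] on the return
# value selects the WHOLE stack of intermediate features:
detr_return = (hs, None)  # None stands in for the encoder memory
detr_selected = detr_return[0]
assert detr_selected.shape == (num_layers, B, num_query, C)
```

In other words, the same `[0]` index means "first decoder layer" in ACT but "entire hs stack" in DETR, because ACT dropped the tuple wrapping when it simplified the return value.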