tonyzhaozh / act

MIT License

Potential bug in the transformer.py and detr_vae.py? #25

Open Facebear-ljx opened 5 months ago

Facebear-ljx commented 5 months ago

Hi, thanks for the amazing work! I recently tried to reimplement your code, but I found something strange.

Specifically, the forward method of the Transformer class in transformer.py returns only hs, a tensor of shape [num_decoder_layers, B, num_query, C]: https://github.com/tonyzhaozh/act/blob/742c753c0d4a5d87076c8f69e5628c79a8cc5488/detr/models/transformer.py#L77

However, in detr_vae.py, the action_head takes only hs[0] as input. This means the action is predicted from only the first decoder layer's feature (shape [B, num_query, C]), while all of the following layers' features are ignored:

https://github.com/tonyzhaozh/act/blob/742c753c0d4a5d87076c8f69e5628c79a8cc5488/detr/models/detr_vae.py#L131 https://github.com/tonyzhaozh/act/blob/742c753c0d4a5d87076c8f69e5628c79a8cc5488/detr/models/detr_vae.py#L136

This seems strange to me, and I wonder whether it is a bug. Hoping for your reply! Thanks in advance~

Facebear-ljx commented 5 months ago

I checked the official DETR code and found that the transformer in DETR returns a tuple (hs, memory):

https://github.com/facebookresearch/detr/blob/29901c51d7fe8712168b8d0d64351170bc0f83e0/models/transformer.py#L59

So hs[0] in their code selects the full stack of intermediate decoder features from that tuple, rather than only the first decoder layer's feature.
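To make the indexing difference concrete, here is a minimal sketch (shapes and the layer count are illustrative placeholders, not taken from the actual config; numpy arrays stand in for torch tensors):

```python
import numpy as np

# Illustrative shapes: decoder layers, batch, query slots, hidden dim
num_layers, B, num_query, C = 7, 2, 100, 512

# With return_intermediate, the decoder stacks every layer's output
hs = np.zeros((num_layers, B, num_query, C))

# ACT's Transformer.forward returns hs directly, so hs[0]
# indexes INTO the tensor and picks only the first decoder layer:
act_selected = hs[0]
assert act_selected.shape == (B, num_query, C)

# Original DETR returns a tuple (hs, memory), so [0] on the return
# value selects the WHOLE stack of intermediate features:
detr_return = (hs, None)  # None stands in for the encoder memory
detr_selected = detr_return[0]
assert detr_selected.shape == (num_layers, B, num_query, C)
```

In other words, the same `[0]` index means "first decoder layer" in ACT but "entire hs stack" in DETR, because ACT dropped the tuple wrapping when it simplified the return value.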