wangxiang1230 / OadTR

Code for our ICCV 2021 Paper "OadTR: Online Action Detection with Transformers".
MIT License

Question about configuration (num_heads, enc_layers) #23

Closed tghim closed 2 years ago

tghim commented 2 years ago

Dear authors.

First, thank you for sharing your nice work! It helps me a lot with my task.

Btw, your paper states that the best performance is achieved with 5 encoder layers and 4 heads, but in your code they are set to 64 and 8, respectively. Could you explain why?

wangxiang1230 commented 2 years ago

> Dear authors.
>
> First, thank you for sharing your nice work! It helps me a lot with my task.
>
> Btw, your paper states that the best performance is achieved with 5 encoder layers and 4 heads, but in your code they are set to 64 and 8, respectively. Could you explain why?

Hi, thanks for your interest in OadTR. Actually, in our code the numbers of decoder layers and heads are 5 and 4; see `parser.add_argument('--decoder_layers', default=5, type=int, help="Number of decoder_layers")` (lines 32-33) and `parser.add_argument('--decoder_num_heads', default=4, type=int, help="decoder_num_heads")` (lines 40-41) in `config.py`.

wangxiang1230 commented 2 years ago

> Dear authors.
>
> First, thank you for sharing your nice work! It helps me a lot with my task.
>
> Btw, your paper states that the best performance is achieved with 5 encoder layers and 4 heads, but in your code they are set to 64 and 8, respectively. Could you explain why?

"parser.add_argument('--enc_layers', default=64, type=int, help="Number of enc_layers")" refers to the features' temporal length of the input. And you can try running the entire pipeline.

tghim commented 2 years ago

Thank you for your quick reply. The number of encoder layers stated in your paper is 3, but I mistakenly read it as 5. So you mean that `--enc_layers` corresponds to "T" in your paper?

wangxiang1230 commented 2 years ago

> Thank you for your quick reply. The number of encoder layers stated in your paper is 3, but I mistakenly read it as 5. So you mean that `--enc_layers` corresponds to "T" in your paper?

Yes, that's right.
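In other words, with the default `--enc_layers=64` the model consumes a feature sequence of temporal length T = 64. A minimal sketch of the expected input shape, where `feature_dim` and `batch` are made-up placeholders (the real dimension depends on which backbone features you extract):

```python
import numpy as np

T = 64             # --enc_layers, i.e. "T" in the paper
feature_dim = 2048 # assumption: depends on the feature extractor used
batch = 2          # arbitrary example batch size

# Dummy input features: one T-step sequence of feature vectors per sample.
features = np.zeros((batch, T, feature_dim), dtype=np.float32)
print(features.shape)
```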

tghim commented 2 years ago

Thank you for your answer!