wei-tim / YOWO

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

Regarding the trainable scalar parameter and learning rate #60

Closed alphadadajuju closed 4 years ago

alphadadajuju commented 4 years ago

Thank you very much for sharing your amazing work, code, and weights. Such resources are not easily accessible to many researchers in this domain.

I do have a couple of questions regarding your code / paper:

  1. I re-implemented the CFAM module following your code and the explanation in the article, initializing the trainable scalar parameter alpha to 0. I noticed that at the end of my training, the alpha value was still very small (< 0.1), which suggests that the CFAM module did not have much effect on the original features. Could you kindly verify whether you observed a similar phenomenon? What was your alpha value at the end of training?

  2. Even though the CFAM module introduces a noticeable increase in the number of trainable parameters (extra convolutions before and after CAM), the model's learning rate is initialized at a very low value (0.0001). Could you share why you set it up this way?

Thank you again!

okankop commented 4 years ago

Hi @alphadadajuju,

  1. Indeed, the trainable alpha parameter is very small. But I guess that is expected, since alpha determines how much information from the other channels is added to the original channels. If alpha became large, the original features would get corrupted. That is not something we want, since 2D+3D (without CFAM) already achieves good results.

  2. Because we are using pretrained models for both the 2D and 3D backbones. Even for JHMDB, we freeze the weights in these backbones. We tried training with larger learning rates, but the gradients blew up and training did not converge.
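For anyone re-implementing this, the alpha behavior discussed above can be sketched as a scaled residual fusion: the attention branch's output is added back to the original features, weighted by a scalar parameter initialized to zero, so the module starts out as an identity mapping. This is a minimal illustration, not the actual CFAM code; the module and layer names here (`ScaledResidualFusion`, the 1x1 conv standing in for the attention branch) are hypothetical.

```python
import torch
import torch.nn as nn

class ScaledResidualFusion(nn.Module):
    """Minimal sketch of a trainable-scalar residual fusion (hypothetical names).

    out = x + alpha * branch(x), with alpha initialized to 0 so the module
    initially passes the original features through unchanged.
    """
    def __init__(self, channels):
        super().__init__()
        # Trainable scalar, starts at 0: attention contributes nothing at init
        self.alpha = nn.Parameter(torch.zeros(1))
        # Stand-in for the attention branch (the real CFAM uses channel attention)
        self.branch = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x + self.alpha * self.branch(x)

x = torch.randn(2, 8, 7, 7)
m = ScaledResidualFusion(8)
with torch.no_grad():
    y = m(x)
# With alpha = 0, the module is exactly the identity on x
print(torch.allclose(x, y))
```

A small final alpha (< 0.1), as reported above, means the fused features stay close to the original ones, with the attention branch acting as a mild correction rather than replacing them.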

I am happy to hear that you find our project useful.

alphadadajuju commented 4 years ago

I see, thank you very much for your prompt response!