Open HenryDo147 opened 5 years ago
Since we add BN layers before all convolution layers, the gradient should be stable in the normal case. In this codebase, we simply add a BN layer at the beginning of the network for data normalization. It works well on two datasets. I also don't think the value scale is the cause of this problem. Is it possible that the ColorX and ColorY features you mentioned are simply not effective for action recognition?
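The "BN layer at the beginning of the network" idea above can be sketched as follows. This is a minimal, hypothetical illustration, not the repository's actual code: the class name `InputBN`, the `(N, C, T, V)` input layout (batch, channels, frames, joints), and the dimensions are assumptions based on common skeleton-based pipelines for NTU RGB+D.

```python
import torch
import torch.nn as nn

class InputBN(nn.Module):
    """Hypothetical sketch: BatchNorm applied to the raw input for data
    normalization, before any convolution layers."""

    def __init__(self, in_channels, num_joints):
        super().__init__()
        # Fold channels and joints into one feature axis so BN learns
        # separate statistics per (channel, joint) pair.
        self.data_bn = nn.BatchNorm1d(in_channels * num_joints)

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels (e.g. ColorX/ColorY), frames, joints
        n, c, t, v = x.shape
        x = x.permute(0, 1, 3, 2).reshape(n, c * v, t)
        x = self.data_bn(x)
        return x.reshape(n, c, v, t).permute(0, 1, 3, 2)

# Usage sketch: 2-channel (ColorX/ColorY) input over 25 joints.
m = InputBN(in_channels=2, num_joints=25)
out = m(torch.randn(8, 2, 50, 25))
```

Because the normalization happens before the first convolution, poorly scaled input features (such as raw ColorX/ColorY coordinates) are rescaled per feature, which is one reason the value scale alone is unlikely to cause vanishing gradients here.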
Code version (Git Hash) and PyTorch version
Dataset used
I used the NTU RGB+D dataset, the same as you, but I used ColorX and ColorY information rather than the 3 channels that you used.
Expected behavior
Actual behavior
I have faced the vanishing gradient problem. Have you ever encountered this problem?
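One quick way to confirm a vanishing gradient (rather than, say, a data or loss issue) is to log per-parameter gradient norms after `loss.backward()`. The helper below is a generic diagnostic sketch, not code from this repository; the toy `nn.Sequential` model is only for demonstration.

```python
import torch
import torch.nn as nn

def grad_norms(model):
    """Return the L2 norm of each parameter's gradient.

    Near-zero norms in the early layers, while later layers still
    receive sizable gradients, are the classic vanishing-gradient
    signature."""
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.grad is not None}

# Usage sketch with a toy model (replace with your own network).
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()
for name, norm in grad_norms(model).items():
    print(f"{name}: {norm:.3e}")
```

If the early-layer norms are orders of magnitude smaller than the late-layer ones across many batches, that points to vanishing gradients; if all norms look healthy, the problem is likely elsewhere (e.g. the ColorX/ColorY features carrying little signal).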
Steps to reproduce the behavior
Other comments