thomasverelst / dynconv

Code for Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR2020)
https://arxiv.org/abs/1912.03203

question about the sparsity_target #10

Open albertszg opened 2 years ago

albertszg commented 2 years ago

Hello, this is brilliant work. I want to use the binary Gumbel-Softmax in my own work, but I have run into some problems. I use the soft mask for the first layer only (I just apply the generated mask to the features after the first layer), and I found a strange phenomenon: the Gumbel noise seems to influence the training process too much. I plotted only the sparsity loss, and I usually cannot reach the sparsity target I set. Is this process right?

temp=5.0 [image: WeChat screenshot 20211206151218] temp=1.0 later
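For readers following the question, here is a minimal sketch of a binary Gumbel-Softmax mask with a straight-through estimator, plus a simple sparsity loss. It is not the repository's code: the function names `binary_gumbel_softmax` and `sparsity_loss`, the squared-error form of the loss, and reading `sparsity_target` as the desired fraction of active positions are all assumptions made for illustration.

```python
import torch

def binary_gumbel_softmax(logits, temp=1.0, noise=True, eps=1e-8):
    """Hypothetical helper: hard 0/1 mask in the forward pass, soft gradient in the backward pass."""
    if noise:
        # The difference of two Gumbel samples perturbs the logits.
        u1 = torch.rand_like(logits)
        u2 = torch.rand_like(logits)
        g1 = -torch.log(-torch.log(u1 + eps) + eps)
        g2 = -torch.log(-torch.log(u2 + eps) + eps)
        logits = logits + g1 - g2
    soft = torch.sigmoid(logits / temp)
    hard = (soft >= 0.5).float()
    # Straight-through trick: the value equals `hard`, the gradient is that of `soft`.
    return hard - soft.detach() + soft

def sparsity_loss(mask, sparsity_target):
    """Assumed loss: squared gap between the fraction of active positions and the target."""
    return (mask.mean() - sparsity_target) ** 2

torch.manual_seed(0)
logits = torch.zeros(2, 1, 8, 8, requires_grad=True)  # stand-in for mask-unit outputs
mask = binary_gumbel_softmax(logits, temp=5.0)
loss = sparsity_loss(mask, sparsity_target=0.3)
loss.backward()
print("active fraction:", mask.mean().item(), "loss:", loss.item())
```

Because the noise is added before dividing by the temperature, its magnitude relative to the logits does not depend on `temp`; in this sketch the temperature only changes how steep the sigmoid is, and therefore the size of the gradients.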

thomasverelst commented 2 years ago

Some things that could help with convergence:

I'm not sure what your exact setup is, but make sure the implementation is correct so that the gradients can backpropagate through the mask.
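One way to check this is to run a single backward pass and inspect the gradients of the mask-generating layer. The sketch below is only an illustration under assumed names: `mask_unit` is a hypothetical 1x1 convolution standing in for whatever produces the mask logits, and the loss is a placeholder squared-error sparsity term.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
mask_unit = nn.Conv2d(4, 1, kernel_size=1)  # hypothetical mask generator
features = torch.randn(2, 4, 8, 8)          # stand-in for first-layer features

logits = mask_unit(features)
soft = torch.sigmoid(logits)
hard = (soft >= 0.5).float()

# Thresholding alone cuts the graph: `hard` has no gradient path back to mask_unit.
hard_has_grad_path = hard.requires_grad

# The straight-through form keeps the hard values but restores the gradient path.
mask = hard - soft.detach() + soft
loss = (mask.mean() - 0.3) ** 2  # placeholder sparsity loss with target 0.3
loss.backward()

grad_norms = {
    name: (None if p.grad is None else p.grad.norm().item())
    for name, p in mask_unit.named_parameters()
}
print("hard mask tracks gradients:", hard_has_grad_path)
print("mask_unit gradient norms:", grad_norms)
```

If a gradient is `None` or exactly zero for every parameter of the mask generator, the sparsity loss cannot move the mask toward the target, whatever the temperature.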

albertszg commented 2 years ago

Thanks for your advice, I'll try these suggestions. I also found that adding a BatchNorm layer in the squeeze function works better.