albertszg opened this issue 2 years ago (status: Open)
Some things that could help with convergence:
Not sure what your exact setup is, but make sure the implementation is correct so that the gradients can backpropagate.
Thanks for your advice, I'll try these suggestions. I also found that adding a BatchNorm layer in the squeeze function works better.
Hello, this is brilliant work, and I want to use the binary Gumbel-Softmax in my own work, but I ran into some problems. I used the soft mask for the first layer only (I just apply the generated mask to the features after the first layer), and I found a strange phenomenon: the Gumbel noise seems to influence the training process too much. I plotted the sparsity loss only, and I found that I usually couldn't reach the sparsity target I set. Is this process right?

[Plots: temp=5.0, temp=1.0]
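For readers following along, here is a minimal sketch of the relaxation being discussed, in plain Python so it runs without a deep learning framework. It assumes the usual binary Gumbel-Softmax form, `sigmoid((logit + g1 - g2) / temp)`, and a squared-error sparsity loss on the mean mask value. The helper names (`sample_logistic`, `soft_mask`, `sparsity_loss`) are hypothetical and are not taken from the repository, whose actual implementation may differ.

```python
import math
import random


def sample_logistic(rng):
    """Draw g1 - g2 for two independent Gumbel(0, 1) samples.

    The difference of two Gumbel samples follows a Logistic(0, 1)
    distribution, so it can be drawn from a single uniform sample.
    """
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return math.log(u) - math.log(1.0 - u)


def soft_mask(logits, temp, rng=None):
    """Relaxed binary mask: sigmoid((logit + noise) / temp).

    Passing rng=None disables the noise (assumed evaluation behavior).
    """
    mask = []
    for logit in logits:
        noise = sample_logistic(rng) if rng is not None else 0.0
        mask.append(1.0 / (1.0 + math.exp(-(logit + noise) / temp)))
    return mask


def sparsity_loss(mask, target):
    """Squared gap between the mean mask value and the target (assumed form)."""
    return (sum(mask) / len(mask) - target) ** 2


def spread(mask):
    """Mean distance of the mask values from 0.5."""
    return sum(abs(v - 0.5) for v in mask) / len(mask)


# Same logits and same noise draws, two temperatures.
logits = [0.5] * 1000
mask_hot = soft_mask(logits, temp=5.0, rng=random.Random(0))
mask_cold = soft_mask(logits, temp=1.0, rng=random.Random(0))
print("spread at temp=5.0:", spread(mask_hot))
print("spread at temp=1.0:", spread(mask_cold))
```

With identical noise draws, the higher temperature keeps the soft values closer to 0.5, while the lower temperature pushes them toward 0 or 1, so each noise sample moves the mask further. Whether this accounts for the curves above depends on details not given in the thread, such as whether the sparsity loss is computed on soft or hard mask values.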