Open gdjmck opened 5 years ago
I remember that in the paper the result with shared weights was lower than the original (non-shared) one, and my experiments pointed to the same conclusion.
I checked the paper again and found that this was not the case: the ablation study reports that weight sharing improves performance across various backbone networks.
What I really want to know is how to update the network when the branches produce different gradients while still keeping the parameters shared. Should the gradients be averaged across branches, or should some regularization method be used?
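For reference, here is a minimal sketch of what I think happens in a PyTorch-style autograd setup (my own illustration, not the authors' code): when one parameter is reused in several branches, calling `backward()` on the combined loss simply sums the per-branch gradients into the shared parameter's `.grad`, so no manual averaging is needed; averaging would only rescale the gradient by 1/num_branches, which is equivalent to adjusting the learning rate.

```python
import torch

w = torch.nn.Parameter(torch.randn(4, 4))   # parameter shared by both branches
x = torch.randn(2, 4)

out1 = x @ w          # branch 1
out2 = (2.0 * x) @ w  # branch 2: different input, same shared weight

loss = out1.sum() + out2.sum()
loss.backward()       # autograd accumulates both contributions into w.grad

# Recompute each branch's gradient separately for comparison
g1 = torch.autograd.grad((x @ w).sum(), w)[0]
g2 = torch.autograd.grad(((2.0 * x) @ w).sum(), w)[0]

print(torch.allclose(w.grad, g1 + g2))  # True: gradients are summed, not averaged
```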
I skimmed the TridentNet paper and was wondering whether it should use the same conv weights for the differently dilated conv kernels.
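To make the question concrete, this is roughly how I picture the trident block (a sketch assuming PyTorch; `TridentConv` is just an illustrative name, not something from this repo): a single weight tensor applied with several dilation rates, so only the receptive field differs between branches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TridentConv(nn.Module):
    """Hypothetical trident-style block: one 3x3 conv weight, several dilations."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        # Single weight/bias shared by every branch
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # Each branch reuses the same weight; only dilation (and matching
        # padding) differs, so the spatial size stays the same per branch.
        return [
            F.conv2d(x, self.weight, self.bias, padding=d, dilation=d)
            for d in self.dilations
        ]

feats = TridentConv(16, 32)(torch.randn(1, 16, 64, 64))
print([f.shape for f in feats])  # every branch: [1, 32, 64, 64]
```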