Closed: tarvaina closed this issue 1 year ago
@tarvaina Right, that is the correct place to add dropout. I haven't been using it for a while, but it should improve performance as in the original Lua implementation; very little changed in the Python version.
@szagoruyko Sorry, I'm still a little confused about that. As stated in the original paper, "We add a dropout layer into each residual block between convolutions as shown in fig. 1(d) and after ReLU to perturb batch normalization in the next residual block and prevent it from overfitting." Does inserting it between the aforementioned lines 57 and 58 in resnet.py correspond to the "dropout layer between convolutions"? And why "after ReLU"? From that statement alone, one would conjecture that the dropout layer goes between the second ReLU and the second conv in each residual block, which is what is added at lines 58 and 59 of resnet.py in this latest pull request.
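To make the question concrete, here is a minimal sketch of a pre-activation wide-ResNet basic block with dropout placed after the second ReLU and before the second convolution, i.e. the placement described in fig. 1(d) of the paper. The class and argument names (and the drop_rate default) are illustrative and are not copied from the repository's resnet.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Pre-activation block: BN -> ReLU -> conv -> BN -> ReLU -> dropout -> conv.
    Names and defaults are illustrative, not the repository's actual code."""

    def __init__(self, in_planes, out_planes, stride=1, drop_rate=0.3):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.drop_rate = drop_rate
        # 1x1 projection shortcut when the shape changes
        self.shortcut = None
        if stride != 1 or in_planes != out_planes:
            self.shortcut = nn.Conv2d(in_planes, out_planes, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x):
        out = F.relu(self.bn1(x))
        residual = self.shortcut(out) if self.shortcut is not None else x
        out = self.conv1(out)
        out = F.relu(self.bn2(out))
        # Dropout sits between the two convolutions, right after the second
        # ReLU, as in fig. 1(d) of the Wide ResNet paper.
        if self.drop_rate > 0:
            out = F.dropout(out, p=self.drop_rate, training=self.training)
        out = self.conv2(out)
        return out + residual
```

If this matches the intended placement, adding it to resnet.py would only require the extra F.dropout call in the block's forward pass (plus a drop_rate argument threaded through the constructors).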
The PyTorch implementation does not seem to include dropout layers. Is there anything special that needs to be considered when adding them? I guess it would be a simple addition between lines 57 and 58 in resnet.py.
From the paper, I understood that dropout would improve performance. Did I understand that correctly, and is it still the case in your experience?