AdityaKane2001 closed 2 years ago
I agree. This is what I have personally followed in my implementations too.
Examples:
Great! In that case I will have a PR up by EoD today.
Do we want to have both the StochasticDepth (TFAddons-like) and the DropBlock variants, or only one of the two?
EDIT: I meant DropPath, not DropBlock.
Hi @sebastian-sz! Great suggestion.
This is a bit specific to Aditya's contribution to GSoC. We haven't yet factored in DropBlock but that should be relatively easy to do. I don't mind including it. @AdityaKane2001 WDYT given DropBlock is more commonly used these days?
Just to clarify, DropBlock is not the same as DropPath or DropConnect, right?
It should not be.
@AdityaKane2001 yes, I meant DropPath not DropBlock. Sorry for the confusion.
@sayakpaul which one do we want to include?
I think StochasticDepth (dropping the entire batch) and DropPath (dropping some samples) are more relevant and can be easily implemented here. If it's okay, I'll implement these two and have the PR ready by EoD. Then we can move on to DropBlock.
I'd prefer separate PRs for the two. I think that would make the review a bit more streamlined.
@AdityaKane2001 can we close it?
I was hoping to close it after we merge DropPath.
@sayakpaul @LukeWood
As a continuation of #130, I would like to clear up some of the nitty-gritty regarding the implementation here.
Going by the original paper, TFA has the most appropriate implementation (here) in my opinion. The timm and tensorflow/tpu implementations aim to randomly occlude some elements of every batch. However, if I understand correctly, the original paper aims to occlude the entire batch at once, with the given probability. I believe this is the implementation we should go for.
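To make the two behaviours discussed in this thread concrete, here is a rough sketch in plain Python (standard library only, no TF). It is not the TFA, timm or tensorflow/tpu code. The function names, the list-of-lists "batch" and the survival_prob parameter are all made up for illustration. Any rescaling of the residual between training and inference is deliberately left out.

```python
import random


def batch_level_mask(batch_size, survival_prob, rng):
    """Hypothetical helper: one Bernoulli draw shared by the whole batch.

    The residual branch is either kept for every sample or dropped for
    every sample ("dropping the entire batch").
    """
    keep = 1.0 if rng.random() < survival_prob else 0.0
    return [keep] * batch_size


def sample_level_mask(batch_size, survival_prob, rng):
    """Hypothetical helper: an independent Bernoulli draw per sample.

    Some samples keep the residual branch, others lose it
    ("dropping some samples").
    """
    return [1.0 if rng.random() < survival_prob else 0.0
            for _ in range(batch_size)]


def apply_residual(shortcut, residual, mask):
    """Compute shortcut + mask[i] * residual for each sample i.

    shortcut and residual are lists of equal-length feature lists.
    Rescaling (by survival_prob or its inverse) is intentionally omitted.
    """
    return [
        [s + m * r for s, r in zip(s_row, r_row)]
        for s_row, r_row, m in zip(shortcut, residual, mask)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    shortcut = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    residual = [[10.0, 10.0]] * 3
    # Batch-level: all rows change together or not at all.
    print(apply_residual(shortcut, residual, batch_level_mask(3, 0.5, rng)))
    # Sample-level: each row is decided independently.
    print(apply_residual(shortcut, residual, sample_level_mask(3, 0.5, rng)))
```

The only difference between the two variants is the shape of the random draw: a single value for the batch versus one value per sample. Everything else in the layer can stay the same, so the choice could be exposed as an option later if both are wanted.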