quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

Shouldn't we drop on the second dimension in the drop_path function? #143

Open woodywff opened 4 years ago

woodywff commented 4 years ago

This is the drop_path function in utils.py:

def drop_path(x, drop_prob):
  if drop_prob > 0.:
    keep_prob = 1. - drop_prob
    # One Bernoulli draw per example in the batch: the mask has shape (N, 1, 1, 1),
    # so each sample's output of this path is either kept or zeroed as a whole.
    mask = Variable(torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob))
    x.div_(keep_prob)  # rescale kept activations to preserve the expected value
    x.mul_(mask)
  return x

Question: why do we drop along the batch dimension (the 1st dimension)? Shouldn't we instead randomly keep and drop some of the filters (the 2nd dimension)? Thank you :-)

Jasha10 commented 4 years ago

My understanding is that dropping along the second (channel) dimension would be an implementation of "drop channel", whereas dropping along the first (batch) dimension is "drop path": each sample in the batch either keeps this path's output or has it zeroed entirely.
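
For illustration only (this is not code from the repo, and drop_channel_mask is a hypothetical name for the variant you describe), here is a minimal sketch of how the two masks would differ in shape:

import torch

def drop_path_mask(x, drop_prob):
    # "Drop path" style: one Bernoulli draw per sample,
    # mask shape (N, 1, 1, 1) -- a sample keeps or loses the whole path.
    keep_prob = 1.0 - drop_prob
    return x.new_empty(x.size(0), 1, 1, 1).bernoulli_(keep_prob)

def drop_channel_mask(x, drop_prob):
    # Hypothetical "drop channel" style: one draw per (sample, channel),
    # mask shape (N, C, 1, 1) -- individual feature maps are zeroed instead.
    keep_prob = 1.0 - drop_prob
    return x.new_empty(x.size(0), x.size(1), 1, 1).bernoulli_(keep_prob)

x = torch.randn(4, 8, 16, 16)           # (N, C, H, W)
print(drop_path_mask(x, 0.3).shape)     # torch.Size([4, 1, 1, 1])
print(drop_channel_mask(x, 0.3).shape)  # torch.Size([4, 8, 1, 1])

Only the first masking scheme matches what drop_path in utils.py does; the second would regularize individual channels rather than whole paths.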