The key difference is whether the Dropout (p=0.2 by default) is activated. As discussed in the paper: "In Dinomaly, Dropout is used to discard neural activations in an MLP bottleneck randomly. Instead of alleviating overfitting, the role of Dropout in Dinomaly can be explained as feature noise and pseudo feature anomaly."
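To make that role concrete, here is a minimal, hypothetical sketch (layer sizes and order are assumed from the module printout below, not the repository's exact code) of how an always-on Dropout in the bottleneck perturbs encoder features during training:

```python
import torch
import torch.nn as nn

# Sketch only: an always-on Dropout in the MLP bottleneck randomly zeros
# activations during training, acting as feature noise / pseudo feature anomalies.
bottleneck = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Dropout(p=0.2),
    nn.Linear(3072, 768),
)

encoder_feat = torch.randn(4, 196, 768)   # e.g. ViT patch tokens (B, N, C)

bottleneck.train()                        # Dropout active -> perturbed features
noisy_feat = bottleneck(encoder_feat)

# The decoder is trained to reconstruct the clean encoder features from
# `noisy_feat`, which discourages learning a trivial identity mapping.
```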
In your paper, you mention that the solution to the identical-mapping problem is reconstruction during restoration.
```python
import torch.nn as nn

class Mlp(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)
```
```
(bottleneck): ModuleList(
  (0): bMlp(
    (fc1): Linear(in_features=768, out_features=3072, bias=True)
    (act): GELU(approximate='none')
    (fc2): Linear(in_features=3072, out_features=768, bias=True)
    (drop): Dropout(p=0.2, inplace=False)
  )
)
```
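If I understand correctly, that printout roughly corresponds to the Mlp class above instantiated with Dropout enabled, whereas a vanilla ViT block MLP typically sets drop=0.0. A sketch of the comparison (parameter values assumed from the printout above):

```python
# Vanilla ViT-style MLP: Dropout usually disabled (p=0.0), so it is a no-op.
vanilla_mlp = Mlp(in_features=768, hidden_features=3072, drop=0.0)

# Dinomaly-style bottleneck: the same structure, but Dropout stays active
# (p=0.2) and injects feature noise during training.
bottleneck = nn.ModuleList([Mlp(in_features=768, hidden_features=3072, drop=0.2)])
```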
This structure also seems to be used in the vanilla Transformer, so I'm curious what the key differences are between this bottleneck MLP and the vanilla Transformer's MLP. Did I reference the wrong part?