guojiajeremy / Dinomaly


questions about MLP - drop out. #11

Closed MinGiSa closed 1 month ago

MinGiSa commented 1 month ago

In your paper, you mention that the solution to the identical-mapping problem lies in the reconstruction (restoration) stage.

    import torch.nn as nn

    class Mlp(nn.Module):
        def __init__(self, in_features, hidden_features=None, out_features=None,
                     act_layer=nn.GELU, drop=0.):
            super().__init__()
            out_features = out_features or in_features
            hidden_features = hidden_features or in_features
            self.fc1 = nn.Linear(in_features, hidden_features)
            self.act = act_layer()
            self.fc2 = nn.Linear(hidden_features, out_features)
            self.drop = nn.Dropout(drop)

        def forward(self, x):
            x = self.fc1(x)
            x = self.act(x)
            x = self.drop(x)   # dropout after the activation
            x = self.fc2(x)
            x = self.drop(x)   # dropout again after the second projection
            return x

    (bottleneck): ModuleList(
      (0): bMlp(
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (act): GELU(approximate='none')
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
    )

This same MLP structure is also used in the vanilla Transformer, so I'm curious what the key differences are between the two. Did I reference the wrong part?

guojiajeremy commented 1 month ago

The key difference is whether the Dropout (p=0.2 by default) is active. As discussed in the paper: "In Dinomaly, Dropout is used to discard neural activations in an MLP bottleneck randomly. Instead of alleviating overfitting, the role of Dropout in Dinomaly can be explained as feature noise and pseudo feature anomaly."
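For reference, here is a minimal sketch of that difference, reusing the Mlp class quoted above (the variable names and tensor shapes are illustrative, not the repository's training code): with drop=0.0 the module is a plain Transformer MLP, while with drop=0.2 it behaves as Dinomaly's noisy bottleneck, where Dropout is only active in training mode.

    import torch

    # Assumes the Mlp class from the snippet above is in scope.
    vanilla_mlp = Mlp(in_features=768, hidden_features=3072, drop=0.0)       # vanilla ViT-style MLP
    noisy_bottleneck = Mlp(in_features=768, hidden_features=3072, drop=0.2)  # Dinomaly-style bottleneck

    x = torch.randn(2, 196, 768)  # (batch, tokens, dim); shapes are illustrative

    noisy_bottleneck.train()       # Dropout active: activations are randomly zeroed,
    y_train = noisy_bottleneck(x)  # acting as feature noise / pseudo feature anomalies

    noisy_bottleneck.eval()        # Dropout disabled at inference time
    y_eval = noisy_bottleneck(x)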