The key difference is whether the Dropout (p=0.2 by default) is activated. As discussed in the paper: "In Dinomaly, Dropout is used to discard neural activations in an MLP bottleneck randomly. Instead of alleviating overfitting, the role of Dropout in Dinomaly can be explained as feature noise and pseudo feature anomaly."
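To make that role concrete, here is a minimal, hypothetical sketch (layer sizes and order are assumed from the module printout below, not the repository's exact code) of how an always-on Dropout in the bottleneck perturbs encoder features during training:

```python
import torch
import torch.nn as nn

# Sketch only: an always-on Dropout in the MLP bottleneck randomly zeros
# activations during training, acting as feature noise / pseudo feature anomalies.
bottleneck = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Dropout(p=0.2),
    nn.Linear(3072, 768),
)

encoder_feat = torch.randn(4, 196, 768)   # e.g. ViT patch tokens (B, N, C)

bottleneck.train()                        # Dropout active -> perturbed features
noisy_feat = bottleneck(encoder_feat)

# The decoder is trained to reconstruct the clean encoder features from
# `noisy_feat`, which discourages learning a trivial identity mapping.
```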
In your paper, you mention that the solution to the identical-mapping problem is reconstruction during restoration.
```python
import torch.nn as nn

class Mlp(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)
```
```
(bottleneck): ModuleList(
  (0): bMlp(
    (fc1): Linear(in_features=768, out_features=3072, bias=True)
    (act): GELU(approximate='none')
    (fc2): Linear(in_features=3072, out_features=768, bias=True)
    (drop): Dropout(p=0.2, inplace=False)
  )
)
```
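If I understand correctly, that printout roughly corresponds to the Mlp class above instantiated with Dropout enabled, whereas a vanilla ViT block MLP typically sets drop=0.0. A sketch of the comparison (parameter values assumed from the printout above):

```python
# Vanilla ViT-style MLP: Dropout usually disabled (p=0.0), so it is a no-op.
vanilla_mlp = Mlp(in_features=768, hidden_features=3072, drop=0.0)

# Dinomaly-style bottleneck: the same structure, but Dropout stays active
# (p=0.2) and injects feature noise during training.
bottleneck = nn.ModuleList([Mlp(in_features=768, hidden_features=3072, drop=0.2)])
```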
This structure also seems to be used in the vanilla Transformer, so I'm curious what the key differences are between this bottleneck MLP and the vanilla Transformer's MLP. Did I reference the wrong part?