The projection head order needs to be revisited

Akshay1-6180 commented 9 months ago

Going through these papers, 1) https://arxiv.org/pdf/1603.05027.pdf and 2) https://arxiv.org/pdf/2302.06112.pdf, I looked at the current projection head:

class ProjectionHead(nn.Module):
    def __init__(
        self,
        embedding_dim,
        projection_dim=CFG.projection_dim,
        dropout=CFG.dropout
    ):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)

    def forward(self, x):
        projected = self.projection(x)
        x = self.gelu(projected)
        x = self.fc(x)
        x = self.dropout(x)
        x = x + projected
        x = self.layer_norm(x)
        return x

I feel the order should be as follows, with the layer norm applied before the activation and the residual added last:

class ProjectionHead(nn.Module):
    def __init__(
        self,
        embedding_dim,
        projection_dim=CFG.projection_dim,
        dropout=CFG.dropout
    ):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)

    def forward(self, x):
        projected = self.projection(x)
        x = self.layer_norm(projected)
        x = self.gelu(x)
        x = self.dropout(x)
        x = self.fc(x)
        x = x + projected

        return x
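
To make the difference between the two orders concrete, here is a minimal sketch that runs both variants side by side. The class names `PostNormHead` and `PreNormHead`, the constants standing in for `CFG.projection_dim` and `CFG.dropout`, and the input size of 512 are all assumptions made for illustration and are not taken from the repo. It only checks two structural properties: the current order returns layer-normalized rows, while the proposed order leaves the skip path as a plain identity.

```python
import torch
from torch import nn

PROJECTION_DIM = 256  # assumed stand-in for CFG.projection_dim
DROPOUT = 0.1         # assumed stand-in for CFG.dropout


class PostNormHead(nn.Module):
    """Current order: fc branch, residual add, then LayerNorm."""

    def __init__(self, embedding_dim, projection_dim=PROJECTION_DIM, dropout=DROPOUT):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)

    def forward(self, x):
        projected = self.projection(x)
        x = self.dropout(self.fc(self.gelu(projected)))
        return self.layer_norm(x + projected)


class PreNormHead(PostNormHead):
    """Proposed order: LayerNorm first, residual add last."""

    def forward(self, x):
        projected = self.projection(x)
        x = self.fc(self.dropout(self.gelu(self.layer_norm(projected))))
        return x + projected


torch.manual_seed(0)
features = torch.randn(4, 512)   # hypothetical batch of 4 embeddings of size 512
post = PostNormHead(512).eval()  # eval() disables dropout so the check is deterministic
pre = PreNormHead(512).eval()

with torch.no_grad():
    post_out = post(features)
    pre_out = pre(features)
    projected = pre.projection(features)
    # Recompute the fc branch alone to confirm the skip path is untouched.
    branch = pre.fc(pre.gelu(pre.layer_norm(projected)))
    skip_is_identity = torch.allclose(pre_out - branch, projected, atol=1e-5)

print(post_out.shape, pre_out.shape)
print("post-norm per-row mean:", post_out.mean(dim=-1))
print("pre-norm skip path is identity:", skip_is_identity)
```

Note that the proposed order no longer normalizes the final output, so the scale of the returned embeddings can differ from the current code. Whether that matters for the loss would need to be checked by training.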

GewelsJI commented 9 months ago

Do you know why GELU is used here? @Akshay1-6180

Akshay1-6180 commented 9 months ago

Based on experiments, GELU was found to have a significantly smoother gradient transition; it is not abrupt or sharp like ReLU. If you look at both functions, you will see the difference. Moreover, the GPT-2 code uses GELU, as do many other models I have encountered, so I went with it.

https://github.com/openai/gpt-2/blob/master/src/model.py
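
As a rough illustration of the "smoother gradient" point, the sketch below compares the derivative of ReLU with the derivative of the exact GELU, x * Phi(x), around zero using only the standard library. The helper names are made up for this example, and the linked GPT-2 file uses a tanh approximation of GELU, not the erf form shown here.

```python
import math


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x):
    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x), where phi is the standard normal PDF.
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu_grad(x):
    # ReLU gradient jumps from 0 to 1 at x = 0 (using 0 at x = 0 by convention).
    return 1.0 if x > 0 else 0.0


for x in (-2.0, -0.5, -0.01, 0.01, 0.5, 2.0):
    print(f"x={x:+.2f}  relu'={relu_grad(x):.3f}  gelu'={gelu_grad(x):.3f}")
```

Across the step from x = -0.01 to x = 0.01, the ReLU gradient jumps by a full 1.0, while the GELU gradient changes only slightly and passes through 0.5 at x = 0.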

moein-shariatnia / OpenAI-CLIP

The projection head order needs to be revisited #17