Qi-Zhangyang opened this issue 1 year ago
Same question. Regarding MLP_tune, the statement in the paper is inconsistent with the code; which one is correct?

```python
lightweight_mlp = nn.Sequential(
    nn.Linear(self.embed_dim // self.scale_factor, self.embed_dim // self.scale_factor),
    nn.GELU()
)
```
and
About MLP_tune: it is a shared MLP. In the paper, you say it has 32 layers, but in the code it seems to be only 1 layer:

```python
self.shared_mlp = nn.Linear(self.embed_dim // self.scale_factor, self.embed_dim)
```
According to my understanding, MLP_tune is not a shared MLP: it is 32 independent linear layers. MLP_up is the shared part, which consists of a single linear layer.
```python
# Implementation for MLP_up (shared: one linear layer used by all blocks)
self.shared_mlp = nn.Linear(self.embed_dim // self.scale_factor, self.embed_dim)

# Implementation for MLP_tune (one lightweight MLP per block, self.depth in total)
for i in range(self.depth):
    lightweight_mlp = nn.Sequential(
        nn.Linear(self.embed_dim // self.scale_factor, self.embed_dim // self.scale_factor),
        nn.GELU()
    )
    setattr(self, 'lightweight_mlp_{}'.format(str(i)), lightweight_mlp)
```
These 32 lightweight MLPs are not applied sequentially in the model; each one is attached to a different transformer block (one per depth index `i`).
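Putting the two snippets together, here is a minimal runnable sketch of this reading. The class name `PromptAdapter`, the `forward(prompt, i)` signature, and the default hyperparameters (`embed_dim=768`, `scale_factor=32`, `depth=32`) are illustrative assumptions, not the authors' actual code; only the `shared_mlp` / `lightweight_mlp_{i}` construction is taken from the snippets above.

```python
import torch
import torch.nn as nn

class PromptAdapter(nn.Module):
    """Sketch (assumed names): per-block lightweight MLPs (MLP_tune)
    plus one shared up-projection (MLP_up)."""
    def __init__(self, embed_dim=768, scale_factor=32, depth=32):
        super().__init__()
        self.embed_dim = embed_dim
        self.scale_factor = scale_factor
        self.depth = depth
        # MLP_up: ONE linear layer, shared across all depth positions
        self.shared_mlp = nn.Linear(embed_dim // scale_factor, embed_dim)
        # MLP_tune: an independent lightweight MLP for each transformer block
        for i in range(depth):
            lightweight_mlp = nn.Sequential(
                nn.Linear(embed_dim // scale_factor, embed_dim // scale_factor),
                nn.GELU()
            )
            setattr(self, 'lightweight_mlp_{}'.format(str(i)), lightweight_mlp)

    def forward(self, prompt, i):
        # Block i selects its OWN lightweight_mlp_i (the 32 MLPs are parallel,
        # one per block, not stacked), then applies the shared up-projection.
        lightweight_mlp = getattr(self, 'lightweight_mlp_{}'.format(str(i)))
        return self.shared_mlp(lightweight_mlp(prompt))

adapter = PromptAdapter(embed_dim=768, scale_factor=32, depth=32)
x = torch.randn(2, 16, 768 // 32)   # (batch, tokens, embed_dim // scale_factor)
out = adapter(x, i=5)
print(out.shape)  # torch.Size([2, 16, 768])
```

Under this interpretation both readings are reconcilable: the "32 layers" in the paper counts the 32 per-block lightweight MLPs, while the single `nn.Linear` in the code is the shared MLP_up.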