transfluxus opened 6 years ago
Did you add a relu between linear layers?
```python
output = self.i2o(combined)
output = self.i2o2(output)
output = self.i2o3(output)
```
Also try dropout.
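For reference, interleaving the activations would look something like the sketch below. The layer names follow the snippet above, but the sizes and the dropout probability are arbitrary placeholders, not values from the tutorial:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration
combined_size, hidden_size, output_size = 185, 128, 18

# Linear layers with a ReLU and dropout between them,
# instead of chaining linears back to back
i2o = nn.Sequential(
    nn.Linear(combined_size, hidden_size),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(hidden_size, output_size),
)

combined = torch.zeros(1, combined_size)
print(i2o(combined).shape)  # torch.Size([1, 18])
```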
Thanks for answering. I tried different models, including ReLU with a second layer, ReLU and dropout with 2 or 3 layers, and also more nodes (256) in the hidden layers. Nothing brought better results. There is no way to see what's going on in the network I guess :) This is the latest version:
```python
        ...
        combined_size = input_size + hidden_size
        dropoutRate = 0.5
        self.hl = nn.Sequential(
            nn.Linear(combined_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(hidden_size, hidden_size),
        )
        self.ol = nn.Sequential(
            nn.Linear(combined_size, combined_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(combined_size, combined_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(combined_size, output_size),
            nn.LogSoftmax(dim=1),  # pass dim explicitly; the implicit form is deprecated
        )

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.hl(combined)
        output = self.ol(combined)
        return output, hidden
```
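One thing worth checking with dropout in the model: dropout is only active in training mode, so evaluation and prediction should happen after calling `model.eval()`, otherwise the reported results are noisier than the network actually is. A minimal standalone illustration:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()
noisy = drop(x)   # roughly half the entries zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()
clean = drop(x)   # identity in eval mode
assert torch.equal(clean, x)
```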
@transfluxus Loss graph for the new one?
Sure. Maybe it is just too complex, so it would perform better on more complex tasks. Or maybe it needs a different optimiser?
For the first time I achieved slightly better results, but the training time doubled.
So...
```python
def __init__(self, input_size, hidden_size, output_size):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    combined_size = input_size + hidden_size
    dropoutRate = 0.2
    self.gl = nn.Sequential(
        nn.Linear(combined_size, combined_size),
    )
    self.hl = nn.Sequential(
        nn.ReLU(),
        nn.Dropout(dropoutRate),
        nn.Linear(combined_size, hidden_size),
    )
    self.ol = nn.Sequential(
        nn.ReLU(),
        nn.Dropout(dropoutRate),
        nn.Linear(combined_size, output_size),
        nn.LogSoftmax(dim=1),  # pass dim explicitly; the implicit form is deprecated
    )

def forward(self, input, hidden):
    combined = torch.cat((input, hidden), 1)
    general = self.gl(combined)
    hidden = self.hl(general)
    output = self.ol(general)
    return output, hidden
```
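On the optimiser question: the tutorial uses plain SGD, and swapping in an adaptive optimiser like Adam is a one-line change. A sketch with a tiny stand-in model (the learning rate here is an arbitrary starting point, not a tuned value):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in the real code this would be the RNN
model = nn.Linear(10, 3)

# Adam instead of manual SGD updates
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()  # expects log-probabilities, matching LogSoftmax output

# One illustrative update step on random data
x = torch.randn(4, 10)
target = torch.tensor([0, 1, 2, 0])
log_probs = nn.functional.log_softmax(model(x), dim=1)
loss = criterion(log_probs, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```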
Ok, lesson learned: NNs are weird. Next I tried to use an LSTM; this is my architecture:
```python
def __init__(self, input_size, hidden_size, output_size):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    self.input_layer = nn.Linear(input_size, hidden_size)
    self.rnn_layer = nn.LSTM(hidden_size, hidden_size, 2, dropout=0.05)
    self.output_layer = nn.Linear(hidden_size, output_size)
    self.softener = nn.LogSoftmax(dim=1)  # pass dim explicitly

def forward(self, input_t, hidden_t):
    # print(input_t.size())
    input_t = self.input_layer(input_t)
    output_t, hidden_t = self.rnn_layer(input_t.unsqueeze(1), hidden_t)
    output_t = self.softener(self.output_layer(output_t.squeeze(1)))
    return output_t, hidden_t
```
It takes much more time and the performance is so-so... Did somebody try an LSTM here?
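One detail that trips people up with `nn.LSTM`: its hidden state is a tuple `(h_0, c_0)`, each of shape `(num_layers, batch, hidden_size)`, not a single tensor like the tutorial's plain RNN uses. A minimal sketch of initialising it for a two-layer LSTM like the one above (the sizes are illustrative):

```python
import torch
import torch.nn as nn

num_layers, batch_size, hidden_size = 2, 1, 128
lstm = nn.LSTM(hidden_size, hidden_size, num_layers, dropout=0.05)

# nn.LSTM expects a (h_0, c_0) tuple: one tensor for the hidden
# state and one for the cell state
hidden = (
    torch.zeros(num_layers, batch_size, hidden_size),
    torch.zeros(num_layers, batch_size, hidden_size),
)

x = torch.randn(1, batch_size, hidden_size)  # (seq_len, batch, features)
out, hidden = lstm(x, hidden)
print(out.shape)  # torch.Size([1, 1, 128])
```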
Thanks so much! Let me try it tomorrow!
Thanks for these tutorials. They are clear and easy to go through. As I am trying to get into building my own models with these tutorials, I was doing the exercises in the char-rnn-classification notebook. I tried to increase the number of linear layers and the number of nodes in the hidden layer. Both adjustments lead to worse results, though.
This is how I created the model (for 3 Linear Layers)
Also, how would I use the LSTM as suggested? I've attached the plot of the original network and 6 different variants.
Both the additional layers and the changed number of nodes in the hidden layers worsen the resulting loss.