spro/practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained

char-rnn-classification exercise #69

transfluxus opened this issue 6 years ago

transfluxus commented 6 years ago

Thanks for these tutorials. They are clear and easy to go through. Since I am trying to get into building my own models with these tutorials, I worked through the exercises in the char-rnn-classification notebook: I tried to increase the number of linear layers and the number of nodes in the hidden layer. Both adjustments led to worse results, though.

This is how I created the model (for 3 linear layers):

    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2h2 = nn.Linear(hidden_size, hidden_size)
        self.i2h3 = nn.Linear(hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.i2o2 = nn.Linear(output_size, output_size)
        self.i2o3 = nn.Linear(output_size, output_size)
        self.softmax = nn.LogSoftmax()

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        hidden = self.i2h2(hidden)
        hidden = self.i2h3(hidden)
        output = self.i2o(combined)
        output = self.i2o2(output)
        output = self.i2o3(output)
        output = self.softmax(output)
        return output, hidden

Also, how would I use an LSTM as suggested? I've attached a plot of the original network and 6 different variants.

Both adding more layers and changing the number of nodes in the hidden layer make the resulting loss worse.

[plot char_class: loss curves for the original network and the modified variants]

hunkim commented 6 years ago

Did you add a ReLU between the linear layers?

        output = self.i2o(combined)
        output = self.i2o2(output)
        output = self.i2o3(output)

Also try dropout.
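
For example, here is a minimal sketch of that suggestion, based on the model posted above (the 0.1 dropout rate and the two-layer output path are arbitrary choices, not something from the notebook):

    import torch
    import torch.nn as nn

    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
            # output path: linear -> ReLU -> dropout -> linear -> log-softmax
            self.i2o = nn.Linear(input_size + hidden_size, output_size)
            self.i2o2 = nn.Linear(output_size, output_size)
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.1)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, input, hidden):
            combined = torch.cat((input, hidden), 1)
            hidden = self.i2h(combined)
            output = self.relu(self.i2o(combined))
            output = self.dropout(output)
            output = self.softmax(self.i2o2(output))
            return output, hidden

        def initHidden(self):
            return torch.zeros(1, self.hidden_size)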

transfluxus commented 6 years ago

Thanks for answering. I tried different models: ReLU with a second layer, ReLU and dropout with 2 or 3 layers, and also more nodes (256) in the hidden layer. Nothing brought better results. There is no way to see what's going on inside the network, I guess :) This is the latest version:

        ...
        combined_size = input_size + hidden_size
        dropoutRate = 0.5

        self.hl = nn.Sequential(
            nn.Linear(combined_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(hidden_size, hidden_size)
        )

        self.ol = nn.Sequential(
            nn.Linear(combined_size, combined_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(combined_size, combined_size),
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(combined_size, output_size),
            nn.LogSoftmax()
        )

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.hl(combined)
        output = self.ol(combined)
        return output, hidden
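
On the point about not being able to see what's going on inside the network: one small way to peek is to print per-parameter gradient norms right after the backward pass. A sketch, assuming the model instance is called rnn and loss.backward() has just been run:

    # norm of the gradient flowing into each weight/bias tensor
    for name, p in rnn.named_parameters():
        if p.grad is not None:
            print(name, p.grad.norm().item())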

hunkim commented 6 years ago

@transfluxus Loss graph for the new one?

transfluxus commented 6 years ago

Sure. Maybe it is just too complex, so it would perform better on more complex tasks. Or maybe it needs a different optimiser?

[plot: loss curve for the latest version]
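
On the optimiser idea, one low-effort thing to try is swapping the hand-rolled parameter update in the notebook's train function for torch.optim. A sketch, assuming rnn, criterion, and the category/line tensors are set up as in the notebook; the Adam learning rate here is a guess:

    import torch.optim as optim

    optimizer = optim.Adam(rnn.parameters(), lr=0.001)

    def train(category_tensor, line_tensor):
        hidden = rnn.initHidden()
        optimizer.zero_grad()
        # feed the name one character at a time, as in the notebook
        for i in range(line_tensor.size()[0]):
            output, hidden = rnn(line_tensor[i], hidden)
        loss = criterion(output, category_tensor)
        loss.backward()
        optimizer.step()  # replaces the manual SGD weight update
        return output, loss.item()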

transfluxus commented 6 years ago

This is the first time I achieved slightly better results, but the training time doubled.

So...


    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        combined_size = input_size + hidden_size
        dropoutRate = 0.2
        self.gl = nn.Sequential(
            nn.Linear(combined_size, combined_size),
        )

        self.hl = nn.Sequential(
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(combined_size, hidden_size),
        )

        self.ol = nn.Sequential(
            nn.ReLU(),
            nn.Dropout(dropoutRate),
            nn.Linear(combined_size, output_size),
            nn.LogSoftmax()
        )

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        general = self.gl(combined)
        hidden = self.hl(general)
        output = self.ol(general)
        return output, hidden

[plot: loss curve for this version]

transfluxus commented 6 years ago

OK, lesson learned. NNs are weird. Next I tried to use an LSTM; this is my architecture:


    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size

        self.input_layer = nn.Linear(input_size, hidden_size)
        self.rnn_layer = nn.LSTM(hidden_size, hidden_size, 2, dropout=0.05)
        self.output_layer = nn.Linear(hidden_size, output_size)
        self.softener = nn.LogSoftmax()

    def forward(self, input_t, hidden_t):
        # print(input_t.size())
        input_t = self.input_layer(input_t)
        output_t, hidden_t = self.rnn_layer(input_t.unsqueeze(1), hidden_t)  # hidden_t must be an (h, c) tuple for nn.LSTM
        output_t = self.softener(self.output_layer(output_t.squeeze(1)))
        return output_t, hidden_t

It takes much more time and the performance is so-so... [plot: loss curve for the LSTM version] Did somebody try an LSTM here?
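
One detail that matters with nn.LSTM: the hidden state passed into forward has to be an (h_0, c_0) tuple, each of shape (num_layers, batch, hidden_size), so the single-tensor initHidden from the notebook has to change. A minimal init sketch for the 2-layer LSTM above, assuming batch size 1:

    import torch

    def init_hidden(hidden_size, num_layers=2, batch_size=1):
        # nn.LSTM expects a tuple (h_0, c_0), each of shape
        # (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(num_layers, batch_size, hidden_size)
        c0 = torch.zeros(num_layers, batch_size, hidden_size)
        return h0, c0

Pass the result in as the initial hidden state before looping over the characters of a name, and carry the returned tuple from step to step just like the plain tensor in the original model.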

YisongMiao commented 6 years ago

Thanks so much! Let me try it tomorrow!