rasbt / machine-learning-book

Code Repository for Machine Learning with PyTorch and Scikit-Learn
https://sebastianraschka.com/books/#machine-learning-with-pytorch-and-scikit-learn
MIT License

Chapter 15 / page 531 - Building a character-level RNN - forward pass - wrong shapes #148

Closed: lkrisz87 closed this issue 1 year ago

lkrisz87 commented 1 year ago

The code is here: https://github.com/rasbt/machine-learning-book/blob/main/ch15/ch15_part3.ipynb

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, rnn_hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn_hidden_size = rnn_hidden_size
        self.rnn = nn.LSTM(embed_dim, rnn_hidden_size,
                           batch_first=True)
        self.fc = nn.Linear(rnn_hidden_size, vocab_size)

    def forward(self, x, hidden, cell):
        out = self.embedding(x).unsqueeze(1)
        out, (hidden, cell) = self.rnn(out, (hidden, cell))
        out = self.fc(out).reshape(out.size(0), -1)
        return out, hidden, cell

    def init_hidden(self, batch_size):
        # Fresh zero-initialized hidden and cell states for the LSTM
        hidden = torch.zeros(1, batch_size, self.rnn_hidden_size)
        cell = torch.zeros(1, batch_size, self.rnn_hidden_size)
        return hidden.to(device), cell.to(device)

# char_array and device are defined in earlier cells of the notebook
vocab_size = len(char_array)
embed_dim = 256
rnn_hidden_size = 512

torch.manual_seed(1)
model = RNN(vocab_size, embed_dim, rnn_hidden_size)
model = model.to(device)
model
  1. The shape of x is (batch_size, seq_length), i.e. torch.Size([64, 40]).

  2. The output shape of the embedding is [batch_size, seq_length, embed_dim].

  3. It is unsqueezed at dim=1, which gives [batch_size, 1, seq_length, embed_dim].

  4. The RNN/LSTM doesn't accept a 4-dimensional input tensor. Is this supposed to be [batch_size, seq_length, embed_dim] or [batch_size, seq_length, 1, embed_dim]?

  5. The fc layer is fed the sequence output, so it will produce a [batch_size, seq_length, ..., vocab_size] tensor?
What is happening here? I guess either this chapter was overlooked, or the code worked with an older version of PyTorch; I don't know. I would appreciate an explanation. Until then, I'm going to try to figure out what is going on.
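Here is a minimal repro of the shape mismatch I'm describing, using standalone layers and a dummy vocab_size of 80 (the numbers are illustrative only):

import torch
import torch.nn as nn

vocab_size, embed_dim, rnn_hidden_size = 80, 256, 512
batch_size, seq_length = 64, 40

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.LSTM(embed_dim, rnn_hidden_size, batch_first=True)

x = torch.randint(0, vocab_size, (batch_size, seq_length))  # a full batch of sequences
out = embedding(x)      # (64, 40, 256)
out = out.unsqueeze(1)  # (64, 1, 40, 256) -- now 4-D

hidden = torch.zeros(1, batch_size, rnn_hidden_size)
cell = torch.zeros(1, batch_size, rnn_hidden_size)
out, _ = rnn(out, (hidden, cell))  # raises RuntimeError: LSTM expects 2-D or 3-D input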

P.S. To the author: Amazing book!!!! Thanks so much for writing it <3 <3


lkrisz87 commented 1 year ago

torch version: '2.0.1+cu118'

rasbt commented 1 year ago

Hi there, thanks for posting this issue. I think this is probably an issue with newer PyTorch versions (2.x) -- it worked with PyTorch 1.10 when the notebook was last updated (~3 months ago): https://github.com/rasbt/machine-learning-book/blob/main/ch15/ch15_part2.ipynb

lkrisz87 commented 1 year ago

Okay, it seems I was mistaken here. I had assumed that the entire batch of sequences gets fed to the model at once, rather than the model being trained character by character over the sequence. I tried to make the model consume the entire batch, but of course that didn't work, because it was designed to receive one character per sequence (batch_size x 1). It drove me crazy, and I hadn't read through the entire chapter...
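For anyone else who trips over this: the forward pass is called once per time step, so x is 1-D with shape (batch_size,), and then all the shapes line up. A quick sanity check using the RNN class quoted above (dummy sizes, not the book's actual training loop):

import torch

vocab_size, embed_dim, rnn_hidden_size = 80, 256, 512
batch_size = 64

model = RNN(vocab_size, embed_dim, rnn_hidden_size)
hidden = torch.zeros(1, batch_size, rnn_hidden_size)
cell = torch.zeros(1, batch_size, rnn_hidden_size)

x = torch.randint(0, vocab_size, (batch_size,))  # ONE character per sequence
logits, hidden, cell = model(x, hidden, cell)
# embedding: (64, 256) -> unsqueeze(1): (64, 1, 256) -> lstm: (64, 1, 512)
# -> fc: (64, 1, 80) -> reshape: (64, 80)
print(logits.shape)  # torch.Size([64, 80])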

rasbt commented 1 year ago

Glad this got resolved :)