spro / practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained

Should use term "iteration" instead of "epoch" in code #52

Open zeyu42 opened 7 years ago

zeyu42 commented 7 years ago

I thought an epoch is a complete pass through every training example in the training set. A single update to the parameters (whether with gradients from a batch or just one example) should be called an iteration.

For example, in char-rnn-classification.ipynb the training loop actually retrieves one training example at a time, yet the loop index is named epoch, which I think is misleading.
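To make the distinction concrete, here is a minimal, self-contained sketch of how the two terms are usually used (a toy linear model of my own, not the tutorial's code):

```python
import torch
import torch.nn as nn

# Toy setup: the model doesn't matter here, only the loop naming does.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
training_set = [(torch.randn(1, 10), torch.tensor([0])) for _ in range(100)]

n_epochs = 5
for epoch in range(n_epochs):                          # one epoch = one full pass over training_set
    for iteration, (x, y) in enumerate(training_set):  # one iteration = one parameter update
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```

Under this naming, the notebook's outer loop counts iterations, not epochs.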

aa1607 commented 7 years ago

Hi, yes, I'm a little confused about a few things.

```python
def train(inp, target):
    ...
    for c in range(chunk_len):
        output, hidden = decoder(inp[c], hidden)
        loss += criterion(output, target[c])
```

It looks like you're defining each call to train() to cycle individually through the characters of one sequence. 1) Wouldn't that mean a single training step covers only a small part of the whole dataset, and definitely not a whole epoch?
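As a rough back-of-the-envelope (the corpus size and chunk length below are just assumed numbers, not the tutorial's actual settings):

```python
# If each call to train() sees one chunk of chunk_len characters,
# a true epoch over the whole corpus takes roughly len(file) / chunk_len updates.
file_len = 1_000_000     # assumed corpus size, for illustration only
chunk_len = 200          # assumed chunk length
iters_per_epoch = file_len // chunk_len
print(iters_per_epoch)   # -> 5000 updates for one full pass
```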

2) I noticed that in https://github.com/spro/char-rnn.pytorch/blob/master/train.py you changed the forward method to also accept a non-unit batch dimension. Is there any reason you went with batch_size = 1 in this tutorial?
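For reference, a minimal sketch of what a single forward step with an explicit batch dimension looks like (my own toy sizes and module names, not the repo's code):

```python
import torch
import torch.nn as nn

# One character-level RNN step with an explicit batch dimension.
n_chars, hidden_size, batch_size = 100, 128, 8

encoder = nn.Embedding(n_chars, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)

inp = torch.randint(0, n_chars, (batch_size,))    # one character per batch element
hidden = torch.zeros(1, batch_size, hidden_size)  # (num_layers, batch, hidden)

encoded = encoder(inp).view(1, batch_size, -1)    # (seq_len=1, batch, hidden)
output, hidden = gru(encoded, hidden)             # output: (1, batch, hidden)
```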

3) Also, I thought you didn't need to break up your sequence inputs to the RNN? E.g. if I take out the for loop and just feed in the whole input:

```python
def train(input, target):
    output, h = charnn(input, hidden)
```

the model doesn't return an error. Wouldn't cycling through one sequence at a time, rather than one sequential unit at a time, work instead and be simpler?
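To check that intuition, here is a small self-contained comparison with a plain nn.GRU (not the tutorial's full decoder), stepping one timestep at a time versus feeding the whole sequence in one call:

```python
import torch
import torch.nn as nn

# For a plain nn.GRU, stepping through the sequence while carrying the hidden
# state gives the same outputs and final hidden state as one full-sequence call.
seq_len, batch, hidden_size = 10, 1, 32
gru = nn.GRU(hidden_size, hidden_size)

inp = torch.randn(seq_len, batch, hidden_size)
h0 = torch.zeros(1, batch, hidden_size)

# one timestep per call
h = h0
step_outputs = []
for c in range(seq_len):
    out, h = gru(inp[c].unsqueeze(0), h)
    step_outputs.append(out)
step_outputs = torch.cat(step_outputs, dim=0)

# the whole sequence in one call
full_outputs, h_full = gru(inp, h0)

print(torch.allclose(step_outputs, full_outputs, atol=1e-6))  # True
print(torch.allclose(h, h_full, atol=1e-6))                   # True
```

So for the recurrent part itself the two seem equivalent; the loop mainly makes the per-character loss accumulation explicit.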

If you cycle through each character individually as you've done, is the resulting model any different from one that goes sequence by sequence?

I was thinking that in an attention model the for loop might help you 'collect up' the hidden state at each timestep, since only the last hidden state is returned by default, but you're not applying attention here. So I can't think of a reason...
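For what it's worth, at least with nn.GRU the per-timestep outputs are already returned alongside the final hidden state, so even collecting them wouldn't need the loop. A quick check (my own sketch, toy sizes):

```python
import torch
import torch.nn as nn

seq_len, batch, hidden_size = 10, 1, 32
gru = nn.GRU(hidden_size, hidden_size)

inp = torch.randn(seq_len, batch, hidden_size)
output, h_n = gru(inp)

print(output.shape)  # torch.Size([10, 1, 32]) -> last-layer hidden state at every timestep
print(h_n.shape)     # torch.Size([1, 1, 32])  -> only the final hidden state
```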

Thanks for any help, and I think your tutorials are fantastic.