spro / practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained

weights update for "char-rnn-classification" #21

Closed: WeiFoo closed this issue 7 years ago

WeiFoo commented 7 years ago

In the "char-ran-classification" tutorial, the weights are updated by the following code.

  for p in rnn.parameters():
    p.data.add_(-learning_rate, p.grad.data)
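
(This is just a plain SGD step, p = p - learning_rate * p.grad, written with the old two-argument form of add_. For comparison, a minimal sketch of the same update in current PyTorch, using no_grad and the alpha keyword; this sketch is not from the tutorial:)

  # same manual SGD step, current-PyTorch style (sketch)
  with torch.no_grad():
    for p in rnn.parameters():
      p.add_(p.grad, alpha=-learning_rate)  # p -= learning_rate * p.grad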

I was trying to use optimizer.step() to update the weights instead, as follows:

optimizer = optim.SGD(rnn.parameters(), lr=0.005)

def train(category_tensor, line_tensor):
  hidden = rnn.initHidden()
  optimizer.zero_grad()
  for i in range(line_tensor.size()[0]):
    output, hidden = rnn(line_tensor[i], hidden)
  loss = criterion(output, category_tensor)  # criterion = nn.NLLLoss(), as defined in the tutorial
  loss.backward()
  optimizer.step()
  return output, loss.data[0]

However, the results are very bad: most of the predictions are wrong. I'm new to PyTorch; can anyone explain the difference between these two methods?

Thanks

==================Results with optimizer.step()======================

5000  5% (0m12s) 2.9460 Jigalev / Portuguese ✗ (Russian)
10000  10% (0m24s) 2.9493 Gil / Portuguese ✗ (Korean)
15000  15% (0m36s) 2.9261 Ha / Spanish ✗ (Korean)
20000  20% (0m48s) 2.9530 Rog / English ✗ (Polish)
25000  25% (1m1s) 2.7749 Ki / Japanese ✓
30000  30% (1m13s) 2.8230 Messer / English ✗ (German)
35000  35% (1m25s) 2.9910 Paszek / English ✗ (Polish)
40000  40% (1m38s) 2.9733 Banh / English ✗ (Vietnamese)
45000  45% (1m50s) 2.7955 Serafim / Portuguese ✓
50000  50% (2m2s) 2.9550 Teng / English ✗ (Chinese)
55000  55% (2m15s) 2.7941 Kan / Portuguese ✗ (Chinese)
60000  60% (2m27s) 2.9335 Victors / Scottish ✗ (French)
65000  65% (2m39s) 2.7903 Lobo / Portuguese ✓
70000  70% (2m51s) 2.7830 Soga / Japanese ✓
75000  75% (3m3s) 2.9486 Hong / English ✗ (Chinese)
80000  80% (3m16s) 2.9172 Brisimitzakis / Portuguese ✗ (Greek)
85000  85% (3m27s) 2.9579 Linville / Japanese ✗ (French)
90000  90% (3m40s) 2.9035 Kerr / Japanese ✗ (Scottish)
95000  95% (3m52s) 2.8205 Tsen / Portuguese ✗ (Chinese)
100000  100% (4m4s) 2.8934 Kenmotsu / Greek ✗ (Japanese)

==================Results using the method in the tutorial======================

5000  5% (0m12s) 2.0740 Trapani / Italian ✓
10000  10% (0m26s) 2.1635 Rzehak / Czech ✓
15000  15% (0m39s) 2.7131 Bishara / Japanese ✗ (Arabic)
20000  20% (0m53s) 0.8122 Villamov / Russian ✓
25000  25% (1m7s) 2.0739 Mercier / German ✗ (French)
30000  30% (1m20s) 0.8251 Isozaki / Japanese ✓
35000  35% (1m33s) 2.4339 Cumming / Italian ✗ (English)
40000  40% (1m46s) 0.1408 Mckenzie / Scottish ✓
45000  45% (1m59s) 0.9425 Menendez / Spanish ✓
50000  50% (2m13s) 0.0060 Haritopoulos / Greek ✓
55000  55% (2m26s) 0.6606 Zientek / Polish ✓
60000  60% (2m40s) 2.2221 Desjardins / Greek ✗ (French)
65000  65% (2m53s) 1.3020 Seaghdha / Irish ✓
70000  70% (3m6s) 2.5087 Meier / French ✗ (Czech)
75000  75% (3m19s) 0.8131 Dasios / Greek ✓
80000  80% (3m31s) 0.2416 Poggi / Italian ✓
85000  85% (3m44s) 0.5777 Kim / Korean ✓
90000  90% (3m57s) 2.6680 See  / Dutch ✗ (Chinese)
95000  95% (4m10s) 2.0200 Weisener / German ✗ (Czech)
100000  100% (4m23s) 1.5813 O'Gorman / French ✗ (Irish)
spro commented 7 years ago

This is because of a mistake where the RNN is initialized twice (once at the top, once again below, after the optimizer is created), so the optimizer is optimizing parameters that aren't being used. Commit 0cc55f5aaed44e7903edb8842671411301fcf003 should fix it.

WeiFoo commented 7 years ago

where the RNN is initialized twice (once at the top, once again below, after the optimizer is created)

I don't understand this.

I checked commit 0cc55f5; it seems to be the same as what I did above using the optimizer. The results are the same: only very few correct predictions, most are wrong.

spro commented 7 years ago

The problem is lower down, around https://github.com/spro/practical-pytorch/commit/0cc55f5aaed44e7903edb8842671411301fcf003#diff-e9a91a525ccafb52f5b1e35131d4011cL51

The RNN was being re-created after the optimizer:

rnn = RNN(n_letters, n_hidden, n_categories) # rnn 1
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate) # using rnn 1's parameters

def train():
    ...
    rnn(...) # this refers to rnn 2, because of the redefinition below
    optimizer.step() # this still updates rnn 1's parameters
    ...

rnn = RNN(n_letters, n_hidden, n_categories) # rnn 2 causes the problem; delete this line

So the optimizer was not doing anything useful: rnn was redefined and ended up with a completely new set of parameters, while the optimizer still held references to the old ones.
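
Here is a minimal, self-contained sketch of the same trap (using a hypothetical nn.Linear model in place of the tutorial's RNN):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                                  # model 1
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # holds references to model 1's parameters

model = nn.Linear(4, 2)  # model 2: rebinds the name; the optimizer still points at model 1

out = model(torch.randn(8, 4)).sum()  # forward pass runs on model 2
optimizer.zero_grad()
out.backward()                        # gradients land on model 2's parameters
before = model.weight.clone()
optimizer.step()                      # tries to step model 1's parameters (their .grad is still None, so nothing moves)
print(torch.equal(before, model.weight))  # True: model 2 was never updated

Deleting the second construction (or building the optimizer after the final assignment to the name) makes optimizer.step() behave exactly like the manual update loop.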

WeiFoo commented 7 years ago

Cool, thanks!! The same mistake in the official "Classifying Names with a Character-Level RNN" tutorial should be fixed as well; its training-setup block contains the same second initialization:

import time
import math

n_epochs = 100000
print_every = 5000
plot_every = 1000

rnn = RNN(n_letters, n_hidden, n_categories)
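# ^ the RNN was already created earlier in the tutorial; this second construction
#   rebinds rnn, so an optimizer built from the first one would hold stale parameters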

# Keep track of losses for plotting
current_loss = 0
all_losses = []
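
For reference, a sketch of the corrected ordering (assuming the tutorial's names): construct the RNN exactly once, build the optimizer from it, and never rebind the name afterwards:

rnn = RNN(n_letters, n_hidden, n_categories)                     # created exactly once
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)  # bound to the live rnn
# ... define train() and run the training loop; no second rnn = RNN(...) below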