torch / optim

A numeric optimization package for Torch.

Optim does not update weights on big MLP network #164

Closed viktorheli closed 7 years ago

viktorheli commented 7 years ago

I am trying to train a network for a regression task with optim.sgd, but I see something strange: if my MLP has more than 12-16 layers, optim does not change the weights and the network does not learn. The network starts learning if I decrease the number of layers. Stranger still, the 16-layer network starts learning with a learning rate of 2 or above, while the 24-layer network does not learn even with a learning rate of 100 or above.

This behavior of optim is very strange to me, but maybe I am misunderstanding something simple.
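
A quick way to see whether the gradients even reach the early layers (a minimal diagnostic sketch of mine, assuming a deep sigmoid MLP like the one below, not the exact training script) is to print each Linear layer's gradient norm after one backward pass:

require 'nn'

-- minimal diagnostic sketch: build a deep sigmoid stack, run one
-- forward/backward pass, and print the gradient norm of every Linear
-- layer to see whether gradients vanish towards the early layers
local width = 28
local mlp = nn.Sequential()
for i = 1, 12 do
        mlp:add(nn.Linear(width, width)):add(nn.Sigmoid())
end
local criterion = nn.MSECriterion()
local input, target = torch.randn(width), torch.randn(width)

local out = mlp:forward(input)
criterion:forward(out, target)
mlp:zeroGradParameters()
mlp:backward(input, criterion:backward(out, target))

for i, layer in ipairs(mlp:findModules('nn.Linear')) do
        print(string.format('Linear %2d  gradWeight norm: %.3e', i, layer.gradWeight:norm()))
end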

My code:

require ('torchx')
require ('paths')
cjson = require 'cjson'
require 'io'
require 'nn'
require 'optim'
require 'cunn'
require 'cutorch'

torch.setdefaulttensortype('torch.FloatTensor')

--cmd line arg
cmd = torch.CmdLine()
cmd:text()
cmd:text('Training neural networks. By default, trains a network for 24-hour prediction')
cmd:text('Example:')
cmd:text('$> th pattern-make-train-optim.lua -dataset "path to dataset" -storenet "path to store you net" -saveevery 1000')
cmd:text('All options:')
cmd:option('-dataset', 'simple-bug-dataset.t7', 'Path to load dataset')
cmd:option('-storenet', 'simple-bug.dat', 'Path to saving or loading neuralnet')
-- numeric defaults so opt.train, opt.learningrate etc. parse as numbers
cmd:option('-train', 2000, 'Number of training iterations')
cmd:option('-learningrate', 0.01, 'Learning rate for the SGD algorithm')
cmd:option('-saveevery', 100000, 'Save a temporary net every N epochs')
cmd:option('-valid', 200, 'Run validation on the dataset every N epochs and display min, max and average error')
cmd:option('-progress', 'yes', 'Display xlua progress bar: "yes" or "no"')
cmd:option('-momentum', 0, 'Momentum for the SGD weight updates')
opt = cmd:parse(arg or {})

--calculate the error over the dataset; a rough validation pass (not a true held-out validation)

function validation()                                                                                                                                                                                  

        dsize = dataset.inputs:size(1)                                                                                                                                                                 
        errormatrix = {}                                                                                                                                                                               

        for i = 1, dsize/10 do                                                                                                                                                                         

                permutation = torch.random(dsize)                                                                                                                                                      

                fwd = mlp:forward(dataset.inputs[permutation])
                predict = fwd
                real = dataset.outputs[permutation]
                errorpercent = math.abs(((predict[1] / real[1]) - 1) * 100)

                table.insert(errormatrix, errorpercent)

        end
        min = torch.min(torch.Tensor(errormatrix))
        max = torch.max(torch.Tensor(errormatrix))
        mean = torch.mean(torch.Tensor(errormatrix))

        print("\n".."Min error, %: "..min.."\n".."Max error, %:  "..max.."\n".."Average error, %: "..mean.."\n")
end

if (paths.filep(opt.storenet) == true) then

                print("Loading net file:        "..opt.storenet)
                mlp = torch.load(opt.storenet)

        else

                print("Creating net for traning")

--This MLP does not learn; I suspect a bug in optim
                mlp = nn.Sequential()
                mlp:add(nn.Linear(28, 56))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(56, 58))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(58, 112))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(112, 114))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(114, 224))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(224, 226))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(226, 448))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(448, 450))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(450, 224))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(224, 112))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(112, 56))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(56, 7))
                mlp:add(nn.Tanh())

--[[
--This MLP learns with learningrate 2

                mlp = nn.Sequential()
                mlp:add(nn.Linear(28, 56))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(56, 58))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(58, 112))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(112, 224))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(224, 224))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(224, 112))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(112, 56))
                mlp:add(nn.Sigmoid())
                mlp:add(nn.Linear(56, 7))
                mlp:add(nn.Tanh())
--]]                
                print(mlp)

end --end of the if/else that loads or builds mlp

dataset = torch.load(opt.dataset)

criterion = nn.MSECriterion()
params, gradParams = mlp:getParameters()
optimState = {learningRate = opt.learningrate, momentum = opt.momentum}

for epoch = 1, opt.train do
        if (opt.progress == "yes" ) then

                xlua.progress(epoch, opt.train)
        end
        -- feval is redefined on every iteration here; defining it once
        -- outside the loop would be cleaner, but the behavior is the same
        function feval(x)
                -- x is the flat parameter tensor returned by mlp:getParameters()
                gradParams:zero()
                outputs = mlp:forward(dataset.inputs)
                loss = criterion:forward(outputs, dataset.outputs)
                dloss_doutputs = criterion:backward(outputs, dataset.outputs)
                mlp:backward(dataset.inputs, dloss_doutputs)
                return loss, gradParams
        end

        -- optim.sgd returns the updated parameter vector first and a table
        -- of losses second; the original code assigned the single return
        -- value to fs, so fs was the weight vector rather than the loss
        _, fs = optim.sgd(feval, params, optimState)

        if  epoch % opt.saveevery  == 0 then
                print("Number of iteration: "..epoch)
                print("Saving nempotary model to: "..opt.storenet.."temporal")
                torch.save(opt.storenet.."temporal", mlp)

        end

        if  epoch % opt.valid  == 0 then
--              validation()
                epochloss = fs[1] / dataset.outputs:size(1) -- fs[1] is the loss returned by feval
                print("\n"..epochloss*1000)
        end
end

print("Saving model to: "..opt.storenet)
torch.save(opt.storenet, mlp)
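
For reference, here is the optim.sgd contract the script relies on, as a minimal self-contained sketch (toy model and data of my own, not the network from this issue):

require 'nn'
require 'optim'

-- opfunc(x) must return the loss and the gradient w.r.t. x;
-- optim.sgd returns the updated x first and a table {loss} second
local model = nn.Sequential():add(nn.Linear(2, 1))
local criterion = nn.MSECriterion()
local params, gradParams = model:getParameters()
local input, target = torch.randn(2), torch.randn(1)

local function feval(x)
        if x ~= params then params:copy(x) end -- keep the flat view in sync
        gradParams:zero()
        local out = model:forward(input)
        local loss = criterion:forward(out, target)
        model:backward(input, criterion:backward(out, target))
        return loss, gradParams
end

local _, fs = optim.sgd(feval, params, {learningRate = 0.01})
print('loss: ' .. fs[1]) -- fs is a table of losses, not the weight vector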

Dataset for test: https://www.dropbox.com/s/deom263k4zk14ur/simple-bug-dataset.t7?dl=0

Many thanks for any help.

viktorheli commented 7 years ago

For example: th simple-bug.lua -valid 20 -train 100000 -learningrate 2 -progress no

Creating net for training
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> output]
  (1): nn.Linear(28 -> 56)
  (2): nn.Sigmoid
  (3): nn.Linear(56 -> 58)
  (4): nn.Sigmoid
  (5): nn.Linear(58 -> 112)
  (6): nn.Sigmoid
  (7): nn.Linear(112 -> 114)
  (8): nn.Sigmoid
  (9): nn.Linear(114 -> 224)
  (10): nn.Sigmoid
  (11): nn.Linear(224 -> 226)
  (12): nn.Sigmoid
  (13): nn.Linear(226 -> 448)
  (14): nn.Sigmoid
  (15): nn.Linear(448 -> 450)
  (16): nn.Sigmoid
  (17): nn.Linear(450 -> 224)
  (18): nn.Sigmoid
  (19): nn.Linear(224 -> 112)
  (20): nn.Sigmoid
  (21): nn.Linear(112 -> 56)
  (22): nn.Sigmoid
  (23): nn.Linear(56 -> 7)
  (24): nn.Tanh
}

Number of iteration: 20 Epochloss: -0.83493030276792

Number of iteration: 40 Epochloss: -0.83493030276792

Number of iteration: 60 Epochloss: -0.83493030276792

Number of iteration: 80 Epochloss: -0.83493030276792

Number of iteration: 100 Epochloss: -0.83493030276792

Number of iteration: 120 Epochloss: -0.83493030276792

Number of iteration: 140 Epochloss: -0.83493030276792

Number of iteration: 160 Epochloss: -0.83493030276792

Number of iteration: 180 Epochloss: -0.83493030276792

Number of iteration: 200 Epochloss: -0.83493030276792

As you can see, the epoch loss does not change at all.
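
One quick way to verify that (a hypothetical check of mine; params, feval and optimState refer to the variables in the script above) is to diff the flattened parameter vector across a single optimizer step:

        -- drop-in check for the training loop of the script above;
        -- params, feval and optimState are assumed to exist as defined there
        local before = params:clone()
        local _, fs = optim.sgd(feval, params, optimState)
        -- if this prints 0, the optimizer really did not update the weights
        print('max |delta w|: ' .. (params - before):abs():max())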

ProGamerGov commented 7 years ago

Did you ever figure out this issue? Because I may have come across a related issue that involves working with extremely large image sizes.

viktorheli commented 7 years ago

I ran several tests. The network does learn, but with the default settings it learns very, very slowly. The problem is not in optim. Most likely my dataset is simply too small: only about 150-700 samples in total. With an increased learning rate the network can be trained. Thank you all for your help.
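
For anyone who finds this later: deep stacks of nn.Sigmoid saturate easily and shrink the gradient layer by layer, which fits the symptom that only very large learning rates moved the loss. One common mitigation (a sketch of mine, not something tried in this thread) is to use nn.ReLU for the hidden activations:

require 'nn'

-- same layer sizes as the 24-layer network above, but with ReLU hidden
-- units; ReLU does not saturate for positive inputs, so gradients survive
-- through far more layers than with sigmoid activations
local sizes = {28, 56, 58, 112, 114, 224, 226, 448, 450, 224, 112, 56}
local mlp = nn.Sequential()
for i = 1, #sizes - 1 do
        mlp:add(nn.Linear(sizes[i], sizes[i + 1]))
        mlp:add(nn.ReLU())
end
mlp:add(nn.Linear(sizes[#sizes], 7))
mlp:add(nn.Tanh()) -- keep the original output nonlinearity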