macournoyer / neuralconvo

Neural conversational model in Torch

The memory usage skyrockets each time it saves #70

Open ljyloo opened 7 years ago

ljyloo commented 7 years ago

And it doesn't free the memory. I executed this bash command:

th train.lua --cuda --dataset 50000 --hiddenSize 1000

In the first epoch it consumed 2 GiB of RAM, in the second 5 GiB, then 10 GiB, and by the 11th epoch my memory was full. (My computer has 32 GiB of RAM.)

This issue disappeared when I commented out lines 156 to 171 in train.lua (the RAM usage then stays at 1.2 GiB):

  if minMeanError == nil or errors:mean() < minMeanError then
    print("\n(Saving model ...)")
    params, gradParams = nil,nil
    collectgarbage()
    -- Model is saved as CPU
    model:float()
    torch.save("data/model.t7", model)
    collectgarbage()
    if options.cuda then
      model:cuda()
    elseif options.opencl then
      model:cl()
    end
    collectgarbage()
    minMeanError = errors:mean()
  end

So I conclude that the saving process may be the problem.

Namburgesas commented 7 years ago

Seems to occur in the calls to model:float(). My workaround was to just save in GPU format:

  if minMeanError == nil or errors:mean() < minMeanError then
    print("\n(Saving model ...)")
    params, gradParams = nil,nil
    collectgarbage()
    torch.save("data/model.t7", model)
    collectgarbage()
    minMeanError = errors:mean()
  end

I then added require 'cudnn' to the top of eval.lua in order to be able to load the saved model. If you want to save the model in CPU format, you could write a quick script to load the model, call model:float(), and save it again.
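A minimal conversion script might look like this (an untested sketch; the requires and the output file name are assumptions based on the setup above):

  -- convert.lua: load the GPU-format checkpoint and re-save it in CPU format.
  -- Untested sketch; adjust the file names to your setup.
  require 'nn'
  require 'cutorch'
  require 'cunn'
  require 'cudnn'  -- needed because the checkpoint contains CUDA/cudnn modules

  local model = torch.load("data/model.t7")
  model:float()  -- convert all parameters and buffers to CPU FloatTensors
  torch.save("data/model_cpu.t7", model)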

ljyloo commented 7 years ago

Thanks for your simple solution, @Namburgesas. Hope there's a fix in the future.

biggerlambda commented 7 years ago

Did you try calling clearState() before model:float()? It clears the intermediate states in the model (which are not needed for prediction).
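
For reference, the saving block from train.lua with that suggestion applied would look roughly like this (untested sketch):

  if minMeanError == nil or errors:mean() < minMeanError then
    print("\n(Saving model ...)")
    params, gradParams = nil,nil
    collectgarbage()
    model:clearState()  -- drop cached output/gradInput buffers before converting
    -- Model is saved as CPU
    model:float()
    torch.save("data/model.t7", model)
    collectgarbage()
    if options.cuda then
      model:cuda()
    elseif options.opencl then
      model:cl()
    end
    collectgarbage()
    minMeanError = errors:mean()
  end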