torch / nn

How to resume training from a certain epoch, or update an already trained model, in Torch? #1305

Open maryam089 opened 6 years ago

maryam089 commented 6 years ago

I have an AlexNet model that was trained on 100K images, and now I want to update it by training on a few thousand more images. But when I load the model and start training, I get the following error. Any help?

/home/maryam/torch/install/bin/lua: /home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: check that you are sharing parameters and gradParameters
stack traceback:
    [C]: in function 'assert'
    /home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: in function 'getParameters'
    train.lua:270: in main chunk
    [C]: in function 'dofile'
    ...ryam/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: in ?

kysunami commented 4 years ago

Check this out: https://debuggercafe.com/effective-model-saving-and-resuming-training-in-pytorch/ You can save a checkpoint first and reload it when you want to resume training. Hope this helps.
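
For reference, here is a minimal sketch of the save-and-reload pattern. Note that it is PyTorch (Python), not the Lua Torch used in the original question, so it will not fix the `getParameters` assertion directly. The helper names (`save_checkpoint`, `load_checkpoint`, `build`), the tiny `nn.Linear` stand-in model, and the checkpoint keys are all assumptions made for illustration, not something taken from the linked article.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    # Store the optimizer state and epoch alongside the weights, so that
    # training can continue rather than restart from scratch.
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    # Restore weights and optimizer state in place; return the next epoch to run.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"] + 1


def build():
    # Hypothetical stand-in for a real network such as AlexNet.
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    return model, optimizer


# Take one training step, then save a checkpoint for epoch 0.
model, optimizer = build()
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(ckpt_path, model, optimizer, epoch=0)

# Later, possibly in a new process: rebuild the same architecture, then resume.
resumed_model, resumed_optimizer = build()
start_epoch = load_checkpoint(ckpt_path, resumed_model, resumed_optimizer)
```

To update the model with additional images, you would then run the usual training loop from `start_epoch` over the new (or combined) dataset.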