Zeroing out the gradient before calling optimizer.step() in train_eval_seg.py

sacmehta / EdgeNets

This repository contains the source code of our work on designing efficient CNNs for computer vision

MIT License

412 stars 82 forks source link

Zeroing out the gradient before calling optimizer.step() in train_eval_seg.py #14

Closed mitalbert closed 5 years ago

mitalbert commented 5 years ago

Hi, thanks for publishing the code.

I noticed the optimizer.zero_grad() is called after calling optimizer.step() in train_eval_seg.py. Is this intentional? I was having issues until I moved zero_grad() above loss.backward().

Also when resuming the training there was a runtime error with type compatibility (cpu vs cuda) when calculating loss.backward() and adding model = model.cuda() right after loading the state dictionary in train_segmentation.py solved it.

sacmehta commented 5 years ago

Ideally it should not matter. Maybe something changed between PyTorch versions. If you are using this repository, please use either v1.0 or v1.1.

Thanks for the catch. Actually, we never tested resume argument. We will fix it.

mitalbert commented 5 years ago

Sure, I'm using 1.1.0

sacmehta commented 5 years ago

@mitalbert I have incorporated your suggestions. Please check if it resolves your issues.

mitalbert commented 5 years ago

If you mean the first issue in train_eval_seg.py, then it looks exactly like my fix with worked for me, so, yes.

If you mean the issue with resuming the training, unfortunately I can't test it now.