9p15p opened this issue 4 years ago
@9p15p Yes, the memory also requires gradients. Specifically, we learn how to encode a memory from [frame, mask]. Naturally, every feed-forward operation is done without `torch.no_grad` during training.
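As a minimal illustration of that point (the method names `encode_memory` and `segment` below are hypothetical placeholders, not this repo's actual API): the memory is computed inside the autograd graph during training, and `torch.no_grad()` would only wrap inference.

```python
import torch

# Hypothetical method names, for illustration only.
key, value = model.encode_memory(frame, mask)   # training: gradients flow through this
pred = model.segment(next_frame, (key, value))
loss = criterion(pred, gt_mask)
loss.backward()                                 # also updates the memory encoder

with torch.no_grad():                           # inference only: no autograd graph is built
    key, value = model.encode_memory(frame, mask)
```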
Thank you for your reply. I have another question: how should we backpropagate the loss? In other words, where should we put `loss.backward()`? I can think of 3 strategies:
1. Calculate the loss and call `loss.backward()` for every object in every frame. (In my experiment this means calling it with `retain_graph=True` for the last two frames' losses, and then, after both frames' losses have been computed, one more call without `retain_graph=True`; see the rough sketch below.)
2. Calculate the loss for every object in every frame, but only call `loss.backward()` once, after all frames' losses have been computed.
3. Only calculate the loss for the final (third) mask, ignore the middle (second) mask, and call `loss.backward()` once at the end.
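A rough sketch of strategy 1 in PyTorch, where every name (`model`, `criterion`, `optimizer`, `frames`, `gt_masks`, `num_frames`, and the `encode_memory`/`segment`/`update_memory` methods) is a placeholder rather than this repo's actual code:

```python
import torch

# Strategy 1 sketch: backward per frame, retaining the graph for all but the
# final call, because the memory's graph is shared by later frames.
optimizer.zero_grad()
memory = model.encode_memory(frames[0], gt_masks[0])        # reference frame
for t in range(1, num_frames):
    pred = model.segment(frames[t], memory)                  # uses the memory built so far
    loss = criterion(pred, gt_masks[t])
    loss.backward(retain_graph=(t < num_frames - 1))         # keep graph until the last backward
    memory = model.update_memory(memory, frames[t], pred)    # extend memory for the next frame
optimizer.step()
```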
Maybe none of these three strategies is right. Maybe my inference ("inf") procedure is improper and we should train all objects at the same time.
Looking forward to your advice! Thank you!
What I did is option 2. We sum up all the losses and call `backward()` and `step()` at the end of the iteration.
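For concreteness, a minimal sketch of that pattern with the same placeholder names as above (not the actual training code):

```python
import torch

# Option 2 sketch: accumulate all per-frame losses, then a single
# backward()/step() at the end of the iteration.
optimizer.zero_grad()
memory = model.encode_memory(frames[0], gt_masks[0])
total_loss = 0.0
for t in range(1, num_frames):
    pred = model.segment(frames[t], memory)
    total_loss = total_loss + criterion(pred, gt_masks[t])
    memory = model.update_memory(memory, frames[t], pred)
total_loss.backward()    # one backward over the summed loss, no retain_graph needed
optimizer.step()
```

Summing the losses and calling `backward()` once produces the same gradients as calling `backward()` on each loss separately (gradients accumulate across calls), but it avoids keeping the graph alive with `retain_graph=True`.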
Thank you, sir. My best reimplementation is still not satisfactory, but in my own experiments, calling `loss.backward()` every frame gives better performance and converges more quickly (using `retain_graph=True` for the second frame's loss). I will try again.
Best wishes!
Hi, sir! Thank you for your fine work, but I still have some questions.