The difference is small but important. The intuition goes as follows: during training we provide the network with sensor input x up to some time t, and then, instead of trying to directly predict the desired output y at time t, we make the network imagine the expected output a few timesteps later, at time t+n (i.e. we make the network imagine the expected future). It turns out that when the network gets good at this "imagination task", it also performs well at tracking objects through occlusions, as both tasks require a learned "imagination capability".
Equation 7 is used during unsupervised training as well, but with a slight modification: we penalise the network based on how good it is at imagining the visible part of the future output. The following two pictures showing the difference should help:
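To make the modification concrete, here is a minimal NumPy sketch of such a masked future-prediction loss. This is my own illustration, not the paper's code: the function name, array shapes, and the use of binary cross-entropy are assumptions; the key idea shown is that the target is the output n steps ahead, and that only cells visible in that future frame contribute to the penalty.

```python
import numpy as np

def masked_future_loss(pred, target, visibility):
    """Hypothetical sketch of the modified Eq. 7 objective:
    binary cross-entropy between the network's imagined output
    for time t+n (`pred`) and the actual output observed n steps
    later (`target`), counted only where the future frame is
    visible (visibility == 1); occluded cells are ignored."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    # average the per-cell loss over visible cells only
    return (bce * visibility).sum() / max(visibility.sum(), 1.0)
```

In this sketch an occluded cell (visibility 0) contributes nothing, so the network is free to "imagine" whatever is behind the occlusion; it is only graded where the future is actually observable.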
Is this more clear?
Thanks for the explanation, but it seems to me that the network should learn to predict the future input x_{t+n} rather than the binary mask y_{t+n}. Why does it work? There seems to be no such supervision at all.
I have read your paper "Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks", but I don't understand the approach called "unsupervised training". In the section "Unsupervised Training" you say "we propose to train the network not to predict the current state, but a state in the future"; can you tell me the difference between the two? And is Equation 7 used in unsupervised training? Thank you.