Following tpn_train.py, I generated the training data for the RNN. In your implementation, the ground-truth tubelets for RNN training are generated by Fast RCNN: given the proposal RoIs in the first frame, Fast RCNN infers the RoIs in the next frame, and the remaining RoIs of a tubelet are generated frame by frame in the same way. The class label of each RoI is the label of the ground-truth bbox with the maximum overlap, and the location target for the RNN is the offset between the predicted RoI and the ground-truth bbox.
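To make sure we mean the same thing, here is a minimal sketch of how I understand the label and target assignment described above. It is not copied from tpn_train.py: the function names (`iou`, `bbox_offset`, `assign_target`) are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the offset is assumed to use the usual Fast R-CNN parameterization (center shift normalized by RoI size, log of the size ratio). Any background/foreground overlap threshold is left out.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def bbox_offset(roi, gt):
    """Regression target from a predicted RoI to a ground-truth box.

    Assumption: standard Fast R-CNN encoding (dx, dy, dw, dh).
    """
    roi_w, roi_h = roi[2] - roi[0], roi[3] - roi[1]
    gt_w, gt_h = gt[2] - gt[0], gt[3] - gt[1]
    roi_cx, roi_cy = roi[0] + 0.5 * roi_w, roi[1] + 0.5 * roi_h
    gt_cx, gt_cy = gt[0] + 0.5 * gt_w, gt[1] + 0.5 * gt_h
    return (
        (gt_cx - roi_cx) / roi_w,
        (gt_cy - roi_cy) / roi_h,
        math.log(gt_w / roi_w),
        math.log(gt_h / roi_h),
    )


def assign_target(roi, gt_boxes, gt_labels):
    """Pick the ground-truth box with the largest overlap for one RoI.

    Returns (class label, regression target, overlap with that box).
    """
    overlaps = [iou(roi, gt) for gt in gt_boxes]
    best = max(range(len(gt_boxes)), key=overlaps.__getitem__)
    return gt_labels[best], bbox_offset(roi, gt_boxes[best]), overlaps[best]
```

In this sketch each RoI of a tubelet is handled independently per frame, so a tubelet of length 20 would call `assign_target` 20 times, once with the ground-truth boxes of each frame.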
However, the whole process never uses the trained Tubelet Proposal Network (TPN), which, as described in your paper, generates the sequence proposals (tubelets) and extracts the classification features for the ED-LSTM.
I trained the Bi-LSTM model with your implementation, using the following configuration:
[tpn_train]
param hkbn_4d_fast_rcnn_vid_only_iter_90000.caffemodel, length 20, num_per_batch 300, track_per_vid 300
[Bi-LSTM]
max_step 20, max_epoch 20, iter_epoch 2000, vid_per_batch 4
@myfavouritekk I find that the RNN training data generated by Fast RCNN is not suitable: using only the current RoI and Fast RCNN to predict the RoI in the next frame performs poorly, especially for the later frames of a tubelet. As a result, it is hard to train an RNN that predicts class and location well. I think it would be better to use the tubelet proposals generated by the trained TPN as RoIs, rather than the RoIs generated by Fast RCNN.
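One way to quantify the degradation over later frames would be to average the overlap with the ground truth at each time step across many tubelets. The snippet below is only a rough sketch of that check, not code from the repository: `mean_iou_per_step` is a hypothetical helper, and it assumes each generated tubelet has already been matched to one ground-truth track, with both stored as lists of `(x1, y1, x2, y2)` boxes of equal length.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou_per_step(tubelets, gt_tracks):
    """Average IoU at each time step over a non-empty list of tubelets.

    tubelets[i][t] is the generated RoI of tubelet i at step t, and
    gt_tracks[i][t] is the matched ground-truth box (assumed pairing).
    """
    length = min(len(tube) for tube in tubelets)
    sums = [0.0] * length
    for tube, gt_track in zip(tubelets, gt_tracks):
        for step in range(length):
            sums[step] += iou(tube[step], gt_track[step])
    return [total / len(tubelets) for total in sums]
```

If the per-step averages drop steadily toward the end of the tubelet (for example over the 20 steps used with `length 20`), that would support the observation that the later RoIs drift away from the object.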