youtubevos / MaskTrackRCNN

MaskTrackRCNN for video instance segmentation based on mmdetection
Apache License 2.0
431 stars 75 forks

How to change the baseline to support training per video? #25

Open zack624 opened 4 years ago

zack624 commented 4 years ago

Hello, thanks for the new task and baseline. I have read several papers from the top-ranked teams in the VIS competition, and most of them either improve the tracking part or add post-processing on top of an image instance segmentation model. In contrast, I would like to exploit spatio-temporal features across frames, for example with a 3D CNN or feature aggregation. However, I have run into some problems in the implementation. Is it necessary to first change the baseline (mmdetection) so that it supports training on multiple frames per video? Otherwise I do not see how to feed multiple frames at once, aggregate their features, and resolve the scale inconsistency across videos. Thanks for reading.
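To make the question concrete, here is a rough sketch of the kind of per-video sampling I have in mind. It is plain Python and does not use any mmdetection API; the function names (`sample_clip`, `clip_scale`, `batch_shape`) and the parameters (`clip_len`, `short_side`, `max_long_side`, `divisor`) are hypothetical and only illustrate the idea: take a few consecutive frames from one video, resize all of them with a single shared factor, and zero-pad clips from different videos to a common size.

```python
import random
from typing import List, Tuple


def sample_clip(num_frames: int, clip_len: int, rng: random.Random) -> List[int]:
    """Pick clip_len consecutive frame indices from one video.

    If the video is shorter than clip_len, the last frame is repeated.
    """
    start = rng.randint(0, max(0, num_frames - clip_len))
    return [min(start + i, num_frames - 1) for i in range(clip_len)]


def clip_scale(height: int, width: int, short_side: int, max_long_side: int) -> float:
    """Return one resize factor shared by every frame of a clip.

    The short side is scaled to short_side unless that would push the
    long side past max_long_side; the aspect ratio is preserved.
    """
    scale = short_side / min(height, width)
    if max(height, width) * scale > max_long_side:
        scale = max_long_side / max(height, width)
    return scale


def batch_shape(
    clip_sizes: List[Tuple[int, int]], clip_len: int, divisor: int = 32
) -> Tuple[int, int, int, int, int]:
    """Shape (N, T, C, H, W) of a zero-padded batch of resized clips.

    clip_sizes holds the (height, width) of each clip after resizing.
    H and W are rounded up to a multiple of divisor (an assumed value).
    """
    pad_h = max(h for h, _ in clip_sizes)
    pad_w = max(w for _, w in clip_sizes)
    pad_h = -(-pad_h // divisor) * divisor  # ceiling division
    pad_w = -(-pad_w // divisor) * divisor
    return (len(clip_sizes), clip_len, 3, pad_h, pad_w)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two made-up videos with different resolutions and lengths.
    videos = [(36, 720, 1280), (20, 480, 640)]  # (num_frames, height, width)
    sizes = []
    for num_frames, h, w in videos:
        frames = sample_clip(num_frames, clip_len=5, rng=rng)
        s = clip_scale(h, w, short_side=360, max_long_side=640)
        sizes.append((round(h * s), round(w * s)))
        print(frames, s)
    print(batch_shape(sizes, clip_len=5))
```

With a batch laid out as (N, T, C, H, W), the T frames of each clip could be passed through the backbone together and then aggregated, but I am not sure how to fit this into the existing dataset and data loader code.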