matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Combine Mask RCNN with LSTM for Single Object Tracking #342

Open cechung opened 6 years ago

cechung commented 6 years ago

Hi, I know this issue is barely relevant to this project, but I'm at my wits' end, so I'm asking here. Any help would be really appreciated.

I'm experimenting with "Single Object Tracking" by placing an LSTM after Mask RCNN. The feature maps and bbox coordinates from Mask RCNN are the input to the LSTM, and the LSTM outputs the bbox of the target object. (Only one object is tracked, so the output size is 4.) Because of the FPN, there are 4 levels of feature maps (P2-P5), so I modified the MRCNN model to also return the variable "roi_level", which indicates which feature map level each bbox in the MRCNN results corresponds to. The LSTM input is then the bbox from the MRCNN results together with the feature map selected by the target object's roi_level.
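For readers unfamiliar with the level assignment, here is a minimal sketch of the FPN heuristic that maps a box to one of the levels P2-P5 based on its size. The function name, the pixel-coordinate box format, and the use of rounding (rather than flooring) are assumptions for illustration; the modified model described above may compute "roi_level" differently.

```python
import numpy as np

def roi_level(box, min_level=2, max_level=5, canonical_size=224.0):
    """Pick an FPN level for a box given as (y1, x1, y2, x2) in pixels.

    A box of about canonical_size x canonical_size maps to level 4;
    halving the side length moves one level down, doubling moves one up.
    """
    y1, x1, y2, x2 = box
    h = max(y2 - y1, 1e-6)  # guard against zero-size boxes
    w = max(x2 - x1, 1e-6)
    k = 4 + int(np.round(np.log2(np.sqrt(h * w) / canonical_size)))
    return int(np.clip(k, min_level, max_level))

# A 112x112 box lands on P3 and would select the P3 feature map.
level = roi_level((100, 100, 212, 212))
```

Small objects therefore select the high-resolution P2 map and large objects the coarse P5 map, which is why the input size varies from frame to frame if the full map is fed to the LSTM.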

The problem is that the feature maps are too big to use as LSTM input (for example, P2 is 256x256x256). I tried using PCA to reduce all feature maps to a fixed size of 16x16x64, but the LSTM did not learn well. I also tried retraining MRCNN with a feature map depth of 1 instead of 256, then using 4 LSTMs, one per feature map level, to generate 4 bboxes and output the one matching roi_level. That still didn't work.
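The post does not say how the PCA reduction was implemented, so the following is only a sketch of one way to bring any of the P2-P5 maps to a fixed 16x16x64 shape: average-pool spatially to 16x16, then project the channels onto a PCA basis. All function names are hypothetical, and fitting the basis once on a sample of training features is an assumption, not something stated above.

```python
import numpy as np

def pool_to_grid(fmap, out_hw=16):
    """Average-pool an (H, W, C) map to (out_hw, out_hw, C).

    H and W must be at least out_hw; extra rows/columns that do not
    fill a whole pooling cell are dropped.
    """
    h, w, c = fmap.shape
    fh, fw = h // out_hw, w // out_hw
    if fh == 0 or fw == 0:
        raise ValueError("feature map is smaller than the output grid")
    trimmed = fmap[:fh * out_hw, :fw * out_hw]
    return trimmed.reshape(out_hw, fh, out_hw, fw, c).mean(axis=(1, 3))

def fit_channel_pca(samples, n_components=64):
    """Fit PCA on (N, C) channel vectors; returns (mean, components)."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def reduce_feature_map(fmap, mean, components, out_hw=16):
    """Pool spatially, then project channels onto the fixed PCA basis."""
    pooled = pool_to_grid(fmap, out_hw)
    return (pooled - mean) @ components.T
```

If the PCA basis is refit on every frame, the resulting components are not comparable from one time step to the next, so reusing one fixed basis for all frames keeps the LSTM inputs consistent.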

My LSTM does not seem to learn from this information: as the number of training iterations increases, the accuracy on the training set does not improve.

Are there any ideas I could try?

Superlee506 commented 6 years ago

@EricccChung I don't understand why you chose 4 LSTMs. Going by your description, if the feature maps are too big for the LSTM, you can use the features after ROIAlign.
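To make the suggestion concrete, here is a rough NumPy sketch of pooling a fixed 7x7 grid of features from one box with bilinear sampling, then appending the normalized box coordinates so the LSTM input still carries location. This imitates a crop-and-resize style ROIAlign and is not the project's implementation; the function names and the normalized (y1, x1, y2, x2) box format are assumptions.

```python
import numpy as np

def crop_and_resize(fmap, box, out_size=7):
    """Bilinearly sample an out_size x out_size grid from an (H, W, C) map.

    box is (y1, x1, y2, x2) in normalized [0, 1] coordinates.
    """
    h, w, _ = fmap.shape
    y1, x1, y2, x2 = box
    ys = np.clip(np.linspace(y1, y2, out_size) * (h - 1), 0, h - 1)
    xs = np.clip(np.linspace(x1, x2, out_size) * (w - 1), 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1i = np.minimum(y0 + 1, h - 1)
    x1i = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights
    top = fmap[y0][:, x0] * (1 - wx) + fmap[y0][:, x1i] * wx
    bot = fmap[y1i][:, x0] * (1 - wx) + fmap[y1i][:, x1i] * wx
    return top * (1 - wy) + bot * wy

def lstm_input(fmap, box, out_size=7):
    """Flatten the pooled ROI features and append the box coordinates."""
    roi = crop_and_resize(fmap, box, out_size)
    return np.concatenate([roi.ravel(), np.asarray(box, dtype=roi.dtype)])
```

With a 256-channel map, each time step becomes a vector of 7 * 7 * 256 + 4 = 12548 values regardless of which pyramid level the box came from, so a single LSTM can consume every frame.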

cechung commented 6 years ago

@Superlee506 Do you mean using the 7x7 ROIs in place of the feature maps P2-P5 as the LSTM input? Would that work? I'm not sure, because the ROIs only contain the object's features and carry no information about its location in the whole image.

If I have misunderstood your comment, please let me know. Thanks a lot for your reply.

g2-bernotas commented 6 years ago

Hi, have you (or anyone) had any success incorporating an LSTM into Mask RCNN?

I have a time series of objects of the same class moving around (e.g. 10 people). I would like to get a consistent mask label for each of them across the images, so I was wondering if anyone has achieved this already.

soulslicer commented 6 years ago

I am curious about this as well.

austinegri commented 6 years ago

Following. In theory, you can search in and around the bounding box found in the prior frame (assuming you are using video) and send that image region to Mask R-CNN, then repeat this for each following frame.
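A minimal sketch of that search-window idea, assuming boxes are (y1, x1, y2, x2) in pixels and the frame is a NumPy array. The function names and the default scale factor of 2 are illustrative choices, and the detector call is left out because it depends on how the model is set up.

```python
import numpy as np

def search_window(prev_box, image_shape, scale=2.0):
    """Enlarge the previous box around its center and clip it to the image."""
    y1, x1, y2, x2 = prev_box
    cy, cx = (y1 + y2) / 2.0, (x1 + x2) / 2.0
    half_h = (y2 - y1) * scale / 2.0
    half_w = (x2 - x1) * scale / 2.0
    h, w = image_shape[:2]
    return (max(0, int(np.floor(cy - half_h))),
            max(0, int(np.floor(cx - half_w))),
            min(h, int(np.ceil(cy + half_h))),
            min(w, int(np.ceil(cx + half_w))))

def crop(image, window):
    """Cut the search window out of the frame."""
    y1, x1, y2, x2 = window
    return image[y1:y2, x1:x2]

def to_frame_coords(box_in_crop, window):
    """Shift a box detected inside the crop back to full-frame coordinates."""
    oy, ox = window[0], window[1]
    y1, x1, y2, x2 = box_in_crop
    return (y1 + oy, x1 + ox, y2 + oy, x2 + ox)
```

For each new frame, the crop returned by `crop(frame, search_window(prev_box, frame.shape))` would be passed to the detector, and the best detection mapped back with `to_frame_coords` becomes `prev_box` for the next frame. If the object moves outside the window between frames, the track is lost, so the scale factor trades speed against robustness.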

VellalaVineethKumar commented 5 years ago

Quoting @Superlee506: "@EricccChung I don't understand why you chose 4 LSTMs. Going by your description, if the feature maps are too big for the LSTM, you can use the features after ROIAlign."

Can you help me with how to extract the feature maps after ROIAlign? I'm really struggling with it. Please help.

VellalaVineethKumar commented 5 years ago

@cechung Hey, I am trying the same experiment. How did you get the features from the ROIs? Could you please help with this?

MadhuKantharaju commented 4 years ago

Did anyone achieve this? I have been trying to do the same for a long time. Any help appreciated. Thank you :)