matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
24.52k stars 11.68k forks

ResNeXt backbone #215

Open canerozer opened 6 years ago

canerozer commented 6 years ago

Has anyone implemented a version of Mask R-CNN with a ResNeXt backbone and tested the results?

sbugallo commented 6 years ago

I have been working on it for a few days, but I keep getting NaN and zero-valued losses. I might have a bug in my ResNeXt implementation.

canerozer commented 6 years ago

I am currently trying to transfer the weights of the FAIR version of Mask R-CNN. If I succeed, I may be able to provide you with the pretrained version of these weights.

model file for ResNet 101 here:

ericj974 commented 6 years ago

Implementation done on my side (ResNeXt-50 with cardinality = 32, ResNeXt-101 with cardinality = 32). I'm testing it now and will update you if it works.

sbugallo commented 6 years ago

@ericj974 any updates? I successfully trained the model using VGG16 and VGG19 backbones.

ericj974 commented 6 years ago

see and implementation of resnet_graph

zspasztori commented 6 years ago

@BugaDM what were the accuracy, training time and inference time of VGG16 and VGG19 compared to the original?

sbugallo commented 6 years ago

@valikund I just did a small training on my own dataset to check if the backbone was working. I cannot give you any reliable metric, but VGG16 and VGG19 results were worse in terms of accuracy, mAP... The training time was better with both VGGs than with ResNet101. Inference time was the same (around 0.2 seconds with images of size 1024x1024).

I'm trying to implement some more backbones and carry out some full trainings with all of them. I'll update you when I finish.

chenyuZha commented 6 years ago

@ericj974 I tested your model with the COCO weights and the ImageNet weights, but I got this error: ValueError: Layer #9 (named "res2a_branch2b") expects 0 weight(s), but the saved weights have 2 element(s). Could you tell me how you configured the model for your tests? Thanks a lot.

John1231983 commented 6 years ago

@ericj974: Thanks for your code. Could you tell me how ResNeXt performs in comparison with ResNet-101 and ResNet-50?

23pointsNorth commented 6 years ago

@BugaDM, what about smaller backbones like MobileNet? That should give faster inference, at the cost of accuracy/mAP.

paulcx commented 6 years ago

@ericj974 How did you deal with the weights? I got errors similar to the ones @chenyuZha got. Any progress?

ericj974 commented 6 years ago

@paulcx @chenyuZha sorry for my late reply. Actually you cannot load the h5 file (mask_rcnn_coco.h5) since it assumes a standard resnet and not resnext as backend encoder. You will have to do the training from scratch.
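To illustrate why the standard checkpoint cannot simply be reused: a name-based loader can only copy a layer's weights when the saved tensors match what the new layer expects. The sketch below is a framework-free illustration in plain Python, not the Keras or Mask_RCNN loading code. The layer names and shapes are made-up examples, and `split_loadable` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
LayerShapes = Dict[str, List[Shape]]  # layer name -> shapes of its weight tensors


def split_loadable(saved: LayerShapes, model: LayerShapes):
    """Partition the model's layers by whether the saved weights fit them.

    Returns three lists of layer names: (loadable, mismatched, missing).
    """
    loadable, mismatched, missing = [], [], []
    for name, expected in model.items():
        if name not in saved:
            missing.append(name)        # nothing saved under this name
        elif saved[name] == expected:
            loadable.append(name)       # same tensor count and same shapes
        else:
            mismatched.append(name)     # e.g. expects 0 tensors, checkpoint has 2
    return loadable, mismatched, missing


# Illustrative shapes only (assumptions, not read from mask_rcnn_coco.h5).
resnet_checkpoint = {
    "conv1": [(7, 7, 3, 64), (64,)],
    "res2a_branch2b": [(3, 3, 64, 64), (64,)],
}
# Assumption: in a grouped (ResNeXt-style) block the old layer name might
# become a weightless merge layer, with the grouped convs under new names.
resnext_model = {
    "conv1": [(7, 7, 3, 64), (64,)],
    "res2a_branch2b": [],
    "res2a_branch2b_group0": [(3, 3, 4, 4), (4,)],
}

loadable, mismatched, missing = split_loadable(resnet_checkpoint, resnext_model)
print("loadable:  ", loadable)
print("mismatched:", mismatched)
print("missing:   ", missing)
```

Layers in the first list could in principle be initialised from the checkpoint while the rest are trained from scratch, though whether such a partial load helps here was not tested in this thread.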

John1231983 commented 6 years ago

@ericj974: How does ResNeXt perform for you, and how do you train from scratch? I tried the command python3 train --dataset=/path/to/coco/ (with --model=coco removed), but it failed with this error:

    if args.model.lower() == "last":
    AttributeError: 'NoneType' object has no attribute 'lower'
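
The traceback is consistent with `args.model` being `None` once `--model` is omitted, so the call to `.lower()` fails. Below is a minimal sketch of a guard, assuming a simplified stand-in for the training script's argument parser. `build_parser` and `resolve_weights` are hypothetical helpers, not functions from the repository.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Simplified stand-in for the training script's CLI (an assumption)."""
    parser = argparse.ArgumentParser(description="Simplified stand-in CLI")
    parser.add_argument("command", help="'train' or 'evaluate'")
    parser.add_argument("--dataset", required=True, help="Path to the dataset")
    parser.add_argument("--model", default=None,
                        help="Weights file, 'coco', 'imagenet' or 'last'; "
                             "omit to train from scratch")
    return parser


def resolve_weights(model_arg: Optional[str]) -> Optional[str]:
    """Return a weights choice, or None to keep the random initialisation."""
    if model_arg is None:
        return None                     # guard: never call .lower() on None
    key = model_arg.lower()
    if key in ("coco", "imagenet", "last"):
        return key                      # one of the named presets
    return model_arg                    # otherwise treat it as a file path


args = build_parser().parse_args(["train", "--dataset=/path/to/coco/"])
weights = resolve_weights(args.model)
if weights is None:
    print("No weights given: keeping the random initialisation")
else:
    print("Would load weights:", weights)
```

With a guard like this, leaving out `--model` skips weight loading entirely, which is one way to start training from scratch.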

paulcx commented 6 years ago

@ericj974 Have you tried the other backbone models like inception resnet v2?

Cpruce commented 6 years ago

Haha, I actually did my first run with the MobileNet backbone right before @23pointsNorth mentioned it. I can confirm that it works both from scratch and with ImageNet-pretrained MobileNet weights, though I'm still trying to get the quality of predictions on par with the ResNet-50 backbone.

gsujan commented 6 years ago

Hi @Cpruce, I am trying to use a MobileNet backbone as well. It includes a layer called depthwise separable convolution, and Keras offers a SeparableConv2D layer. Did you just use the SeparableConv2D layer, or did you add anything else when modelling the backbone?

Cpruce commented 6 years ago

Hi @gsujansai

I saw the SeparableConv2D layer in Keras but didn't try it out, since I pulled the MobileNet implementation from the Keras models. That one creates its own _depthwise_conv_block, which contains the pointwise convolution at the end of each block.

@waleedka I have a memory leak but besides that my new backbone works. Shall I open a pull request?

gsujan commented 6 years ago

@Cpruce Thank you

Cpruce commented 6 years ago

For everyone interested, check out my pull request

Let me know if you have any improvements and feel free to contribute!

chenyuZha commented 6 years ago

@Cpruce Hello, I saw your implementation of Mask R-CNN with a MobileNet 224 backbone, very interesting! Do you think it's possible to integrate the model into mobile devices for inference, as is usually done with MobileNet?

Cpruce commented 6 years ago

@chenyuZha :D yes, definitely! I have (somewhat) gotten results on my mobile phone, though it is still pretty slow. You will face a few obstacles, and there is still much to do after getting the model to load. Let me know if you start going down this road :)

JonathanCMitchell commented 6 years ago

@Cpruce I actually implemented a very similar MobileNet 224 backbone; it appears to be the same as the one I saw in your repository. Did you try training it on lower-resolution images, perhaps (224, 224, 3)? If you train on lower resolution and run inference on lower resolution, it will drastically increase the speed. However, I am still getting NaN errors in rpn_bbox_loss when trying to do so.
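
One thing that may be worth checking when lowering the resolution (a guess, not a confirmed cause of the NaN losses): a feature pyramid that downsamples by strides 4, 8, 16, 32 and 64 needs each image side to be divisible by 64, and 224 is not. The sketch below assumes those strides. `pyramid_shapes` and `divides_cleanly` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

# Assumption: the pyramid levels use these downsampling strides.
ASSUMED_STRIDES = (4, 8, 16, 32, 64)


def pyramid_shapes(side: int,
                   strides: Sequence[int] = ASSUMED_STRIDES
                   ) -> List[Tuple[int, float]]:
    """Feature-map side length at each stride for a square input image."""
    return [(s, side / s) for s in strides]


def divides_cleanly(side: int,
                    strides: Sequence[int] = ASSUMED_STRIDES) -> bool:
    """True when every pyramid level gets a whole number of pixels."""
    return all(side % s == 0 for s in strides)


for side in (224, 256, 1024):
    print(side, divides_cleanly(side), pyramid_shapes(side))
```

Under that assumption, a 224-pixel side gives a fractional map at stride 64, while 256 and 1024 give whole numbers at every level, so 256 may be a safer low-resolution choice to try first.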

Cpruce commented 6 years ago

@JonathanCMitchell Cool! nope I haven't tried lower resolution yet. I've still got a few tricks I want to try before, since the average precision and recall still aren't as good. Can you show evaluation results and images with the instance segmentation?

JonathanCMitchell commented 6 years ago

I am still trying to get rid of NaN errors in rpn_bbox_loss when lowering the image dimensions. I have a thread about it here: #321

Cpruce commented 6 years ago

@JonathanCMitchell moving our conversation to your thread

chenyuZha commented 6 years ago

@Cpruce When you test on your phone, do you integrate your *.pb with Android Studio or Xcode? (I guess you have to convert the .h5 to .pb.) I integrated a CNN model trained with Inception v3 into Android Studio and it worked well, but when I tried to use my own custom .pb model, it no longer worked. I guess I need to modify something in Android Studio, such as the input or output tensor? Any ideas would be very much appreciated!

Cpruce commented 6 years ago

Yup! Could you post the summary of your model? You'll get a message telling you all the arguments to use. However, note that the input_layer_shape that it tells you may be wrong if you have multiple inputs

chenyuZha commented 6 years ago

@Cpruce Sorry for replying so late. In fact, I have tested with TF DETECT and TF CLASSIFIER, which finally worked (I had made a stupid error before). But for TF DETECT, I could only use a pre-trained model like MobileNet V1, because it seems very tricky to modify the input or output node in the Java code. I really want to test something new (like DeepLab V3 + MobileNet V2), but so far I haven't found any information on how to implement the script. As you are working on the MobileNet version of Mask R-CNN, have you already tested this model in Android Studio? Maybe you have more experience with Android Studio.

chenyuZha commented 6 years ago

@ericj974 I have a problem with the ResNeXt network. I have already trained a model with ResNeXt and converted the .h5 to .pb successfully (with your file . But when I tried to run inference with this .pb, I got an error from tf.py_func: UnknownError (see above for traceback): KeyError: 'pyfunc_0' [[Node: mrcnn_detection/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/packed_2/_67, mrcnn_class/Reshape_1/_69, mrcnn_bbox/Reshape/_71, _arg_input_image_meta_0_1)]]. It seems that when we freeze the graph, tf.py_func causes problems with certain operations. Could you tell me if you have the same issue? How did you solve it? Thanks for your reply!

Cpruce commented 6 years ago

@chenyuZha Yes, I have loaded the model on my phone/android studio and have gotten it to run inference. However, predictions are still too slow and I need to work on how I feed/preprocess the image. For your 2nd problem, could you try with the latest master? The last instance of py_func was removed in my first pull request

chenyuZha commented 6 years ago

@Cpruce Yes, I updated the script and now I can run inference! Thanks for your help! For the MobileNet Mask R-CNN, have you tried reducing the number of bounding boxes in the second stage? Maybe it will speed up inference.

Cpruce commented 6 years ago

@chenyuZha Awesome 😄 Nope I haven't tried reducing the number of proposals/detections but I'm sure it will speed up inference. How much is another story though. Have you tested my fork by any chance? You can also test your hypothesis via the master repo

chenyuZha commented 6 years ago

@Cpruce Yes, I'll test your model. By the way, on the Android Studio side, I guess I need to make some modifications so that I can load the masks?

Cpruce commented 6 years ago

@chenyuZha cool let me know how it goes and yes, there should be extra pre and post processing steps for Android. Are you using tensorflow mobile?

chenyuZha commented 6 years ago

@Cpruce Yes, I did the whole installation following this tutorial: Then, for my case, I modified the graph of the demo app TF Detect.

chenyuZha commented 6 years ago

@ericj974 Have you tested ResNeXt-101 with cardinality=32? I downloaded your code 2 months ago (the file is no longer available in your git repository). When I re-checked the function resnet_graph, I found that you didn't change the number of filters in each stage, which remains the same as in ResNet-50. For example:

```python
# Stage 2
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), cardinality=cardinality)
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', cardinality=cardinality)
C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', cardinality=cardinality)
```

But in the paper, it's something like this:

```python
# Stage 2
x = conv_block(x, 3, [128, 128, 256], stage=2, block='a', strides=(1, 1), cardinality=cardinality)
x = identity_block(x, 3, [128, 128, 256], stage=2, block='b', cardinality=cardinality)
C2 = x = identity_block(x, 3, [128, 128, 256], stage=2, block='c', cardinality=cardinality)
```
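
The stage widths need not be hard-coded. In the ResNeXt paper's 32x4d configuration, the bottleneck width at stage 2 is cardinality × 4 = 128 and doubles at each later stage, while the block output keeps the ResNet value (256 at stage 2). The following is a minimal sketch of that rule; `resnext_stage_filters` and `conv3x3_params` are hypothetical helpers, not part of the repository or of @ericj974's code.

```python
def resnext_stage_filters(stage, cardinality=32, base_width=4):
    """Filter triple [1x1, grouped 3x3, 1x1] for a bottleneck in stages 2..5."""
    scale = 2 ** (stage - 2)                    # widths double at every stage
    width = cardinality * base_width * scale    # 32 groups x 4 channels = 128
    return [width, width, 256 * scale]          # output width as in ResNet


def conv3x3_params(in_ch, out_ch, groups=1):
    """Weight count (no bias) of a 3x3 convolution split into `groups`."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return 3 * 3 * (in_ch // groups) * (out_ch // groups) * groups


for stage in range(2, 6):
    print("stage", stage, resnext_stage_filters(stage))

# Dense 3x3 at ResNet width 64 versus grouped 3x3 at ResNeXt width 128.
print("ResNet  3x3 weights:", conv3x3_params(64, 64))
print("ResNeXt 3x3 weights:", conv3x3_params(128, 128, groups=32))
```

The grouped 3x3 convolution at width 128 has fewer weights than the dense one at width 64, which is how ResNeXt can widen the block at a comparable cost.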


mjjackey commented 5 years ago

> @paulcx @chenyuZha sorry for my late reply. Actually you cannot load the h5 file (mask_rcnn_coco.h5) since it assumes a standard resnet and not resnext as backend encoder. You will have to do the training from scratch.

Could you please explain how to do the training from scratch?

Altimis commented 4 years ago

Hi, has anyone succeeded in implementing a ResNeXt-101 backbone in the Matterport Mask R-CNN implementation? I think this backbone could give better results than ResNet-101. I am a "newbie" in the computer vision world and I don't have the strength to implement a backbone from scratch.

saifulNslOfficial commented 3 years ago

Hello @ericj974, could you please share your implementation that changes the backbone from ResNet to ResNeXt? I would also like to know how much the performance improved after changing the backbone.

carolinarutililima commented 2 years ago

> @valikund I just did a small training on my own dataset to check if the backbone was working. I cannot give you any reliable metric, but VGG16 and VGG19 results were worse in terms of accuracy, mAP... The training time was better with both VGGs than with ResNet101. Inference time was the same (around 0.2 seconds with images of size 1024x1024).
>
> I'm trying to implement some more backbones and carry out some full trainings with all of them. I'll update you when I finish.

Have you uploaded these backbones to your GitHub?

I've sent you a connection request on LinkedIn; I would appreciate it if you accepted it.


carolinarutililima commented 2 years ago

> Hello @ericj974, could you please share your implementation that changes the backbone from ResNet to ResNeXt? I would also like to know how much the performance improved after changing the backbone.

Have you done that?