matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Any idea why my mrcnn_class_loss is increasing? #590

Open zungam opened 6 years ago

zungam commented 6 years ago

Hi, everything is going okay and it looks like my model is getting better, but mrcnn_class_loss keeps going up and I don't understand why. Any ideas? I only have one class type in my images and in my model (+ background).

[image: mrcnn_class_loss training curve]

I see the same thing in my validation loss graphs.

patrick-llgc commented 6 years ago

What if you increase the weight for class loss? The default is all 1's.

LOSS_WEIGHTS                   {'mrcnn_mask_loss': 1.0, 'rpn_class_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'rpn_bbox_loss': 1.0}
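
For reference, these weights can be overridden in a Config subclass. A minimal sketch of that, assuming the usual matterport Config pattern; the FishConfig name, NUM_CLASSES, and the 10.0 value are illustrative, not values from this thread:

    # Sketch: give the classifier head's loss more influence on the total loss.
    # FishConfig and the 10.0 weight are placeholders; tune them for your data.
    from mrcnn.config import Config

    class FishConfig(Config):
        NAME = "fish"
        NUM_CLASSES = 1 + 1  # background + fish

        LOSS_WEIGHTS = {
            "rpn_class_loss": 1.0,
            "rpn_bbox_loss": 1.0,
            "mrcnn_class_loss": 10.0,  # emphasize the head's classification loss
            "mrcnn_bbox_loss": 1.0,
            "mrcnn_mask_loss": 1.0,
        }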
cklat commented 6 years ago

I don't have a solution for you, but I wanted to ask how your training config is set, e.g. the learning rate. Did you leave the settings at the defaults from the config class, or did you experiment with some of them?

zungam commented 6 years ago

@patrick-12sigma LOSS_WEIGHTS is the default. @cklat the learning rate is the default. I train on images with around 800 fish in them, so every mask is very crowded. I see that there is some "crowd" handling in build_rpn_targets:

    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
Can this affect it?
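
For context, this code path only applies if the dataset's load_mask returns negative class IDs for some instances, which the bundled COCO sample does for "iscrowd" annotations. A rough illustrative sketch, not the repo's exact code, of how such boxes are separated out before the RPN targets are built:

    # Illustrative only: crowd boxes are flagged with negative class IDs and
    # split off so they are excluded from RPN training.
    import numpy as np

    def split_crowds(gt_class_ids, gt_boxes):
        """Separate crowd boxes (negative class IDs) from normal instances."""
        crowd_ix = np.where(gt_class_ids < 0)[0]
        non_crowd_ix = np.where(gt_class_ids > 0)[0]
        crowd_boxes = gt_boxes[crowd_ix]
        return gt_class_ids[non_crowd_ix], gt_boxes[non_crowd_ix], crowd_boxes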

patrick-llgc commented 6 years ago

You can try changing the loss weights by increasing the weight on the mrcnn classification loss. If you look at the scales of the losses, you will see that the mrcnn classification loss is almost an order of magnitude smaller than the rest. The optimizer can only reduce the total loss, so if you want a particular loss to be optimized you should increase its relative contribution to the total loss.

You can also try changing the learning rate, but since your total loss is already going down I don't think that will help much.

As for crowds, you can inspect some of the predictions (using the inspect_model notebook) and see whether you are indeed grouping multiple objects into one mask at inference time.
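
One way to do that inspection is to run the trained weights on a few images and look at the detections directly. A minimal sketch using the standard matterport inference API; the config values, weights file, and image path are placeholders:

    import skimage.io
    import mrcnn.model as modellib
    from mrcnn import visualize
    from mrcnn.config import Config

    # Placeholder inference config for a one-class (fish) model.
    class FishInferenceConfig(Config):
        NAME = "fish"
        NUM_CLASSES = 1 + 1  # background + fish
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1

    model = modellib.MaskRCNN(mode="inference",
                              config=FishInferenceConfig(),
                              model_dir="logs")
    model.load_weights("mask_rcnn_fish.h5", by_name=True)  # placeholder weights

    image = skimage.io.imread("example_fish.jpg")          # placeholder image
    r = model.detect([image], verbose=0)[0]
    visualize.display_instances(image, r["rois"], r["masks"], r["class_ids"],
                                ["BG", "fish"], r["scores"])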

zungam commented 6 years ago

I only have one class, which is "fish" (plus the background). What does the mrcnn classification loss really mean? Does it measure how well it separates fish from background?

patrick-llgc commented 6 years ago

@zungam I believe so. That means after region proposal, the network is having difficulty telling if the proposed bbox is fish or background.

I am curious what the other parts of the loss look like, in particular the RPN classification loss. Does it also increase? If so, that means fish boxes and background boxes are hard to tell apart.

Also, you are showing the loss on the training dataset. What does the loss look like on the validation dataset?

zungam commented 6 years ago

No smoothing this time.

[image: loss curves, no smoothing]

It is actually not increasing. Hmm, weird. Also, the validation mrcnn_class loss is decreasing while the training mrcnn_class loss is increasing. Weird. And the mrcnn class loss curve is very unstable. All the other graphs still look good.

Any ideas what's going on here?

patrick-llgc commented 6 years ago

Your RPN seems to be doing quite well. I think your validation loss is behaving well too -- note that both the training and validation mrcnn class loss settle at about 0.2. As for the initial increasing phase of the training mrcnn class loss, maybe it just started from a very good point by chance?

I think your curves are fine. Are you seeing any performance-related issues during inference?

If you really want it to decrease, you can of course make the weight for mrcnn class loss really large, say 10 or 100. Then train your network and see if it goes down.

zungam commented 6 years ago

@patrick-12sigma wrote: "I think your curves are fine. Are you seeing any performance-related issues during inference?"

[image: example predictions on fish images]

Sometimes it works quite well, sometimes it is really bad. But overall it is getting better, so I'm just worried based on the graph.

@patrick-12sigma wrote: "As for the initial increasing phase of the training mrcnn class loss, maybe it just started from a very good point by chance?"

It could be that at the start it sees mostly background (random guesses) and just labels everything background, but as the RPN gets better and feeds in a 50/50 mix of background and fish, it gets worse at classifying them.

@patrick-12sigma wrote: "If you really want it to decrease, you can of course make the weight for mrcnn class loss really large, say 10 or 100."

Do you think it will help without messing up other parts of the network?

Anyhow, thanks for your help so far.

patrick-llgc commented 6 years ago

You are welcome, and cool application! I can see why it fails in the example on the left: the edges of the fish are not clean-cut, and the lighting does not help. Are you working with still images or a sequence of them (video, etc.)?

Increasing the weight may help, or at least it's worth a shot. In the extreme case, you can set all the other weights to zero and force the total loss to equal the mrcnn classification loss. In that case, the optimizer is forced to minimize this loss. In my own application I had similar issues, and increasing the mrcnn classification loss weight did help (although the curves still looked weird).

zungam commented 6 years ago

@patrick-12sigma I'm working with still images. Nice catch on the edges, I hadn't noticed that. I will try increasing the weight and see what happens once training has finished (after early stopping). Thanks for sharing your experience!

hadim commented 6 years ago

@zungam I am facing the same issue as you are. Did you find a way to decrease all losses during training?

zungam commented 6 years ago

My mrcnn_mask_loss started to go down after a while, but I trained for 40 days and it only went from 0.21 down to 0.15. I only had one class, so I believe it had a hard time separating fish from background in the beginning because of my unusual dataset. I had no problems with the other loss plots.

Can you share a plot of your graphs?

ltrottier commented 6 years ago

I am currently experiencing a similar issue on my dataset.

These plots were made with a ResNet-like backbone.

[four screenshots: loss curves from the ResNet-like backbone run]

I also experimented with a VGG-16 backbone, and those plots were fine. The difference is that with VGG, the RPN was trained first (with the loss weights for the other losses set to 0) and then all losses were activated. You could try that for your problem; a rough sketch of that schedule is below.
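
A minimal sketch of that staged schedule, assuming the usual matterport training API. FishConfig, the weights file, the datasets, and the epoch counts are placeholders, and it relies on model.train() recompiling with the config's current LOSS_WEIGHTS, so treat it as a starting point rather than a recipe:

    import mrcnn.model as modellib
    from mrcnn.config import Config

    class FishConfig(Config):
        NAME = "fish"
        NUM_CLASSES = 1 + 1  # background + fish

    config = FishConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")
    model.load_weights("mask_rcnn_coco.h5", by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])

    # dataset_train / dataset_val: your mrcnn.utils.Dataset instances (not shown).

    # Stage 1: train the RPN only by zeroing the head losses.
    config.LOSS_WEIGHTS = {"rpn_class_loss": 1.0, "rpn_bbox_loss": 1.0,
                           "mrcnn_class_loss": 0.0, "mrcnn_bbox_loss": 0.0,
                           "mrcnn_mask_loss": 0.0}
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE, epochs=20, layers="all")

    # Stage 2: re-enable all losses and continue (epoch numbers are cumulative).
    config.LOSS_WEIGHTS = {k: 1.0 for k in config.LOSS_WEIGHTS}
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE, epochs=40, layers="all")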

As a side note, I looked at the predictions from the ResNet model and there were a lot of mislabeled objects. With VGG the predictions were very accurate, so the increase in loss does indeed reflect poor accuracy.

PhanDuc commented 5 years ago

@patrick-llgc wrote: "Increasing the weight may help, or at least it's worth a shot. In the extreme case, you can set all the other weights to zero and force the total loss to equal the mrcnn classification loss."

@patrick-llgc, could you please explain more about how to properly increase and decrease those weights?

Vaspra commented 5 years ago

Just to throw in my two cents: could it be that your model isn't necessarily getting worse at identifying bg/fg when it is actually passed a good region? It may just start out being fed garbage (from the initially poorly trained RPN), and it is really easy to tell that most of that garbage is, in fact, garbage.

Maybe, as the rest of the model improves, the classifier is tested on more genuinely challenging, correct region proposals, and these are harder to classify correctly.

I don't know whether this is what is going on; if it sounds wrong to anyone more knowledgeable about this model, please correct me, I'd like to know.

CodeXiaoLingYun commented 5 years ago

I have a similar question, but the difference is that my losses are:

    loss: 0.8768 - rpn_class_loss: 0.0128 - rpn_bbox_loss: 0.2369 - mrcnn_class_loss: 0.0626 - mrcnn_bbox_loss: 0.1969 - mrcnn_mask_loss: 0.3675 - val_loss: 1.0695 - val_rpn_class_loss: 0.0152 - val_rpn_bbox_loss: 0.3043 - val_mrcnn_class_loss: 0.0612 - val_mrcnn_bbox_loss: 0.2802 - val_mrcnn_mask_loss: 0.4086

As you can see, rpn_bbox_loss (0.2369), mrcnn_bbox_loss (0.1969) and mrcnn_mask_loss (0.3675) are quite high and do not go down, and I have no idea how to fix it.

Ayushkumar15 commented 4 years ago

I have trained Mask R-CNN on my own dataset, but I am unable to draw a plot for any of the losses. Could anyone please share the code for that? I need to show the plots in my project.

darshan-majithiya commented 4 years ago

@Ayushkumar15 all of these plots are made with TensorBoard. You can check the TensorBoard documentation for how to launch it on the tfevents files in your logs directory.
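
If you would rather build the plots yourself (e.g. with matplotlib), the scalars can also be read straight out of the tfevents files. A minimal sketch, assuming TensorBoard is installed; the run directory and tag names are placeholders, and the exact tags depend on what the training callback logged:

    import matplotlib.pyplot as plt
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    event_acc = EventAccumulator("logs/fish20181101T1234")  # placeholder run directory
    event_acc.Reload()

    print(event_acc.Tags()["scalars"])   # list the scalar tags that were logged

    events = event_acc.Scalars("loss")   # pick any tag from the list above
    steps = [e.step for e in events]
    values = [e.value for e in events]

    plt.plot(steps, values)
    plt.xlabel("step")
    plt.ylabel("loss")
    plt.show()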

kimile599 commented 4 years ago

@CodeXiaoLingYun Have you solved your problem with the high rpn_bbox_loss, mrcnn_bbox_loss and mrcnn_mask_loss?