tensorflow / models

Models and examples built with TensorFlow

How to do batch inference with maskrcnn? #9951

Open QuarTerll opened 3 years ago

QuarTerll commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

I am using the TF 1.x version of Mask R-CNN.

2. Describe the feature you request

I understand there is a way to run single-image inference.

I want to know how to do batch inference with a single tensor dict, rather than using a for loop to process one image at a time.

3. Additional context

In object_detection.utils.ops, the reframe_box_masks_to_image_masks function only processes one image. In issue #8812, I saw the detection part with the run_inference_for_single_image function.

Why is this? Why can't Mask R-CNN process multiple images at the same time, just like other networks?

[image]
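For reference, this is roughly how the function is used for a single image today (a sketch only; shapes follow the TF 1.x Object Detection API conventions, and the concrete sizes below are just illustrative):

```python
import tensorflow as tf  # TF 1.x
from object_detection.utils import ops as utils_ops

# Placeholders standing in for one image's detection outputs.
# Note there is no batch dimension anywhere.
detection_masks = tf.placeholder(tf.float32, [None, 33, 33])  # [num_det, mask_h, mask_w]
detection_boxes = tf.placeholder(tf.float32, [None, 4])       # normalized [ymin, xmin, ymax, xmax]
image_height, image_width = 640, 640                          # illustrative values

# reframe_box_masks_to_image_masks maps each box-cropped mask back onto
# full-image coordinates, one image at a time.
masks_on_image = utils_ops.reframe_box_masks_to_image_masks(
    detection_masks, detection_boxes, image_height, image_width)
masks_on_image = tf.cast(tf.greater(masks_on_image, 0.5), tf.uint8)
```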

4. Are you willing to contribute it? (Yes or No)

Yes

QuarTerll commented 3 years ago

@mburakbozbey Hello, I saw your reply on issue #8812. Any help with this one? Thank you!

QuarTerll commented 3 years ago

I am trying to compare the FPS of single-image inference and batch inference.

As of now, I can see that the function reframe_box_masks_to_image_masks only accepts a single image.

How can this function be made to work for a batch of images? If this can be implemented, I think batch inference for Mask R-CNN may work.
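One idea (just a sketch, not tested) would be to map the per-image function over the batch dimension with tf.map_fn, assuming every image in the batch was resized or padded to the same height and width before being fed to the graph:

```python
import tensorflow as tf  # TF 1.x
from object_detection.utils import ops as utils_ops

# Batched detection outputs; the detector pads every image to a fixed
# number of detections (assumed here to be 100), so map_fn can be used.
batch_masks = tf.placeholder(tf.float32, [None, 100, 33, 33])  # [B, max_det, mask_h, mask_w]
batch_boxes = tf.placeholder(tf.float32, [None, 100, 4])       # [B, max_det, 4]
image_height, image_width = 640, 640  # all images assumed the same size

def reframe_one_image(args):
    masks, boxes = args
    return utils_ops.reframe_box_masks_to_image_masks(
        masks, boxes, image_height, image_width)

# Apply the single-image function to each element of the batch.
batch_masks_on_image = tf.map_fn(
    reframe_one_image, (batch_masks, batch_boxes), dtype=tf.float32)
# Result shape: [B, max_det, image_height, image_width]
```

Would something along these lines be the right direction?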

mrinal18 commented 3 years ago

The model takes a list of images because that is strictly more general than passing a batch, as it supports images of different sizes. Internally, all images are padded to the same size and are batched, I think.

You can also refer to this SO link for reference.

QuarTerll commented 3 years ago

> The model takes a list of images because that is strictly more general than passing a batch, as it supports images of different sizes. Internally, all images are padded to the same size and are batched, I think.
>
> You can also refer to this SO link for reference.

@Mrinal18 Thanks for your participation and time!

I had already read that answer. But it does inference with one picture at a time using a for loop, rather than feeding batched image tensors through the feed_dict (i.e. a 4-D tensor with a leading batch dimension B). As of now, the batch size B is 1 in the code at that link.

From my perspective, there is not much performance gain from the for-loop approach. Conversely, if batch inference could be implemented by feeding batched image tensors through the feed_dict, it should improve inference speed thanks to CUDA batching optimizations and the like.

In other words, I want to use batching in the inference process just like batching is used in the training process.

I think it is feasible. However, the code in the TensorFlow Object Detection model zoo example just picks the first batch element, i.e. the first image. For this reason, it can only detect one image at a time.

[screenshot of the inference code from the linked answer]

The picture above is from the page you provided. We can see that detection_boxes, detection_masks, and the other outputs are squeezed, so only the first batch element (the first image) is kept. If we could change this part to skip the squeeze and keep all batch elements, batch inference for Mask R-CNN might work.

But I don't know how to do that for the mask part, which I mentioned in my last reply.
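For what it's worth, here is a rough, untested sketch of what I mean by feeding the whole batch through the feed_dict and not squeezing the outputs. The tensor names assume the standard TF 1.x Object Detection API frozen-graph export, and the mask reframing would still have to be done per image afterwards (for example with the tf.map_fn idea above):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

def run_inference_for_batch(images, graph):
    """images: uint8 numpy array of shape [B, H, W, 3], all the same size."""
    with graph.as_default():
        # Collect whichever standard output tensors exist in this graph.
        all_tensor_names = {t.name for op in graph.get_operations()
                            for t in op.outputs}
        tensor_dict = {}
        for key in ['num_detections', 'detection_boxes', 'detection_scores',
                    'detection_classes', 'detection_masks']:
            name = key + ':0'
            if name in all_tensor_names:
                tensor_dict[key] = graph.get_tensor_by_name(name)
        image_tensor = graph.get_tensor_by_name('image_tensor:0')

        with tf.Session() as sess:
            # Feed the whole batch at once instead of looping image by image,
            # and do NOT squeeze the outputs afterwards.
            outputs = sess.run(tensor_dict, feed_dict={image_tensor: images})

    # Outputs keep their leading batch dimension, e.g.
    #   detection_boxes: [B, max_detections, 4]
    #   detection_masks: [B, max_detections, mask_h, mask_w]
    outputs['num_detections'] = outputs['num_detections'].astype(np.int32)
    return outputs
```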

> I am trying to compare the FPS of single-image inference and batch inference.
>
> As of now, I can see that the function reframe_box_masks_to_image_masks only accepts a single image.
>
> How can this function be made to work for a batch of images? If this can be implemented, I think batch inference for Mask R-CNN may work.

QuarTerll commented 3 years ago

@tombstone @jch1 @pkulzc Hello, any help with this one? Thanks!