jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

The mtcnn takes a lot of RAM. #573

Closed: aafaqin closed this issue 2 years ago

aafaqin commented 2 years ago

2.35GB ram is being consumed. Any way to reduce it? Any reason why it is this much?

jkjung-avt commented 2 years ago

The TensorRT PNet engine takes the most memory. Please read my Optimizing TensorRT MTCNN blog post, as well as the comments in mtcnn/det1_relu.prototxt. Make sure you understand the design.

One way to reduce memory consumption is to trade off the maximum input image size you'd like to process (or, equivalently, the smallest faces you'd like to detect) against the memory consumption (and inference speed) of the TensorRT MTCNN code.

My design is for 1280x720 input images. If, for example, you reduce that to 640x360, memory consumption could be reduced to roughly 1/4. (But then only faces at least twice as large could be detected.)
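To make the "roughly 1/4" claim concrete, here is a back-of-envelope sketch (my own arithmetic, not the repo's code): PNet's input is all pyramid scales stacked vertically, so its area, and hence activation memory, scales roughly with the square of the max input size. The stacked height is approximated as a geometric series over scales that stay at least 12 px tall.

```python
# Back-of-envelope estimate of PNet stacked-input area (illustrative sketch,
# not the repo's code). minsize=40 and factor=0.709 follow the MTCNN defaults
# mentioned in mtcnn/det1_relu.prototxt.
def approx_stacked_area(img_w, img_h, minsize=40, factor=0.709):
    h0 = img_h * 12.0 / minsize   # height of the 1st (largest) scale
    w0 = img_w * 12.0 / minsize   # width of the 1st (largest) scale
    stacked_h, h = 0.0, h0
    while h >= 12.0:              # PNet needs at least a 12x12 input
        stacked_h += h
        h *= factor               # each later scale shrinks by 'factor'
    return stacked_h * w0

ratio = approx_stacked_area(640, 360) / approx_stacked_area(1280, 720)
print(ratio)  # ~0.24, i.e. roughly 1/4 the PNet input (and activation) memory
```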

aafaqin commented 2 years ago

How do I do the dimension calculation?

```
# Max allowed input image size: 1280x720
# 'minsize' = 40
#
# Input dimension of the 1st 'scale':
#    720 * 12 / 40 = 216
#   1280 * 12 / 40 = 384
#
# H's in all scales: (scale factor = 0.709)
#   Original: 216.0, 153.1, 108.6, 77.0, 54.6, 38.7, 27.4, 19.5, 13.8, (9.8)
#   Rounded:  216, 154, 108,  78,  54,  38,  28,  20,  14
#   Offsets:    0, 216, 370, 478, 556, 610, 648, 676, 696, (710)
#
# Input dimension of the 'stacked image': 710x384
#
# Output dimension: (stride=2)
#   (710 - 12) / 2 + 1 = 350
#   (384 - 12) / 2 + 1 = 187
```

I am unable to understand what the above calculation means. How did 710 come from 720? What would my offsets be for 640x360?

aafaqin commented 2 years ago

I tried a few values but I am getting a dimension error. I was able to speed up the inference (66 FPS -> 78 FPS on an RTX 3090) by reducing the max batch sizes of RNet and ONet, but saw no improvement in memory consumption.

jkjung-avt commented 2 years ago

> I am unable to understand what the above calculation means.

The calculation corresponds to stacking different scales of the original image from top to bottom. The "Offsets" are the y-axis values of (top-left corner of) all those scaled images.
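For concreteness, the arithmetic for the 1280x720 design can be sketched in a few lines of Python (my illustration, not the repo's code; the even-rounded per-scale heights are taken as given from the det1_relu.prototxt comment). This also shows where 710 comes from: it is the sum of the even-rounded heights of all nine scales, not a value derived from 720 directly.

```python
# Sketch of the stacked-scale arithmetic from mtcnn/det1_relu.prototxt
# (illustrative, not the repo's code). minsize=40, factor=0.709, stride=2.
MINSIZE, STRIDE, CELL = 40, 2, 12

# The 1st scale maps a 'minsize' face down to PNet's 12x12 cell:
h0 = 720 * CELL / MINSIZE    # 216.0
w0 = 1280 * CELL / MINSIZE   # 384.0

# Per-scale heights, shrunk by 0.709 per step and rounded to even numbers
# (taken here as given in the prototxt comment):
heights = [216, 154, 108, 78, 54, 38, 28, 20, 14]

# 'Offsets' are cumulative sums: the y coordinate where each scaled image
# starts inside the vertically stacked PNet input.
offsets = [sum(heights[:i]) for i in range(len(heights))]
stacked_h = sum(heights)     # 710 = 696 + 14
stacked_w = int(w0)          # 384, the width of the largest scale

# PNet output dims with a 12x12 kernel at stride 2:
out_h = (stacked_h - CELL) // STRIDE + 1   # (710 - 12) / 2 + 1 = 350
out_w = (stacked_w - CELL) // STRIDE + 1   # (384 - 12) / 2 + 1 = 187

print(offsets)                  # [0, 216, 370, 478, 556, 610, 648, 676, 696]
print(stacked_h, out_h, out_w)  # 710 350 187
```

For a 640x360 design, the same arithmetic would be redone starting from h0 = 360 * 12 / 40 = 108.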

aafaqin commented 2 years ago

Okay, I tried to replicate the same in utils/mtcnn.py:

```python
input_h_offsets = (0, 185, 205, 279, 307, 327, 341, 351, 358)
output_h_offsets = (0, 92, 102, 139, 153, 163, 170, 175, 179)
```

and in the prototxt:

```
input_param { shape: { dim: 1 dim: 3 dim: 358 dim: 108 } }
```

It's running now at 120 FPS, but memory consumption is not reduced and the output is not coming: no faces are detected even in simple images from my test set.

jkjung-avt commented 2 years ago
  1. The PNet was designed with stride=2. So you need to round "h" & "w" numbers to even numbers.
  2. In addition to modifying the input dimension of the TensorRT PNet engine, you'd also need to modify "utils/mtcnn.py" source code accordingly.
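Point 1 can be illustrated with a tiny helper (my sketch, not the repo's code, and the exact rounding rule here is an assumption): with a 12x12 kernel at stride 2, the output dimension (d - 12) / 2 + 1 is only an integer when d is even, so every scale's h and w should be snapped to an even number.

```python
# One reasonable even-rounding rule (an assumption for illustration, not
# necessarily the exact rule the repo uses): round to the nearest even
# integer, with ties rounding up.
def round_even(d):
    """Round d to the nearest even integer (ties round up)."""
    return int(d / 2.0 + 0.5) * 2

print(round_even(153.1))                   # 154
print((round_even(710.0) - 12) // 2 + 1)   # 350, matching the prototxt comment
```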

aafaqin commented 2 years ago

Ok thanks. Actually, I did modify utils/mtcnn.py but not to the nearest even numbers. Will do that and then update you. Also, I think TRT inherently takes 2.35 GB RAM (YOLOv4 and MTCNN from your repository both suggest the same). Any way to reduce that?

jkjung-avt commented 2 years ago

> Also, I think TRT inherently takes 2.35 GB RAM (YOLOv4 and MTCNN from your repository both suggest the same). Any way to reduce that?

Sorry, I don't have any ideas for reducing that.