jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Draft: Implement dynamic batch size in yolo plugin #465

Closed philipp-schmidt closed 2 years ago

philipp-schmidt commented 3 years ago

Changes:

#457

philipp-schmidt commented 3 years ago

This works fine for batch size 8 and the full yolov4 (e.g. from the crowdhuman training repo), but I get crashes for yolov4-tiny-3l:

[TensorRT] VERBOSE: --------------- Timing Runner: 003_convolutional_lrelu copy (Reformat)
[TensorRT] INTERNAL ERROR: Assertion failed: validateInputsCutensor(src, dst)
../rtSafe/cuda/cutensorReformat.cpp:227
Aborting...
[TensorRT] VERBOSE: Builder timing cache: created 206 entries, 45 hit(s)
[TensorRT] ERROR: ../rtSafe/cuda/cutensorReformat.cpp (227) - Assertion Error in executeCutensor: 0 (validateInputsCutensor(src, dst))
ERROR: failed to build the TensorRT engine!

I did not get this error with the same engine and the "old" plugin.

@jkjung-avt Can you check my implementation?

philipp-schmidt commented 3 years ago

TensorRT already optimizes a lot of layers (I can see it in the verbose log), but it crashes near the end of the build.

philipp-schmidt commented 3 years ago

Successfully tested with yolov4-crowdhuman-608x608 with MAX_BATCH_SIZE 8, OPT_BATCH_SIZE 4, MIN_BATCH_SIZE 1.

No success with yolov4-tiny-3l-crowdhuman-416x416 with MAX_BATCH_SIZE 8, OPT_BATCH_SIZE 4, MIN_BATCH_SIZE 1.
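For readers following along, here is a minimal sketch of how a MIN/OPT/MAX batch size triple like the one above could map to the input shapes of a dynamic-batch optimization profile. The helper name `batch_profile_shapes`, the NCHW layout, and the 3-channel input are assumptions for illustration; this is not the code from this PR.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int, int]


def batch_profile_shapes(min_batch: int, opt_batch: int, max_batch: int,
                         channels: int, height: int, width: int) -> Dict[str, Shape]:
    """Return NCHW shapes for the min/opt/max points of a dynamic-batch profile.

    Hypothetical helper: only the batch dimension varies, the rest is fixed.
    """
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    return {
        "min": (min_batch, channels, height, width),
        "opt": (opt_batch, channels, height, width),
        "max": (max_batch, channels, height, width),
    }


# The two configurations reported above (3 input channels assumed).
shapes_608 = batch_profile_shapes(1, 4, 8, 3, 608, 608)
shapes_416 = batch_profile_shapes(1, 4, 8, 3, 416, 416)

# With the TensorRT Python API, such shapes would typically be registered as:
#   profile = builder.create_optimization_profile()
#   profile.set_shape(input_name, shapes["min"], shapes["opt"], shapes["max"])
#   config.add_optimization_profile(profile)
# where input_name is the name of the network's input tensor.
```

The engine is then built for any batch size between the "min" and "max" shapes, with kernels tuned for the "opt" shape.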

jkjung-avt commented 3 years ago

@philipp-schmidt Sorry, I'm really busy at work lately. I don't have time to review the code. And I'm not sure I should handle this pull request. (I'm not going to merge this into my master branch if there's a known issue...)

philipp-schmidt commented 2 years ago

We'll work on this and open the PR again when it's done.