Closed philipp-schmidt closed 2 years ago
This works fine for batch size 8 and the full yolov4 (e.g. from the crowdhuman training repo), but I get crashes for yolov4-tiny-3l:
[TensorRT] VERBOSE: --------------- Timing Runner: 003_convolutional_lrelu copy (Reformat)
[TensorRT] INTERNAL ERROR: Assertion failed: validateInputsCutensor(src, dst)
../rtSafe/cuda/cutensorReformat.cpp:227
Aborting...
[TensorRT] VERBOSE: Builder timing cache: created 206 entries, 45 hit(s)
[TensorRT] ERROR: ../rtSafe/cuda/cutensorReformat.cpp (227) - Assertion Error in executeCutensor: 0 (validateInputsCutensor(src, dst))
ERROR: failed to build the TensorRT engine!
I did not get this error with the same engine and the "old" plugin.
@jkjung-avt Can you check my implementation?
TensorRT already optimizes a lot of layers (visible in the verbose log), but the build crashes near the end.
Successfully tested with yolov4-crowdhuman-608x608 with MAX_BATCH_SIZE 8, OPT_BATCH_SIZE 4, MIN_BATCH_SIZE 1.
No success with yolov4-tiny-3l-crowdhuman-416x416 with MAX_BATCH_SIZE 8, OPT_BATCH_SIZE 4, MIN_BATCH_SIZE 1.
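For anyone trying to reproduce this, here is roughly how I'd expect those three batch settings to turn into min/opt/max input shapes. This is only a sketch, not the code from this PR: it assumes an explicit-batch network with a single NCHW input of 3 channels, and the helper name `batch_profile_shapes` is made up for illustration. The resulting tuples are the kind of values one would hand to TensorRT's `IOptimizationProfile.set_shape(input_name, min, opt, max)`.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int, int]


def batch_profile_shapes(height: int, width: int,
                         min_batch: int = 1, opt_batch: int = 4,
                         max_batch: int = 8,
                         channels: int = 3) -> Dict[str, Shape]:
    """Hypothetical helper: build NCHW min/opt/max shapes for a dynamic batch.

    Assumes a single image input with `channels` channels (3 is an assumption).
    """
    # An optimization profile needs min <= opt <= max in every dimension.
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    return {
        "min": (min_batch, channels, height, width),
        "opt": (opt_batch, channels, height, width),
        "max": (max_batch, channels, height, width),
    }


# The two configurations mentioned above (input sizes taken from the model names).
full_shapes = batch_profile_shapes(608, 608)   # yolov4-crowdhuman-608x608
tiny_shapes = batch_profile_shapes(416, 416)   # yolov4-tiny-3l-crowdhuman-416x416

for name, shapes in (("full", full_shapes), ("tiny-3l", tiny_shapes)):
    print(name, shapes["min"], shapes["opt"], shapes["max"])
```

Both models get the same batch range here and differ only in input resolution, so the batch settings alone probably do not explain why only the tiny-3l build hits the assertion.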
@philipp-schmidt Sorry, I'm really busy at work lately. I don't have time to review the code. And I'm not sure I should handle this pull request. (I'm not going to merge this into my master branch if there's a known issue...)
We'll work on this and open the PR again when it's done.