iwatake2222 / play_with_tensorrt

Sample projects for TensorRT in C++
Apache License 2.0

Int8 Calibration trying to read non existing file #26

Closed ghost closed 1 year ago

ghost commented 2 years ago

Environment (Hardware)

Cuda 11.3 Cudnn 8.4 TensorRT 8.4

Project Name

Int8 calibration

Issue Details

When I set all the important parameters for calibration as follows:

#define CAL_DIR        "ppmSamples"
#define CAL_LIST_FILE  "list.txt"
#define CAL_BATCH_SIZE 10
#define CAL_NB_BATCHES 2430
#define CAL_IMAGE_C    3
#define CAL_IMAGE_H    256
#define CAL_IMAGE_W    256
/* 0 ~ 1.0 */
// #define CAL_SCALE      (1.0 / 255.0)
// #define CAL_BIAS       (0.0)
/* -2.25 ~ 2.25 */
#define CAL_SCALE      (1.0 / (255.0 * 0.225))
#define CAL_BIAS       (0.45 / 0.225)

and follow all the preparation steps for calibration, it doesn't work. I used 2430 images for calibration, which I put into the directory ppmSamples, and list.txt contains the image names without the file extension. When compiling the program I get the console output below, and I can't figure out what I'm doing wrong or whether there is a bug. All images have a width and height of 256, matching the defines. I have checked list.txt several times for anything odd, and used a script to check whether any file or list entry is missing, but found nothing. I also tried many different batch sizes, but none of them works.
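One thing that can be checked from the defines alone is whether the requested number of batches fits the dataset. The sketch below assumes that CAL_NB_BATCHES counts batches rather than images, which the name suggests but the issue does not confirm. `FullBatches` and `PlanFits` are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many full batches a dataset of image_count
// images can supply for a given batch size.
inline std::size_t FullBatches(std::size_t image_count, std::size_t batch_size) {
    assert(batch_size > 0 && "batch size must be positive");
    return image_count / batch_size;
}

// Hypothetical helper: true if reading nb_batches batches of batch_size
// images never runs past the end of the image list.
// ASSUMPTION: nb_batches plays the role of CAL_NB_BATCHES.
inline bool PlanFits(std::size_t image_count, std::size_t batch_size,
                     std::size_t nb_batches) {
    return nb_batches <= FullBatches(image_count, batch_size);
}
```

Under that assumption, 2430 images with CAL_BATCH_SIZE 10 supply 243 full batches, so `PlanFits(2430, 10, 2430)` is false while `PlanFits(2430, 10, 243)` is true. This would not by itself explain a failure as early as the second batch, but it is a cheap setting to rule out.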

The output is: Could not find 0000005 .ppm in data directories: ppmSamples. The program is right: there is no 0000005 .ppm file, because I named the files from 0 to 2430 with the .ppm extension, and list.txt contains no 0000005 either. This makes me think my naming approach is wrong in general. I thought your images were just randomly named inside the samples folder. Is there a naming convention I can follow to get it to work? My file names are already snake case, since they contain no capital letters or spaces, just numbers.

Another update: I just tried naming my files exactly as you do, by adding leading zeros. I wrote a script that pads every filename to 12 characters: I still count up to 2430 (my total image count), but zeros are added in front of the number so that the filename is always 12 characters long. My first image is 000000000000.ppm and my last one is 000000002430.ppm. The program still tries to find files that are neither listed in list.txt nor present in the ppmSamples folder I defined.
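The log below is consistent with one possible cause: a batch reader that jumps into list.txt with a fixed byte stride per entry instead of counting lines. The sketch below is an assumption, not confirmed code from this repository. It simulates a reader that seeks to `batch_index * batch_size * stride` bytes, using a stride of 7 (a 6-character name plus a newline) against 12-character names. `MakeList` and `ReadBatch` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Builds an in-memory list file: one zero-padded name per line, '\n' endings.
std::string MakeList(int count, int width) {
    std::ostringstream out;
    for (int i = 0; i < count; ++i) {
        out << std::setw(width) << std::setfill('0') << i << '\n';
    }
    return out.str();
}

// Reads one batch of names by seeking to a fixed byte offset first.
// ASSUMPTION: the reader treats every list entry as `stride` bytes long.
std::vector<std::string> ReadBatch(const std::string& list,
                                   std::size_t batch_index,
                                   std::size_t batch_size,
                                   std::size_t stride) {
    assert(stride > 0 && "stride must be positive");
    std::istringstream in(list);
    in.seekg(static_cast<std::streamoff>(batch_index * batch_size * stride));
    std::vector<std::string> names;
    std::string line;
    while (names.size() < batch_size && std::getline(in, line)) {
        names.push_back(line + ".ppm");  // the extension is appended by the reader
    }
    return names;
}
```

With `MakeList(30, 12)`, each entry occupies 13 bytes. `ReadBatch(list, 1, 10, 7)` seeks to byte 70, which lands five characters into the entry 000000000005, so the first name read is 0000005.ppm, followed by 000000000006.ppm through 000000000014.ppm. With a stride of 13 the same call starts cleanly at 000000000010.ppm. If the real reader works this way, 6-character names (or a stride matched to the name length and line ending) would avoid the misaligned read.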

Error Log

[InferenceHelper][119] Use TensorRT
[08/27/2022-20:56:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +188, GPU +0, now: CPU 12047, GPU 1226 (MiB)
[08/27/2022-20:56:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 12091, GPU 1226 (MiB)
[08/27/2022-20:56:37] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +317, GPU +104, now: CPU 12557, GPU 1330 (MiB)
[08/27/2022-20:56:37] [I] [TRT] ----------------------------------------------------------------
[08/27/2022-20:56:37] [I] [TRT] Input filename:   resource/model/yolov7_256x256.onnx
[08/27/2022-20:56:37] [I] [TRT] ONNX IR version:  0.0.6
[08/27/2022-20:56:37] [I] [TRT] Opset version:    11
[08/27/2022-20:56:37] [I] [TRT] Producer name:    pytorch
[08/27/2022-20:56:37] [I] [TRT] Producer version: 1.11.0
[08/27/2022-20:56:37] [I] [TRT] Domain:
[08/27/2022-20:56:37] [I] [TRT] Model version:    0
[08/27/2022-20:56:37] [I] [TRT] Doc string:
[08/27/2022-20:56:37] [I] [TRT] ----------------------------------------------------------------
[08/27/2022-20:56:37] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/27/2022-20:56:38] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +808, GPU +314, now: CPU 13421, GPU 1652 (MiB)
[08/27/2022-20:56:38] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +143, GPU +58, now: CPU 13564, GPU 1710 (MiB)
[08/27/2022-20:56:38] [I] [TRT] Timing cache disabled. Turning it on will improve builder speed.
[08/27/2022-20:56:40] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[08/27/2022-20:56:40] [I] [TRT] Total Host Persistent Memory: 26592
[08/27/2022-20:56:40] [I] [TRT] Total Device Persistent Memory: 0
[08/27/2022-20:56:40] [I] [TRT] Total Scratch Memory: 0
[08/27/2022-20:56:40] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 140 MiB
[08/27/2022-20:56:40] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 132.835ms to assign 10 blocks to 336 nodes requiring 29229056 bytes.
[08/27/2022-20:56:40] [I] [TRT] Total Activation Memory: 29229056
[08/27/2022-20:56:40] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 14851, GPU 2392 (MiB)
[08/27/2022-20:56:40] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 14851, GPU 2376 (MiB)
[08/27/2022-20:56:40] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +27, now: CPU 0, GPU 167 (MiB)
[08/27/2022-20:56:40] [I] [TRT] Starting Calibration.
[08/27/2022-20:56:40] [I] Batch #0
[08/27/2022-20:56:40] [I] Calibrating with file 000000000000.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000001.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000002.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000003.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000004.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000005.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000006.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000007.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000008.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000009.ppm
[08/27/2022-20:56:40] [I] [TRT]   Calibrated batch 0 in 0.115671 seconds.
[08/27/2022-20:56:40] [I] Batch #1
[08/27/2022-20:56:40] [I] Calibrating with file 0000005.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000006.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000007.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000008.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000009.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000010.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000011.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000012.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000013.ppm
[08/27/2022-20:56:40] [I] Calibrating with file 000000000014.ppm
Could not find 0000005.ppm in data directories:
        ppmSamples
&&&& FAILED
[08/27/2022-20:56:40] [E] [TRT] 1: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::~StdVirtualMemoryBufferImpl::104] Error Code 1: Cuda Runtime (driver shutting down)

ghost commented 2 years ago

Sorry for taking up your time with what may (or may not) be a simple problem. I don't know how much time you have, but would you consider paid help? Something like 50 euros for an hour of looking into whether you could fix my problem, or even just suggesting ideas for a solution, since ideally I would like to understand myself what to do.

iwatake2222 commented 2 years ago

First, I recommend trying pj_tensorrt_cls_mobilenet_v2 with USE_INT8_WITH_CALIBRATION enabled, so that you can see a working INT8 calibration. Then you can look for the differences between pj_tensorrt_cls_mobilenet_v2 and your code/settings.

iwatake2222 commented 1 year ago

This issue was closed because it has been inactive.