Hello, I could not reproduce the batch-size gains you mentioned! #10

maggiez0138 / Swin-Transformer-TensorRT

This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.

MIT License

161 stars, 29 forks

Hello, I could not reproduce the batch-size gains you mentioned.

My test machine environment is as follows:

The model I tested is as follows:

My evaluation script is as follows:

trtexec --loadEngine=./weights/swin_tiny_patch4_window7_224_batch16.engine

My test results are as follows:

The specific test screenshots are as follows:

I'm curious why my results are so different from yours, given that we are using the same graphics hardware.

Looking forward to your reply and good luck.

Actually, the real throughput is batch size × (the throughput that trtexec reports), so the performance is as expected. My performance is a little better than yours, perhaps because of GPU workload (the GPU being busy with other tasks) or GPU clock frequency; it is hard to pin down. Overall, with a larger batch size (from 1 to 32), the throughput should increase.
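To make the conversion concrete, here is a minimal sketch of the arithmetic, assuming the engine was built with a fixed batch size so that each query trtexec counts is one full batch. The function name `images_per_second` and the numbers in the usage loop are hypothetical, chosen only for illustration; they are not measurements from this issue.

```python
def images_per_second(batch_size: int, trtexec_qps: float) -> float:
    """Convert the throughput reported by trtexec (queries per second,
    assumed here to mean batches per second) into images per second."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return batch_size * trtexec_qps


if __name__ == "__main__":
    # Hypothetical (batch_size, reported qps) pairs, not real results.
    for batch_size, qps in [(1, 400.0), (16, 50.0)]:
        total = images_per_second(batch_size, qps)
        print(f"batch={batch_size}: {qps} qps -> {total} images/s")
```

With these made-up numbers, the batch-16 engine reports a lower qps than the batch-1 engine, yet it processes more images per second once the batch size is factored in.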
