maggiez0138 / Swin-Transformer-TensorRT

This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.
MIT License

Hello, I have not reproduced the batch-size gains you mentioned! #10

Closed. tensorflowt closed this issue 1 year ago.

tensorflowt commented 2 years ago

Hello, I have not been able to reproduce the batch-size gains you mentioned.
My test machine environment: [screenshot]
The model I tested: [screenshot]
My evaluation command: trtexec --loadEngine=./weights/swin_tiny_patch4_window7_224_batch16.engine
My test results: [screenshot]
Detailed test screenshots: [three screenshots]

I'm curious why my results are so different from yours, given that we use the same GPU.

Looking forward to your reply and good luck.

maggiez0138 commented 1 year ago

Actually, the effective throughput (images per second) is the batch size multiplied by the throughput reported by trtexec, because trtexec's qps figure counts one execution of the whole batch as one query. So your performance is as expected. My numbers are a little better than yours; this may be due to GPU workload (the GPU being busy with other tasks) or GPU clock frequency, and it is really hard to pin down. Overall, with a bigger batch size (from 1 to 32), the throughput should increase.
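The conversion described above is plain arithmetic; the sketch below spells it out. It assumes an explicit-batch engine built for a fixed batch size, so each trtexec query processes one full batch. The function names are hypothetical helpers, not part of trtexec or this repository, and the 100 qps figure is an illustrative input, not a measurement from this thread.

```python
def images_per_second(trtexec_qps: float, batch_size: int) -> float:
    """Convert trtexec throughput (queries/s) to images/s.

    Assumption: one trtexec query = one engine execution on a full batch
    of `batch_size` images (fixed-size, explicit-batch engine).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if trtexec_qps <= 0:
        raise ValueError("trtexec_qps must be > 0")
    return trtexec_qps * batch_size


def ms_per_image(trtexec_qps: float, batch_size: int) -> float:
    """Average amortized time per image, in milliseconds."""
    return 1000.0 / images_per_second(trtexec_qps, batch_size)


if __name__ == "__main__":
    # Hypothetical input: a batch-16 engine that trtexec reports at 100 qps.
    qps, batch = 100.0, 16
    print(f"{images_per_second(qps, batch):.0f} images/s")  # 1600 images/s
    print(f"{ms_per_image(qps, batch):.4f} ms/image")
```

Comparing engines of different batch sizes by images/s (not raw qps) is what shows whether a larger batch actually helps.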