huawei-noah / VanillaNet


the speed of TRT #22

Open JudasDie opened 1 year ago

JudasDie commented 1 year ago

Hi, thanks for the great work. I have tried to apply vanilla-9 to object detection, however, when transferring the model to TensorRT, it seems much slower than ResNet-34. Is there any guidance? Thanks in advance.

ggjy commented 1 year ago

In object detection the input size is much larger than the 224×224 used on ImageNet. You can try FP16, lower the input resolution, or choose a platform better suited to VanillaNet (e.g., an A100) to narrow the gap between VanillaNet-9 and ResNet-34.
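A minimal sketch of the FP16 route, in case it helps (the import path, the `vanillanet_9` constructor signature, and the 640×640 input are assumptions, not the repo's official export script):

```python
# Sketch: export a VanillaNet backbone to ONNX, then let trtexec build an FP16 engine.
# Assumptions: `vanillanet_9` is importable from the repo's models/vanillanet.py and
# deploy-mode fusion is exposed via switch_to_deploy(); 640x640 stands in for the detector input.
import torch
from models.vanillanet import vanillanet_9  # assumed import path

model = vanillanet_9(pretrained=False).eval()
if hasattr(model, "switch_to_deploy"):
    model.switch_to_deploy()  # fuse the training-time branches before export, if available

dummy = torch.randn(1, 3, 640, 640)  # adjust to your detector's input resolution
torch.onnx.export(model, dummy, "vanillanet9.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=13)

# Build the TensorRT engine with FP16 enabled:
#   trtexec --onnx=vanillanet9.onnx --saveEngine=vanillanet9_fp16.engine --fp16
```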

rememberBr commented 6 months ago

Hello, I've been testing the speed of VanillaNet recently. I tried VanillaNet-6 with an image size of 128×128, which is smaller than 224×224. Using TensorRT for inference, MobileNetV3-Large takes 0.028 ms while VanillaNet-6 takes 0.096 ms. The GPU is an A40. This is inconsistent with the results reported in the paper using PyTorch directly. It seems that VanillaNet is not well suited to fast inference in TRT?

HantingChen commented 6 months ago

Thank you for sharing your observations regarding the speed performance of VanillaNet when tested with TensorRT. It's important to note that changing the input resolution of the model can lead to variations in speed performance. We recommend conducting tests using an input image size of 224×224, as per the model's original design and the conditions under which it was benchmarked in our paper.

If you're using an input size of 128×128, the model might require a redesign to optimize its performance for that specific resolution. The discrepancy you observed, especially when comparing VanillaNet-6 with MobileNetV3-Large under TensorRT, might be attributed to this change in input size, which can significantly affect how the model processes data and, consequently, its inference speed.

rememberBr commented 6 months ago

If you're using an input size of 128*128, the model might require a redesign to optimize its performance for that specific resolution. The discrepancy you observed, especially when comparing VanillaNet6 with MobileNetV3-Large under TensorRT, might be attributed to this change in input size, which can significantly affect how the model processes data and, consequently, its inference speed. Hello, thanks for your reply. I tried using a model with size 224 and the speed matched the readme mentioned when setting the batch size to 64 - about 0.3ms (but Mobilenetv3-large only takes 0.1ms, emmmm). It seems that Mobilenet is faster when using TensorRT. I ran test_latency.py again for testing and found the conclusion consistent with the paper. However, when setting batch_size to 64 in the line of code "data_loader_val = create_loader(dataset_val, input_size=size, batch_size=1, is_training=False, use_prefetcher=False)" in test_latency.py, something unexpected happened: VanillaNet6 takes 0.8ms while Mobilenetv3-large only takes 0.25ms. Is my testing method wrong? This conclusion makes me frustrated, you can also try it and see if my conclusion is wrong.

HantingChen commented 6 months ago

Thank you for sharing your further testing results and observations. VanillaNet is designed with fewer layers, but each layer involves more computation, so it is better suited to scenarios with ample computational resources. In such cases the primary latency bottleneck tends to be the number of layers rather than FLOPs, which is a key point we aimed to highlight in our work.

We generally set the batch size to 1 in our tests; with a larger batch size, VanillaNet may not show its advantage, which matches what you observed. I suggest setting the batch size to 1 when running the TensorRT tests as well. This should better reflect the performance characteristics described in the paper.
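A minimal sketch of the batch-size-1 measurement being suggested (the model constructors, warm-up and repeat counts are placeholders, not the exact test_latency.py code):

```python
import time
import torch
from timm import create_model

def measure_latency(model, batch_size=1, size=224, warmup=50, reps=300):
    """Average GPU forward latency in milliseconds for a fixed batch size."""
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, size, size, device="cuda")
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: let cuDNN settle on its kernels
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(reps):
            model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / reps * 1000

# Example: MobileNetV3-Large from timm at batch size 1; compare VanillaNet-6 the same way.
mbv3 = create_model("mobilenetv3_large_100", pretrained=False)
print("mobilenetv3-large @ bs=1:", measure_latency(mbv3), "ms")
```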

rememberBr commented 6 months ago

Thank you, I see. VanillaNet is a very interesting piece of work. May I ask whether VanillaNet will continue to be developed, e.g., a VanillaNetV2 or VanillaNet-plus that keeps its advantage even when the batch size is greater than 1? That would be exciting, since batch sizes larger than 1 are common in practical applications to obtain higher throughput.