protossw512 / AdaptiveWingLoss

[ICCV 2019] Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression - Official Implementation
Apache License 2.0
395 stars 88 forks source link

inference speed #7

Closed dingweichao123 closed 4 years ago

dingweichao123 commented 4 years ago

When I test, the inference speed isn't as fast as the paper reports: I can only achieve 50 FPS with a single HG stack.

protossw512 commented 4 years ago

@dingweichao123 Test-time speed depends heavily on your hardware configuration and your CUDA/GPU driver versions. Since I did not release the evaluation script for the 1-HG model, may I ask how you measured your runtime speed?

dingweichao123 commented 4 years ago

The runtime was evaluated on a GTX 2080ti with CUDA 10. I wrote a test script for deployment based on the script you released, with num_modules set to 1 and num_landmarks set to 98.

protossw512 commented 4 years ago

There are probably some issues with your script. If you set the number of HGs to 1 and batch_size to 1 in my sample script and run it, at the end of the evaluation you will see something like this:

Average runtime for a single batch: 0.009969

I tested this just now on a GTX 1080ti with CUDA 10 and PyTorch 1.3.0. My original code was tested with PyTorch 0.4.1 on another 1080ti machine that I no longer have access to. There may be some variation in performance, but a 2080ti should be faster than a 1080ti. 50 FPS on a 2080ti does not make sense to me at all.
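For context, the per-batch time above converts to frames per second as follows (a quick sketch; the 0.009969 s figure is the one quoted above, and batch_size = 1 is assumed as in the described setup):

```python
# Convert the per-batch runtime printed by the evaluation script to FPS.
# 0.009969 s/batch is the figure quoted in this thread (1080ti, 1 HG stack).
per_batch_seconds = 0.009969
batch_size = 1  # assumed, as suggested in the comment above

fps = batch_size / per_batch_seconds
print(f"{fps:.1f} FPS")  # roughly 100 FPS for the forward pass alone
```

That is about twice the 50 FPS figure reported in the original post, which is why an end-to-end measurement (including pre/post-processing) was the suspected cause.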

dingweichao123 commented 4 years ago

Thanks for the reply. The test speed I mentioned includes pre-processing and post-processing time; the inference itself reaches 83 FPS. I'll check my script.

protossw512 commented 4 years ago

For comparison purposes, runtime should be evaluated on the model forward pass only. In my opinion, data pre-processing and post-processing can vary between setups, and they should be handled in parallel with model inference anyway.
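Since the 1-HG evaluation script was not released, here is a minimal sketch of how forward-only timing is typically done in PyTorch. The `DummyModel` and input shape are stand-ins, not the repo's actual FAN architecture; the key points are the warm-up iterations and `torch.cuda.synchronize()`, without which asynchronous CUDA kernel launches make the measured time meaningless:

```python
import time
import torch
import torch.nn as nn

def time_forward(model, x, warmup=10, iters=100):
    """Average forward-pass time in seconds, excluding pre/post-processing."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):      # warm up: first iterations pay one-time costs
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize() # wait for queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize() # wait again so the last forward is fully counted
    return (time.perf_counter() - start) / iters

# Hypothetical stand-in for the network; 98 output channels mirrors num_landmarks=98.
dummy = nn.Conv2d(3, 98, kernel_size=3, padding=1)
x = torch.randn(1, 3, 256, 256)  # batch_size=1, as in the comparison above
if torch.cuda.is_available():
    dummy, x = dummy.cuda(), x.cuda()

t = time_forward(dummy, x)
print(f"{t * 1000:.2f} ms per forward pass ({1.0 / t:.1f} FPS)")
```

Measuring this way isolates the model itself, so numbers are comparable across machines regardless of how each pipeline implements cropping, normalization, or landmark decoding.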

dingweichao123 commented 4 years ago

I think there is some performance variation between different versions of PyTorch. I measured 11.5 ms/image on PyTorch 1.1 and 10.2 ms/image on PyTorch 1.2, very close to yours.

austingg commented 4 years ago

@protossw512 Will you release the 1-HG model?