wuhuikai / FastFCN

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.
http://wuhuikai.me/FastFCNProject

Test in real time #57

Closed gasparramoa closed 4 years ago

gasparramoa commented 4 years ago

Hello! First of all, thanks for your work.

I created a new dataset with the same structure as the ADE20K dataset, trained the model with 2 classes, and tested it; it works very well. The problem is that it takes roughly 4-5 seconds to predict one image using the test.py script.

Is this happening only to me, or is it normal for a single prediction to take this long? (My machine: 16 GB RAM, 256 GB SSD, GeForce GTX 1080 Ti, AMD Ryzen 7 2700.)

Thanks in advance. Gaspar Ramôa

wuhuikai commented 4 years ago
  1. What's the resolution of the test image?
  2. What's the exact command of running test.py?
  3. How do you measure the running time?
gasparramoa commented 4 years ago

Thanks for your quick reply.

  1. The resolution is 480 (width) × 640 (height).
  2. The command is: `CUDA_VISIBLE_DEVICES=0 python test.py --dataset ade20kgaspar --model encnet --jpu --aux --se-loss --backbone resnet101 --resume '/home/socialab/human_vision/FastFCN/experiments/segmentation/runs/ade20kgaspar/encnet/encnet_res101_DDF/model_best.pth.tar' --split BETA --mode test --save-folder '/home/socialab/human_vision/BETA/SEG/'`. The ade20kgaspar dataset is the new dataset I created based on ADE20K, and BETA is a split I created that contains just one image.
  3. To measure the running time, I start a timer when the script starts and stop it when the script ends.
wuhuikai commented 4 years ago

Model loading takes a lot of time and should not be included. Also, the first image takes longer to process than the rest, so you should measure the average speed over several images.
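The measurement described above can be sketched as follows. This is a minimal, framework-agnostic sketch: `predict_stub` is a hypothetical stand-in for the real forward pass (with a real CUDA model you would also call `torch.cuda.synchronize()` before reading the timer, since GPU execution is asynchronous), and the warmup iterations absorb the one-off first-image cost.

```python
import time

def predict_stub(image):
    # Hypothetical placeholder for model(image); replace with the
    # actual segmentation forward pass.
    time.sleep(0.01)
    return image

def benchmark(predict, images, warmup=3):
    """Average per-image latency, excluding warmup iterations.

    The first call(s) are slower (framework initialization, kernel
    autotuning), so they are executed but not timed.
    """
    for img in images[:warmup]:
        predict(img)
    start = time.perf_counter()
    for img in images:
        predict(img)
    elapsed = time.perf_counter() - start
    return elapsed / len(images)

avg = benchmark(predict_stub, list(range(10)))
print(f"avg latency: {avg:.3f} s/image")
```

Note that model construction and checkpoint loading happen before `benchmark` is called, so neither is included in the per-image figure.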

gasparramoa commented 4 years ago

I asked because I am building a portable system for visually impaired people that uses semantic segmentation to navigate indoor spaces. It has to run in real time, which is why I am predicting a single image/frame at a time.

I followed your suggestion: I had been loading the model and the checkpoint before each prediction. Without the model loading, each prediction now takes roughly 0.4 seconds, which is great!
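The fix amounts to hoisting the expensive load out of the per-frame loop. A minimal sketch, with a hypothetical `load_model` standing in for building the network and loading `model_best.pth.tar` (the `time.sleep` only simulates the slow checkpoint load):

```python
import time

def load_model():
    # Hypothetical stand-in for constructing the network and loading
    # the checkpoint; the sleep simulates the expensive one-off cost.
    time.sleep(0.5)
    return lambda frame: frame  # identity "segmentation" for illustration

# Load once at startup -- NOT inside the per-frame loop.
model = load_model()

latencies = []
for frame in range(5):  # stand-in for frames from a camera stream
    t0 = time.perf_counter()
    model(frame)
    latencies.append(time.perf_counter() - t0)

# Every per-frame latency now excludes the one-off loading cost.
print(max(latencies))
```

For a live camera feed, the same structure applies: initialize the model once at startup, then call it repeatedly inside the capture loop.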

Thanks for helping me.