hellochick / ICNet-tensorflow

TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

batch_size idea #52

Open aliericcantona opened 6 years ago

aliericcantona commented 6 years ago

Hi,

As of now, the execution times for UHD and 1080p images are as follows:

On UHD images (e.g. a folder of images), the first one is slow (understandable), and the average execution time for the rest is around 0.71 sec. I have a GeForce GTX 1080 and a GeForce GTX 980. I disabled the memory growth flag, though, because of the 980.

The same experiment on 1080p images gives me 0.18 sec. Are these the numbers you would expect?

Is there any way to send the images to the network in batches rather than one by one (as the current code does)? Something like a batch_size of 5-8 for each call in the loop. It might speed up the network, if it is doable. Let me know your thoughts. Thanks.

hellochick commented 6 years ago

Can you tell me the resolution of your input images? A 1024*2048 image takes 0.04 sec on a GTX 1080 for me. Yes, you can change the code to feed a batch of images as input, and this might be much faster.
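As a rough illustration of the batching idea, the sketch below only shows how a folder of frames could be grouped into chunks of 5 before each call. The file names and the `batches` helper are hypothetical and not part of this repository; actually feeding a chunk would also assume the input placeholder accepts a leading batch dimension and that all images in a chunk share the same size.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical file names standing in for the images in a folder.
image_paths = ["frame_%03d.png" % i for i in range(12)]

grouped = list(batches(image_paths, 5))
for group in grouped:
    # In a real script, each group would be loaded and stacked into one
    # array, then passed to the network in a single call instead of
    # one call per image.
    print(len(group), group[0], "...", group[-1])
```

The last chunk may be smaller than the others, so any code consuming these groups should not assume a fixed batch size.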

aliericcantona commented 6 years ago

On my machine with the GTX 1080, an image of 1920x1080 takes 0.18 sec ... hmm, almost 4 times slower. Are there any settings I need to change?
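To put the reported numbers side by side, here is a small arithmetic sketch using only the timings quoted in this thread. It assumes "UHD" means 3840x2160, which is not stated explicitly above.

```python
def pixels(width, height):
    """Number of pixels in an image of the given size."""
    return width * height


uhd = pixels(3840, 2160)        # assumption: UHD taken as 3840x2160
full_hd = pixels(1920, 1080)    # the 1080p frames from this thread
reference = pixels(2048, 1024)  # the 1024*2048 image quoted at 0.04 sec

# UHD vs 1080p on the same machine: pixel count vs measured time.
uhd_pixel_ratio = uhd / full_hd
uhd_time_ratio = 0.71 / 0.18

# 1080p here vs the 1024*2048 reference timing.
reference_pixel_ratio = reference / full_hd
reference_time_ratio = 0.18 / 0.04

print("UHD / 1080p pixels: %.2f, time: %.2f" % (uhd_pixel_ratio, uhd_time_ratio))
print("1024x2048 / 1080p pixels: %.2f, time gap: %.2f"
      % (reference_pixel_ratio, reference_time_ratio))
```

On one machine the time grows roughly in proportion to the pixel count (4 times the pixels, about 3.9 times the time). The 1024*2048 image has about 1% more pixels than a 1080p frame, so the 4.5x gap between the two machines cannot be explained by image size.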

hellochick commented 6 years ago

Oh, my graphics card is a GTX 1080 Ti, but I don't think a GTX 1080 would be 4 times slower than it. Can you try a single input image with the following code?

import time

# sess, pred, x and img come from the surrounding inference script.
for i in range(10):
    start_time = time.time()
    preds = sess.run(pred, feed_dict={x: img})
    print(time.time() - start_time)

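
Since the first call is noticeably slower than the rest, a variant of the loop above can discard warm-up runs and average the remainder. This is only a sketch: `time_calls` and `fake_inference` are hypothetical names, and `fake_inference` merely sleeps as a stand-in for the real `sess.run(pred, feed_dict={x: img})` call.

```python
import time


def time_calls(fn, runs=10, warmup=1):
    """Call `fn` `warmup` times untimed, then return the durations of `runs` timed calls."""
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def fake_inference():
    # Stand-in for the real network call; replace with sess.run(...) in practice.
    time.sleep(0.01)


durations = time_calls(fake_inference, runs=5, warmup=1)
mean_duration = sum(durations) / len(durations)
print("mean over %d runs: %.4f sec" % (len(durations), mean_duration))
```
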
aliericcantona commented 6 years ago

Yes, that's how I measured it as well.

aliericcantona commented 6 years ago

  1. GPU: GTX 1080 (not a Ti)
  2. TensorFlow: r1.6 (built from source)
  3. CUDA: 8.0
  4. cuDNN: 5.0
  5. GPU device version: 6.1
  6. Python: 2.7

I even played with the bazel build options to recompile TensorFlow, but I still don't get 0.04 sec as on your machine. It is still around 0.17-0.18 seconds per 1920x1080 frame...

hellochick commented 6 years ago

@aliericcantona, when I installed r1.6, it recommended CUDA 9.0; I don't know whether this is a problem or not. However, I use TF 1.4 instead of TF 1.6, so maybe you can try TF 1.4? I think 0.17 is really slow for a GTX 1080, which is really strange.

aliericcantona commented 6 years ago

I still can't get below 0.16 sec, even with a new image on my CentOS 7 machine. Is there any OS-level trick that gets you that number? It is 4 times faster than mine.

aliericcantona commented 6 years ago

BTW, I installed CUDA 9.1 and cuDNN 7.0 with TensorFlow r1.6 on a GTX 1080 Ti, and I still get the same number: 0.16 seconds per 1920x1080 frame. I built TensorFlow from source. Is there any special trick you know of for ./configure?

aliericcantona commented 6 years ago

Can you list the packages installed on your machine? Mine are as follows:

  1. protobuf == 3.5.2
  2. python 2.7.5
  3. gcc 4.8.5
  4. NVIDIA CUDA 9.0
  5. NVIDIA cuDNN 7.0
  6. protobuf
  7. OS (CentOS 7)
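
To make such lists easier to compare between machines, part of this information can be collected by a script. The sketch below is an assumption-laden example: it needs Python 3.8 or newer for `importlib.metadata` (the setup above uses Python 2.7), the `collect_environment` helper and the package names passed to it are chosen only for illustration, and it does not cover gcc, CUDA, or cuDNN, which are not Python packages.

```python
import platform
from importlib import metadata


def collect_environment(package_names):
    """Return a dict with the Python version, OS string, and installed package versions."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in package_names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


report = collect_environment(["protobuf", "tensorflow", "numpy"])
for key in sorted(report):
    print("%s: %s" % (key, report[key]))
```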