patlevin / tfjs-to-tf

A TensorFlow.js Graph Model Converter

Performance Drop #25

Closed: miknyko closed this issue 4 years ago

miknyko commented 4 years ago

Hi there

I managed to convert the BodyPix model from a TFJS graph model to a SavedModel, but I observed a noticeable performance drop (AP drop on segmentation) in the Python model. I tried several BodyPix models at different scales, and all of them performed worse than the TFJS model. Any ideas? Many thanks!
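For reference, a conversion along these lines (paths are placeholders, using the Python API shown in this repo's README) would look something like:

```python
import tfjs_graph_converter.api as tfjs_api

# Convert a downloaded TFJS graph model (model.json + weight shards)
# into a TensorFlow SavedModel; both paths are placeholders.
tfjs_api.graph_model_to_saved_model(
    "path/to/bodypix/tfjs_model",   # directory containing model.json
    "path/to/bodypix/saved_model",  # SavedModel output directory
    ["serve"],                      # tags for the exported MetaGraph
)
```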

patlevin commented 4 years ago

Hello,

Without looking further into this, my suspicion is that the Python version runs on the CPU, whereas the TFJS model uses the GPU. Running the demo reveals that the TFJS version hardly uses any CPU and hits the GPU instead (see the attached screenshot).

I'll do some more testing, but I think I might be able to add a flag to select which target hardware to run the model on.
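As a quick way to check which device the Python side actually uses, here is a minimal sketch using plain TensorFlow 2.x APIs (nothing converter-specific):

```python
import tensorflow as tf

# An empty list here means inference will silently fall back to the CPU.
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# Optionally log the device every op gets placed on during inference.
tf.debugging.set_log_device_placement(True)
```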

patlevin commented 4 years ago

I tested all available models in both Python and JavaScript, and there is no notable performance difference when the GPU is used.

In fact, the perceived performance drop is most likely due to one of three factors:

  1. the Python version might use the CPU instead of the GPU (the pip package of TensorFlow only supports NVIDIA cards via CUDA, whereas TensorFlow.js can use any GPU - Intel iGPU, AMD APU, or AMD/NVIDIA dGPU - through its WebGL backend)
  2. the Python version most likely runs the model at the full input image resolution, whereas the JavaScript demo defaults to low/medium resolutions
  3. the first inference call will be significantly slower than subsequent ones, since the kernels need to be compiled and the model needs to be uploaded to the GPU (see the timing sketch after this list)
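Point 3 can be factored out by running a warm-up call before measuring. A minimal timing sketch (model path, signature name, and input shape are assumptions; adjust them to your converted model):

```python
import time
import tensorflow as tf

model = tf.saved_model.load("bodypix_savedmodel")  # placeholder path
infer = model.signatures["serving_default"]        # signature name assumed

image = tf.random.uniform((1, 480, 640, 3))        # dummy input, shape assumed

infer(image)  # warm-up: compiles kernels and uploads weights to the GPU

start = time.perf_counter()
for _ in range(100):
    infer(image)
elapsed = time.perf_counter() - start
print(f"average: {elapsed / 100 * 1000:.1f} ms per frame")
```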

Point 2 is especially interesting because you can test it yourself - open the demo and notice the difference in performance when switching from "medium" to "full" resolution.

"Medium" resolution:

[screenshot: demo performance at "medium" internal resolution]

"Full" resolution:

[screenshot: demo performance at "full" internal resolution]

Using the full internal image resolution resulted in a massive performance drop. So unless your Python code also scales images down before inference (see the sketch below), lower throughput is to be expected.
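To mirror the demo's behaviour in Python, the input can be downscaled before inference. A sketch (the 0.5 factor standing in for the demo's "medium" setting is an assumption; BodyPix's exact internal-resolution logic may differ):

```python
import tensorflow as tf

def scale_input(image: tf.Tensor, factor: float = 0.5) -> tf.Tensor:
    """Downscale an HxWxC image before inference, like the demo does internally."""
    shape = tf.cast(tf.shape(image)[:2], tf.float32)
    target = tf.cast(shape * factor, tf.int32)
    return tf.image.resize(image, target)
```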

Note that the demo numbers above come from my processor's iGPU, because that's the default device my browser uses. Running the model at full resolution in the browser on the dedicated GPU yields about 10 fps on my test machine, while the converted model ran inference on the same GPU at an average of 46 fps (measured over 100 frames). Granted, the Python test code didn't include calculating and rendering the segmentation overlay, so there's some room for performance loss there.

The converted models, however, are not tied to any particular physical device type and run on the GPU by default, provided a supported GPU and all required support libraries (CUDA 10.1 and cuDNN 7.6 as of TF 2.x) are present in the system. I did not observe any performance degradation compared to the browser version.
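If you want to compare both paths explicitly, you can pin inference to a device with a standard TensorFlow device scope (a sketch; paths and names as assumed in the timing example above):

```python
import tensorflow as tf

model = tf.saved_model.load("bodypix_savedmodel")  # placeholder path
infer = model.signatures["serving_default"]        # signature name assumed
image = tf.random.uniform((1, 480, 640, 3))        # dummy input, shape assumed

# Swap in "/CPU:0" to measure the CPU path for comparison.
with tf.device("/GPU:0"):
    result = infer(image)
```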