FelixGoetze closed this pull request 3 years ago.
Thank you for your contribution! Doing the multiplication on the GPU was something I had wanted to do, but I got distracted by other things... I didn't know about the inefficiency of using float16 on the CPU!
I'll merge the PR and then make some small changes to the notebook (for example, keeping the search output).
This changes the matrix multiplication in the `find_best_matches` function to use PyTorch. With an available CUDA GPU, it can run more than 100 times faster. When no GPU is available, the float16 NumPy arrays are converted to float32 tensors, which also runs faster due to better hardware support for float32 on the CPU (see also). Below are the runtime performances for the following command:
- PyTorch with float16 on Colab with GPU:
- PyTorch with float32 on Colab with CPU:
- Previous NumPy implementation using float16:
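For context, here is a minimal sketch of what the PyTorch-based multiplication could look like. The argument names and the `count` parameter are illustrative assumptions; the actual `find_best_matches` function in the notebook may be structured differently:

```python
import numpy as np
import torch

def find_best_matches(text_features, photo_features, count=5):
    # Hypothetical signature; the real function in the repo may differ.
    # Use float16 on the GPU if one is available; otherwise fall back
    # to float32, which has better hardware support on the CPU.
    if torch.cuda.is_available():
        device, dtype = "cuda", torch.float16
    else:
        device, dtype = "cpu", torch.float32

    text = torch.from_numpy(text_features).to(device=device, dtype=dtype)
    photos = torch.from_numpy(photo_features).to(device=device, dtype=dtype)

    # Cosine similarity via a single matrix multiplication
    # (assumes the features are already L2-normalized, as CLIP's are).
    similarities = (photos @ text.T).squeeze(1)

    # Indices of the best-matching photos, highest similarity first.
    best = torch.argsort(similarities, descending=True)
    return best[:count].cpu().numpy()
```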