Open jingtaozhan opened 3 years ago

jingtaozhan: Thank you for sharing the code. COIL achieves very impressive retrieval performance. I wonder how to use a GPU for retrieval.

Maintainer: The current public retriever implementation uses PyTorch API calls, so technically it will take as little as adding a few .cuda() calls to make it run on a GPU. Optimizing it may take some effort. I can make a patch, but that could take some time as I currently have quite a few things on my plate.

jingtaozhan: Thanks. I can implement it myself by just adding a few .cuda() calls. But can I achieve the GPU latency reported in the paper this way?

Maintainer: As I said, optimizing it could take some effort. Some considerations include keeping memory aligned and contiguous. Efficient GPU top-k is also tricky, and it is likely to be hardware dependent.

jingtaozhan: I see. The original experimental implementation includes many optimization tricks. I will try simply adding the .cuda() calls, and I look forward to your optimized GPU retrieval code. Thank you!
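For readers following along: a minimal sketch of the ".cuda() calls" approach discussed in this thread — move the representations to the GPU (keeping them contiguous, as noted above), score by inner product, and take the top-k on device. All names here are hypothetical illustrations, not COIL's actual API, and this omits the optimization tricks the maintainer mentions.

```python
import torch

def gpu_topk_retrieve(doc_reps_cpu, query_rep_cpu, k=10):
    """Hypothetical sketch: move index and query to GPU, score, take top-k.

    doc_reps_cpu: (num_docs, dim) float tensor on CPU
    query_rep_cpu: (dim,) float tensor on CPU
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Keep memory contiguous before/after the transfer, as suggested above.
    doc_reps = doc_reps_cpu.contiguous().to(device)
    query_rep = query_rep_cpu.contiguous().to(device)
    # Inner-product scoring against all documents at once.
    scores = doc_reps @ query_rep
    # torch.topk runs on whichever device `scores` lives on.
    top_scores, top_ids = torch.topk(scores, k)
    return top_scores.cpu(), top_ids.cpu()

# Toy usage with random data (stands in for real token representations).
docs = torch.randn(1000, 128)
query = torch.randn(128)
scores, ids = gpu_topk_retrieve(docs, query, k=5)
```

This only demonstrates the mechanical CPU-to-GPU port; matching the paper's reported GPU latency would additionally require the memory-layout and top-k optimizations discussed above.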