shubhank008 opened 5 years ago
On the most basic AWS instance (CPU-only) it takes anywhere from 8 to 10 seconds to generate a reply. On a GPU node (K10) it's around 2-3 seconds, which honestly isn't worth it considering the price difference between the two instances.

Was wondering, is there a way, or options to tweak, to speed up inference? I tried playing with the beam settings, but the speedup wasn't worth the hit to response quality.

Still a beginner in ML, but would using a TPU make a difference for inference, compared to using a GPU?

Yes, a TPU would probably be much faster. If you really know what you're doing, you could try replacing my beam search implementation with one that happens entirely on the GPU, which could make it run much faster with no degradation in quality -- but that's beyond my know-how. Otherwise I'd just recommend playing around with the inference options I've included -- for example, a beam width of 1 and a topn of 5 might be worth trying. That will degrade quality somewhat, but probably any deviation from the default options will degrade quality, because I picked the default options to maximize quality :)
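For anyone wondering why beam width and topn affect speed so much, here's a minimal sketch of a generic beam search decoder (not this repo's actual implementation; `step_fn` and the toy model below are hypothetical stand-ins for the real network). Each decoding step runs the model once per live beam and expands only the `topn` most likely next tokens, so shrinking either parameter cuts per-step work roughly proportionally, and `beam_width=1` collapses to plain greedy decoding:

```python
import numpy as np

def decode(step_fn, start_token, max_len=20, beam_width=2, topn=5):
    """Minimal beam search sketch. step_fn(seq) returns a log-prob vector
    over the vocabulary for the next token. With beam_width=1 this reduces
    to greedy decoding: one model call per step instead of beam_width calls."""
    beams = [([start_token], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logp = step_fn(seq)  # one model call per live beam
            # Only expand the topn most likely next tokens for this beam.
            for tok in np.argsort(logp)[-topn:]:
                candidates.append((seq + [int(tok)], score + logp[tok]))
        # Keep the beam_width best candidates overall; the rest are pruned.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

# Toy "model": a random distribution per step, just to make this runnable.
rng = np.random.default_rng(0)
vocab_size = 50
toy_step = lambda seq: np.log(rng.dirichlet(np.ones(vocab_size)))

print(decode(toy_step, start_token=0, beam_width=1, topn=5))
```

The quality/speed trade-off the maintainer describes falls out of the pruning step: a wider beam keeps more partial hypotheses alive and is more likely to find a higher-scoring full sequence, at the cost of proportionally more model calls per token.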