talonvoice / wav2letter

Facebook AI Research Automatic Speech Recognition Toolkit

Inference backend #4

Closed viig99 closed 4 years ago

viig99 commented 4 years ago

Hi @lunixbochs, I have been following your work on wav2letter, and the trail of pointers and breadcrumbs you have left has been a great help in getting it up and running. I now have a reasonably performant gRPC service that handles streaming well, and I was wondering whether you have done any other optimizations to improve inference latency. In https://github.com/facebookresearch/wav2letter/issues/586, Vineel mentioned that plain for loops would work fine; have you tried playing around with https://pytorch.org/blog/tensor-comprehensions/ for the remaining layers? I am interested in picking this up, so any information on which pieces you felt were non-performant would be of great help!
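
For context, the chunking on the service side is structured roughly like the sketch below. This is a minimal illustration only; `StreamingModule` and its methods are hypothetical stand-ins, not the actual wav2letter inference API.

```cpp
// Minimal sketch (not the actual wav2letter inference API) of buffering incoming
// PCM samples into fixed 500ms chunks before feeding a streaming ASR module.
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the streaming acoustic model + decoder.
struct StreamingModule {
  virtual std::string feed(const std::vector<float>& chunk) = 0; // partial transcript
  virtual std::string finish() = 0;                              // final transcript
  virtual ~StreamingModule() = default;
};

class ChunkedStreamer {
 public:
  ChunkedStreamer(StreamingModule& module, size_t sampleRate = 16000, size_t chunkMs = 500)
      : module_(module), chunkSamples_(sampleRate * chunkMs / 1000) {}

  // Called for every audio packet that arrives over the gRPC stream.
  std::string onAudio(const std::vector<float>& samples) {
    buffer_.insert(buffer_.end(), samples.begin(), samples.end());
    std::string partial;
    // Emit full 500ms chunks; the remainder stays buffered for the next packet.
    while (buffer_.size() >= chunkSamples_) {
      std::vector<float> chunk(buffer_.begin(), buffer_.begin() + chunkSamples_);
      buffer_.erase(buffer_.begin(), buffer_.begin() + chunkSamples_);
      partial = module_.feed(chunk);
    }
    return partial;
  }

  // Called when the client closes the stream.
  std::string onEnd() {
    if (!buffer_.empty()) {
      module_.feed(buffer_);
      buffer_.clear();
    }
    return module_.finish();
  }

 private:
  StreamingModule& module_;
  size_t chunkSamples_;
  std::vector<float> buffer_;
};

// Dummy module so the sketch compiles and runs end to end.
struct DummyModule : StreamingModule {
  size_t chunks = 0;
  std::string feed(const std::vector<float>&) override {
    return "partial after " + std::to_string(++chunks) + " chunk(s)";
  }
  std::string finish() override { return "final transcript"; }
};

int main() {
  DummyModule model;
  ChunkedStreamer streamer(model);
  std::vector<float> packet(4000, 0.f); // 250ms of 16kHz mono audio per packet
  for (int i = 0; i < 4; ++i) {
    std::printf("%s\n", streamer.onAudio(packet).c_str());
  }
  std::printf("%s\n", streamer.onEnd().c_str());
  return 0;
}
```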

lunixbochs commented 4 years ago

can you be more specific? which models are you running on which code?

viig99 commented 4 years ago

The wav2letter streaming convnets model, running on the wav2letter inference codebase (the multithreaded streaming example) with the fbgemm CPU backend. fbgemm was built for an 80-core Intel Xeon Gold machine using MKL-DNN. I am getting 512 clients/sec at around 14ms (plus 10ms network latency) for parsing 500ms audio chunks. I had initially made a mistake in measuring my throughput and thought it was slower, which is why I was exploring ways to improve overall performance, but this question no longer seems relevant since the throughput is already high enough for my needs. I am closing this issue for now; I will still try to measure the latency of the non-fbgemm backend layers (ReLU, layer norm, residual blocks), check whether there is any point in optimizing them further, and let you know how it goes. Apologies for the trouble caused.
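
For that follow-up measurement, a standalone per-layer microbenchmark along these lines should be enough to see whether the non-fbgemm ops matter at all. The naive ReLU/layer-norm loops and the activation shape below are illustrative assumptions, not the wav2letter inference kernels.

```cpp
// Minimal sketch of timing non-fbgemm ops (ReLU, layer norm) in isolation.
// The loop implementations are illustrative stand-ins, not wav2letter code.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

static void relu(std::vector<float>& x) {
  for (auto& v : x) {
    v = v > 0.f ? v : 0.f;
  }
}

static void layerNorm(std::vector<float>& x, float eps = 1e-5f) {
  double mean = 0.0, var = 0.0;
  for (float v : x) mean += v;
  mean /= x.size();
  for (float v : x) var += (v - mean) * (v - mean);
  var /= x.size();
  const float inv = 1.f / std::sqrt(static_cast<float>(var) + eps);
  for (auto& v : x) v = (v - static_cast<float>(mean)) * inv;
}

// Average wall-clock time per call in milliseconds.
template <typename F>
double timeMs(F&& fn, int iters = 1000) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) fn();
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count() / iters;
}

int main() {
  // Rough activation size for one chunk: frames x channels (an assumption).
  std::vector<float> act(50 * 768, 0.5f);
  std::printf("relu:      %.4f ms/call\n", timeMs([&] { relu(act); }));
  std::printf("layernorm: %.4f ms/call\n", timeMs([&] { layerNorm(act); }));
  return 0;
}
```

If these numbers come out well under the ~14ms per-chunk budget, there is probably little to gain from optimizing those layers further.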