Open carsonswope opened 2 years ago
Hi, quantization support on GPUs is still maturing and improving our integer performance (and INT8/UINT8 in particular) is something that we're still working on. In particular our fast path quantized operators rely on a feature introduced in Shader Model 6.4, which isn't supported by all GPUs and drivers yet.
If you have a particular scenario in mind, we'd love to hear about your use case if you're comfortable sharing it. This'll help us figure out what to optimize for as we continue to work on our integer performance.
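For context, the Shader Model 6.4 feature referred to here is presumably the packed 8-bit dot-product intrinsics (`dot4add_i8packed` / `dot4add_u8packed`), which accumulate a four-lane int8 dot product into a 32-bit integer in one instruction. A minimal Python sketch of what a single such intrinsic computes (the function name mirrors the HLSL intrinsic; this is an illustration, not DirectML code):

```python
import struct

def dot4add_i8packed(a_packed: int, b_packed: int, acc: int) -> int:
    """Emulate dot4add_i8packed: treat each 32-bit input as four signed
    8-bit lanes, compute their dot product, and add it to the accumulator."""
    a = struct.unpack("4b", struct.pack("<I", a_packed & 0xFFFFFFFF))
    b = struct.unpack("4b", struct.pack("<I", b_packed & 0xFFFFFFFF))
    return acc + sum(x * y for x, y in zip(a, b))
```

Hardware with native support for this packed dot product can do four int8 multiply-accumulates per instruction, which is where the quantized fast path gets its speedup over float math.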
Hey, thanks for the quick response @adtsai.
My use case is to deploy ML models for video & image processing as part of plugins for popular video editing tools. DirectML is appealing because of compatibility across more than just NVIDIA GPUs, as well as a much smaller distributable size than the CuDNN+TensorRT libraries. However, execution time is important, especially when working with 4k video. I've been able to achieve significant speedups using quantization w/ TensorRT, but, as I said, I haven't been able to duplicate the speedup on DirectML, at least w/ the hardware that I have.
So, basically, I'm looking for fast execution of quantized convolution & matrix multiply operations. If you're curious about specifics, one model I'm working with right now is FastDVDNet, which is CNN-based video denoising. I'm also looking at some transformer-based image processing, such as Dense Prediction Transformers, for monocular depth extraction.
FYI: It seems like my GPU does support Shader Model 6.4 (https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877). Is it at all possible to get some kind of log of the decision-making process of the DirectML 'compiler' for a given graph? It would be super helpful to have a little more insight into why it might be missing the fast path.
Thanks, hope this is helpful.
I ran into the same situation. Is there any update here?
We are having the same problem. Does DirectML support int8 quantization now, a year later? Thanks.
+1
Hi,
I'm running DirectML 1.9.0 w/ an NVIDIA GTX 1080ti GPU. I've been experimenting with the quantized operations provided by DirectML.
I have found that on my system, the `DML_QUANTIZED_LINEAR_CONVOLUTION_OPERATOR` and `DML_CONVOLUTION_INTEGER_OPERATOR` operators perform about 10x slower than the normal `DML_CONVOLUTION_OPERATOR` for equivalent convolution operations, even when any quantize/dequantize processing steps are removed. I know that my GPU offers some hardware support for int8 computation, because I'm able to run quantized models via TensorRT and see a speedup. However, DirectML is clearly not finding the 'fast' implementation.

Is this expected behavior for my hardware? Should I expect to see a speedup running quantized operations on a newer NVIDIA GPU with better hardware support for IMMA?
Thanks!
--
Looking at the NVIDIA hardware support table, my GPU (compute capability 6.1) supports `int8` but not `int8` tensor cores, which are only available starting with the next generation.