Support for mixed precision

andres-ulloa commented 1 year ago

I have an RX 6900 XT, which runs Stable Diffusion inference in 33 s. My 16 GB RTX A4000 does the same in 6.7 s. Without support or emulation for tensor cores, DirectML is not a serious alternative to either ROCm or CUDA. AMD inference times are 6 times slower than on the equivalent Nvidia card running CUDA. Even ROCm delivers massive gains on Radeon cards that have no actual matrix cores.

Is there any chance the plugin will get real mixed precision support? What are your plans going forward with regard to performance?

Thanks in advance for taking the time to address these concerns.

aliencaocao commented 1 year ago

https://github.com/microsoft/tensorflow-directml-plugin/discussions/315#discussioncomment-3911959

PatriceVignola commented 1 year ago

Hi @andres-ulloa,

As @aliencaocao said, mixed precision is an area that we haven't focused on yet, but it's on our radar. Is there a particular model that you're looking at?

cminnoy commented 6 months ago

Mixed precision float16 on the RDNA2 RX 6900 XT would be great for convolutions.

microsoft / tensorflow-directml-plugin

Support for mixed precision #323