Open jb2020-super opened 3 years ago
Hi,
DirectML fuses operators opportunistically - that is, when it is both possible to fuse and there is a performance benefit to doing so. Unfortunately in this case it appears it wasn't possible to fuse the LEAKY_RELU with the metacommand (as the level of metacommand support can vary by hardware and driver version). You might be able to achieve the fusion by using the DISABLE_METACOMMANDS flag, but that's likely to result in worse performance. Let us know if you have an end-to-end scenario that's impacted by this - if there's data that shows a substantial performance difference, this is something we can raise with hardware vendors as a potential optimization in future.
Hi @adtsai, DISABLE_METACOMMANDS results in much worse performance. I replaced the model in the DirectMLSuperResolution sample with a seven-layer CNN and tested it. The results are as follows.
Model | AMD RX 5700 XT (frame time) | NVIDIA RTX 3090 (frame time) |
---|---|---|
Demo | 38.41 ms | 10.975 ms |
7-layer CNN | 41.10 ms | 33.254 ms |
7-layer CNN (metacommands disabled) | 133.69 ms | 115.50 ms |
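To make the cost of disabling metacommands easier to read off the table, the frame times can be turned into slowdown ratios. The following is a minimal C++ sketch of that arithmetic only; the helper `slowdown` and the constant names are hypothetical and are not part of the sample or of DirectML.

```cpp
#include <cassert>

// Ratio of a measured frame time to a baseline frame time (both in ms).
// A value above 1.0 means the measured configuration is slower.
// Hypothetical helper, used only to illustrate the table's arithmetic.
double slowdown(double measured_ms, double baseline_ms) {
    assert(baseline_ms > 0.0);
    return measured_ms / baseline_ms;
}

// Frame times copied from the table (milliseconds).
constexpr double kAmdDemo    = 38.41;   // Demo, RX 5700 XT
constexpr double kAmdCnn     = 41.10;   // 7-layer CNN, RX 5700 XT
constexpr double kAmdCnnNoMc = 133.69;  // 7-layer CNN, metacommands disabled
constexpr double kNvDemo     = 10.975;  // Demo, RTX 3090
constexpr double kNvCnn      = 33.254;  // 7-layer CNN, RTX 3090
constexpr double kNvCnnNoMc  = 115.50;  // 7-layer CNN, metacommands disabled
```

With these numbers, `slowdown(kAmdCnnNoMc, kAmdCnn)` is about 3.25 and `slowdown(kNvCnnNoMc, kNvCnn)` is about 3.47, so disabling metacommands makes the 7-layer CNN roughly 3.3x to 3.5x slower on both cards. Relative to the demo model, the 7-layer CNN is about 1.07x slower on the RX 5700 XT but about 3.03x slower on the RTX 3090.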
[Screenshots: Demo model on 5700 XT; 7-layer CNN on 5700 XT; Demo model on 3090; 7-layer CNN on 3090; 7-layer CNN on 3090 with metacommands disabled]
My code is here: https://github.com/jb2020-super/test-DirectML.git
According to the PIX analysis, the convolution with FusedActivation set to DML_OPERATOR_ACTIVATION_LEAKY_RELU is split into two convolution ops. But when it is replaced with DML_OPERATOR_ACTIVATION_RELU, the fusion succeeds. How can I solve this?
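For reference, the two activations differ only in how they treat negative inputs. The sketch below is a plain C++ reference implementation, not DirectML code; the function names are hypothetical and `alpha` is whatever slope the model uses. It may be useful for checking that a fused run and a split run produce the same values.

```cpp
#include <vector>

// Elementwise ReLU: negative inputs become 0, others pass through.
std::vector<float> relu(const std::vector<float>& x) {
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) {
        y.push_back(v >= 0.0f ? v : 0.0f);
    }
    return y;
}

// Elementwise leaky ReLU: negative inputs are scaled by alpha
// instead of being clamped to 0.
std::vector<float> leaky_relu(const std::vector<float>& x, float alpha) {
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) {
        y.push_back(v >= 0.0f ? v : alpha * v);
    }
    return y;
}
```

Running both functions over the raw convolution output gives a CPU-side reference that does not depend on whether the GPU path fused the activation.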