Closed geor-kasapidi closed 2 years ago
Also, in my experience, NCHW image tensors are slightly faster than the NHWC data layout. But be careful - converting an MPSImage to MPSGraphTensorData requires a double tensor transposition. These transpositions are better performed as a separate graph - I've seen a performance hit when I insert the transpositions right after the placeholder.
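A minimal sketch of what the "separate graph" for the layout change might look like, assuming the incoming tensor data is NHWC and the model graph expects NCHW. The shape, data type, and tensor names are hypothetical, and this is one reading of the tip above rather than the commenter's actual code. `transposeTensor` swaps two dimensions at a time, which is why two calls are needed.

```swift
import MetalPerformanceShadersGraph

// Hypothetical input shape: batch 1, 64x64 pixels, 4 channels, in NHWC order.
let nhwcShape: [NSNumber] = [1, 64, 64, 4]

// A small graph that only performs the layout change,
// kept apart from the main model graph.
let layoutGraph = MPSGraph()
let nhwc = layoutGraph.placeholder(shape: nhwcShape, dataType: .float16, name: "nhwc")

// First swap, dimensions 1 and 3: NHWC -> NCWH
let ncwh = layoutGraph.transposeTensor(nhwc, dimension: 1, withDimension: 3, name: "nhwc_to_ncwh")

// Second swap, dimensions 2 and 3: NCWH -> NCHW
let nchw = layoutGraph.transposeTensor(ncwh, dimension: 2, withDimension: 3, name: "ncwh_to_nchw")

// `nchw` is the target tensor of this graph; its result would then be fed
// to the model graph's NCHW placeholder.
```

Whether splitting the transposes out actually helps is something to measure on your own model and device.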
Thanks for the tips! Turning on the level1 flag seems to have mysterious results (I tried it just on part 3 of the UNet to start): the level1-compiled executable's `run` crashes with `Thread 8: EXC_BAD_ACCESS (code=1, address=0x440404410c010880)`.
I can try to narrow down further and see if it works on some smaller subgraph, I guess.
I also do need to try NCHW - thanks for the suggestion! Checking my notes, I think I had assumed NHWC would be faster without ever verifying. I don't need to do any conversion to MPSImage AFAIK - but I will need to change the permutes used during self/cross-attention.
Hi @madebyollin! In the article you asked about the level1 optimisation flag - please take a look at this code. I recommend compiling your MPSGraph instances using the suggested approach. The compiled version is usually faster :)
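For readers without the linked code, here is a rough sketch of compiling a graph ahead of time with the level1 optimisation flag. It is not the code linked in the comment. The tiny ReLU graph, the shape, and all names are placeholders for illustration, and it assumes an OS version where `MPSGraphCompilationDescriptor.optimizationLevel` is available.

```swift
import Metal
import MetalPerformanceShadersGraph

// Placeholder graph: one input and one ReLU, standing in for a real model.
let graph = MPSGraph()
let shape: [NSNumber] = [1, 4, 64, 64] // hypothetical NCHW shape
let input = graph.placeholder(shape: shape, dataType: .float16, name: "input")
let output = graph.reLU(with: input, name: "relu")

// Request the level1 optimisation passes at compile time.
let descriptor = MPSGraphCompilationDescriptor()
descriptor.optimizationLevel = .level1

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

// Compile once, then reuse the executable for every inference call.
let executable = graph.compile(
    with: MPSGraphDevice(mtlDevice: device),
    feeds: [input: MPSGraphShapedType(shape: shape, dataType: .float16)],
    targetTensors: [output],
    targetOperations: nil,
    compilationDescriptor: descriptor
)
```

The resulting `MPSGraphExecutable` is then run against a command queue. Given the crash reported in this thread, it may be worth enabling level1 on one small subgraph at a time.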